Data Ingestion

Ingest data from anywhere

Ingest data from any source into your data lake or data warehouse, with no code required. Extend with custom code when needed.

Trusted by forward-thinking teams

Internet Society
Paxie Games
Karaca
GrowDash
Obilet
Workhy
Buluttan
Lessmore
Spektra
Fomo Games
Rotatelab
Talemonster
kitUP
Fabrikatör
Joinco
Chimnie
ProphetX
Circle
Hyperlab
Moonstep
Agave Games
Kyoso
Digital Moka
Surpass Games

// live pipeline

From YAML to running pipeline

Define the source and destination in a few lines of YAML. Bruin connects, extracts, validates, and loads with live status, timing, and row counts.

bruin run --stage ingestion
raw.users.yaml

name: raw.users
type: ingestr
parameters:
  source_connection: stripe
  source_table: 'public.users'
  destination: bigquery
[Live status: sources (stripe · postgres · mongodb · kafka) → ingestr → destinations (bigquery · snowflake · databricks) · loading raw.stripe.charges]

// stage.connectors

Built-in connectors, defined with YAML

Bruin is a code-based platform: everything you do lives in a Git repo, versioned. All data ingestions are defined in code and version-controlled in your repository.

Multiple platforms
Bruin ships built-in connectors for many platforms. You can ingest data from AWS, Azure, GCP, Snowflake, Notion, and more.
Built on open-source
Bruin's ingestion engine is built on ingestr, an open-source data ingestion tool.
Custom sources & destinations
Bruin supports pure Python executions, enabling you to build your own data ingestion code.
Incremental loading
Bruin supports incremental loading, so you can ingest only new or changed data instead of reloading the entire dataset on every run.
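As an illustrative sketch, an incremental ingestr asset might add a strategy and key to its parameters. The parameter names shown here (incremental_strategy, incremental_key) are assumptions for illustration; verify them against the Bruin documentation:

```yaml
name: raw.users
type: ingestr

parameters:
  source_connection: postgresql
  source_table: 'public.users'
  destination: bigquery
  # Assumed parameter names for illustration; consult the Bruin docs.
  # merge upserts rows changed since the last run, keyed on updated_at.
  incremental_strategy: merge
  incremental_key: updated_at
```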

// stage.quality

End-to-end quality in raw data

Bruin's built-in data quality capabilities are designed to ensure that the data you ingest is of high quality and always matches your expectations.

[✓]
Built-in quality checks check.01
Bruin supports built-in quality checks, such as not_null and accepted_values, ready to use on any asset.
[✓]
Custom quality checks check.02
Bruin allows you to define custom quality checks in SQL, letting you encode your own quality standards.
[✓]
Templating in quality checks check.03
Bruin supports templating in quality checks, so you can use variables in your checks and run them only for incremental periods.
[✓]
Automated alerting check.04
Failing quality checks automatically send alerts to the configured channels, ensuring that you are always aware of data quality issues.
raw.users.yaml
name: raw.users
type: ingestr

parameters:
  source_connection: postgresql
  source_table: 'public.users'
  destination: bigquery

columns:

  # Define columns along with their quality checks
  - name: status
    checks:
      - name: not_null
      - name: accepted_values
        values:
          - active
          - inactive
          - deleted

# You can also define custom quality checks in SQL        
custom_checks:
  - name: new user count is greater than 1000
    query: |
      SELECT COUNT(*) > 1000 
      FROM raw.users 
      WHERE status = 'active' 
        AND created_at BETWEEN "{{start_date}}" AND "{{end_date}}"
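To see how the templated window in the custom check above resolves at run time, here is a minimal sketch. The variable names (start_date, end_date) come from the check itself, but the rendering function below is a simple stand-in for illustration, not Bruin's actual templating engine:

```python
# Minimal stand-in for template rendering; Bruin's real engine may differ.
def render(template: str, variables: dict) -> str:
    """Replace each {{key}} placeholder with its value."""
    for key, value in variables.items():
        template = template.replace("{{" + key + "}}", value)
    return template

check_sql = (
    "SELECT COUNT(*) > 1000 FROM raw.users "
    "WHERE status = 'active' "
    'AND created_at BETWEEN "{{start_date}}" AND "{{end_date}}"'
)

# The orchestrator supplies the incremental window for each run.
print(render(check_sql, {"start_date": "2024-01-01", "end_date": "2024-01-07"}))
```

Because the window is injected per run, the same check validates only the rows ingested in that period rather than rescanning the whole table.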

// stage.connect

100+ Sources and Destinations. Infinite Possibilities.

Built on ingestr, our open-source engine. Connect your entire data stack with a single command.
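For a concrete sense of the single-command workflow, a standalone ingestr invocation takes a source URI, a source table, and a destination URI. The connection strings below are placeholders, and flag names should be verified against the current ingestr release:

```shell
# Move public.users from Postgres into BigQuery in one command.
# URIs and credential paths are placeholders, not working values.
ingestr ingest \
  --source-uri 'postgresql://user:pass@localhost:5432/mydb' \
  --source-table 'public.users' \
  --dest-uri 'bigquery://my-project?credentials_path=/path/to/creds.json' \
  --dest-table 'raw.users'
```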

[SLA]
1-week-guarantee.yaml

1-Week Implementation Guarantee

Need a source that's not listed? Share testing credentials and we'll implement it within 7 days.

// pipeline.builder

Compose your own pipeline

$ bruin ingest --from <source> --to <destination>

bruin$ pipeline.compose(?, ?)

[01] select_source

[02] select_destination

All Available Data Pipeline Combinations

// quick.access

Replace your entire stack

Every layer of your data infrastructure. One platform. Zero stitching.

Ingestion
Fivetran · Airbyte · Meltano · Kafka · Databricks
Transformation & Orchestration
dbt · Airflow · Dagster · Prefect · Matillion
Quality, Lineage & Governance
Great Expectations · Soda · Monte Carlo · Atlan · DataHub
AI Layer
Tableau · Looker · Power BI
ChatGPT · Claude · Copilot
Bruin

All of this, in one platform.

Quick preview

See the Bruin platform in <30 seconds

bruin-platform.mp4
00:30

Plug and play

Use one layer. Or stack them all.

Use any or every capability and save 10× on cost.

AI Dashboards · Agentic, self-updating

AI Analyst · Ask your data anything

Lineage · Column-level, automatic

Quality · Checks on every run

SQL & Python · Transformation + orchestration

Ingestion · 200+ connectors

6/6 · Full platform
