Data Ingestion

Ingest data from anywhere

Ingest data from any source into your data lake or data warehouse, with no code required. Extend with custom code when needed.

Trusted by forward-thinking teams

Internet Society
Paxie Games
Karaca
GrowDash
Obilet
Workhy
Buluttan
Lessmore
Spektra
Fomo Games
Rotatelab
Talemonster
kitUP
Fabrikatör
Joinco
Chimnie
ProphetX
Circle
Hyperlab
Moonstep
Agave Games
Kyoso
Digital Moka
Surpass Games

// live pipeline

From YAML to running pipeline

Define the source and destination in a few lines of YAML. Bruin connects, extracts, validates, and loads with live status, timing, and row counts.

bruin run --stage ingestion
raw.users.yaml

name: raw.users
type: ingestr
parameters:
  source_connection: stripe
  source_table: 'public.users'
  destination: bigquery
[Live status: sources (stripe · postgres · mongodb · kafka) → ingestr → destinations (bigquery · snowflake · databricks) · loading raw.stripe.charges]

// stage.connectors

Built-in connectors, defined with YAML

Bruin is a code-based platform: everything you do lives in a Git repo, versioned. All data ingestions are defined in code and version-controlled in your repository.

Multiple platforms
Bruin ships built-in connectors for many platforms. You can ingest data from AWS, Azure, GCP, Snowflake, Notion, and more.
Built on open-source
Bruin's ingestion engine is built on ingestr, an open-source data ingestion tool.
Custom sources & destinations
Bruin supports pure Python executions, enabling you to build your own data ingestion code.
Incremental loading
Bruin supports incremental loading, so you can ingest only new or changed data instead of reloading the entire dataset on every run.
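As an illustrative sketch, an incremental ingestr asset might add a strategy and key to its parameters. The parameter names shown here (incremental_strategy, incremental_key) are assumptions for illustration; verify them against the Bruin documentation:

```yaml
name: raw.users
type: ingestr

parameters:
  source_connection: postgresql
  source_table: 'public.users'
  destination: bigquery
  # Assumed parameter names for illustration; consult the Bruin docs.
  # merge upserts rows changed since the last run, keyed on updated_at.
  incremental_strategy: merge
  incremental_key: updated_at
```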

// stage.quality

End-to-end quality in raw data

Bruin's built-in data quality capabilities are designed to ensure that the data you ingest is of high quality and always matches your expectations.

[✓]
Built-in quality checks check.01
Bruin supports built-in quality checks, such as not_null and accepted_values, ready to use on any asset.
[✓]
Custom quality checks check.02
Bruin allows you to define custom quality checks in SQL, letting you encode your own quality standards.
[✓]
Templating in quality checks check.03
Bruin supports templating in quality checks, so you can use variables in your checks and run them only for incremental periods.
[✓]
Automated alerting check.04
Failing quality checks automatically send alerts to the configured channels, ensuring that you are always aware of data quality issues.
raw.users.yaml
name: raw.users
type: ingestr

parameters:
  source_connection: postgresql
  source_table: 'public.users'
  destination: bigquery

columns:

  # Define columns along with their quality checks
  - name: status
    checks:
      - name: not_null
      - name: accepted_values
        values:
          - active
          - inactive
          - deleted

# You can also define custom quality checks in SQL        
custom_checks:
  - name: new user count is greater than 1000
    query: |
      SELECT COUNT(*) > 1000 
      FROM raw.users 
      WHERE status = 'active' 
        AND created_at BETWEEN "{{start_date}}" AND "{{end_date}}"
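To see how the templated window in the custom check above resolves at run time, here is a minimal sketch. The variable names (start_date, end_date) come from the check itself, but the rendering function below is a simple stand-in for illustration, not Bruin's actual templating engine:

```python
# Minimal stand-in for template rendering; Bruin's real engine may differ.
def render(template: str, variables: dict) -> str:
    """Replace each {{key}} placeholder with its value."""
    for key, value in variables.items():
        template = template.replace("{{" + key + "}}", value)
    return template

check_sql = (
    "SELECT COUNT(*) > 1000 FROM raw.users "
    "WHERE status = 'active' "
    'AND created_at BETWEEN "{{start_date}}" AND "{{end_date}}"'
)

# The orchestrator supplies the incremental window for each run.
print(render(check_sql, {"start_date": "2024-01-01", "end_date": "2024-01-07"}))
```

Because the window is injected per run, the same check validates only the rows ingested in that period rather than rescanning the whole table.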

// stage.connect

100+ Sources and Destinations. Infinite Possibilities.

Built on ingestr, our open-source engine. Connect your entire data stack with a single command.
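For a concrete sense of the single-command workflow, a standalone ingestr invocation takes a source URI, a source table, and a destination URI. The connection strings below are placeholders, and flag names should be verified against the current ingestr release:

```shell
# Move public.users from Postgres into BigQuery in one command.
# URIs and credential paths are placeholders, not working values.
ingestr ingest \
  --source-uri 'postgresql://user:pass@localhost:5432/mydb' \
  --source-table 'public.users' \
  --dest-uri 'bigquery://my-project?credentials_path=/path/to/creds.json' \
  --dest-table 'raw.users'
```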

[SLA]
1-week-guarantee.yaml

1-Week Implementation Guarantee

Need a source that's not listed? Share testing credentials and we'll implement it within 7 days.

// pipeline.builder

Compose your own pipeline

$ bruin ingest --from <source> --to <destination>

bruin$ pipeline.compose(?, ?)

[01] select_source

[02] select_destination

All Available Data Pipeline Combinations

// quick.access

Replace your entire stack

Every layer of your data infrastructure. One platform. Zero stitching.

Ingestion
Fivetran · Airbyte · Meltano · Kafka · Databricks
Transformation & Orchestration
dbt · Airflow · Dagster · Prefect · Matillion
Quality, Lineage & Governance
Great Expectations · Soda · Monte Carlo · Atlan · DataHub
AI Layer
Tableau · Looker · Power BI
ChatGPT · Claude · Copilot
Bruin

All of this, in one platform.

Quick preview

See the Bruin platform in <30 seconds

bruin-platform.mp4
00:30

Plug and play

Use one layer. Or stack them all.

Use any or every capability and save 10× on cost.

AI Dashboards · Agentic, self-updating

AI Analyst · Ask your data anything

Lineage · Column-level, automatic

Quality · Checks on every run

SQL & Python · Transformation + orchestration

Ingestion · 200+ connectors

6/6 · Full platform
