Open-source data tools.

Developer-first CLIs for ingestion and pipelines. Build locally, run anywhere.

PIPELINES & LINEAGE

Build end-to-end pipelines

Transform your data using SQL, Python, or R. Bruin CLI automatically extracts dependencies and column-level lineage from your code, building a complete view of your data flow.

Multi-language support.
SQL, Python, and custom scripts all in one pipeline.
Automatic dependency resolution.
No manual DAG configuration—dependencies extracted from code.
Column-level lineage.
Track data from source to destination at the column level.
/* @bruin
name: dashboard.bookings
owner: [email protected]
materialization:
  type: table
@bruin */
SELECT
    bookings.Id AS BookingId,
    sessions.Name AS SessionName,
    bookings.SessionType AS SessionType
FROM raw.Bookings AS bookings
INNER JOIN raw.Sessions AS sessions
  ON bookings.SessionId = sessions.Id
WHERE bookings.updated_at BETWEEN '{{ start_date }}' AND '{{ end_date }}'
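
Python assets join the same pipeline by declaring their metadata in a top-level docstring, including explicit upstream dependencies. A minimal sketch (the asset name and script body are placeholders, assuming Bruin's @bruin docstring convention for Python assets):

""" @bruin
name: dashboard.booking_stats
depends:
  - dashboard.bookings
@bruin """

# The script itself is the asset; replace this placeholder with real logic.
print("dashboard.bookings is materialized upstream; compute stats here.")

Because the dependency is declared by name, Bruin slots this script into the same DAG and runs it after dashboard.bookings completes.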

DATA INGESTION

Move data from any source

Copy data between databases, apps, and data warehouses with a single command. Ingestr automatically handles data updates and keeps everything in sync.

Multiple sources & destinations.
Postgres, MySQL, MongoDB, BigQuery, Snowflake, Shopify, Stripe, Salesforce, and more.
Incremental loading.
Efficient data syncing with snapshot, incremental, and CDC patterns.
Schema evolution.
Automatically update schemas in the destination to match the source.
CLI-friendly & scriptable.
Use in bash scripts, CI pipelines, or integrate with Bruin CLI workflows.
name: raw.users
type: ingestr
parameters:
  source_connection: postgres
  source_table: 'public.users'
  destination: bigquery
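
Outside of a Bruin pipeline, the same copy runs as a standalone one-liner with Ingestr. A minimal sketch (the connection URIs are placeholders for your own credentials):

ingestr ingest \
  --source-uri 'postgresql://admin:secret@localhost:5432/mydb' \
  --source-table 'public.users' \
  --dest-uri 'bigquery://my-project?credentials_path=./service_account.json' \
  --dest-table 'raw.users'

Incremental syncs layer the --incremental-strategy and --incremental-key flags onto the same command.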

ORCHESTRATION

Run pipelines on schedule

Define pipeline schedules, variables, and connections in YAML. Set up cron expressions, type-safe parameters, and environment-specific configurations—all in one file.

Flexible scheduling.
Daily, hourly, or custom cron expressions for automated pipeline runs.
Typed variables.
Define variables with type validation, enums, and default values.
Default connections.
Centralize connection configs and reference them across all assets.
name: analytics-daily
schedule: daily
start_date: "2024-01-01"

default_connections:
  snowflake: "sf-default"
  postgres: "pg-default"
  slack: "alerts-slack"

tags: [ "daily", "analytics" ]
domains: [ "marketing" ]

default:
  interval_modifiers:
    start: "-1d"
    end: "-1d"

variables:
  target_segment:
    type: string
    enum: ["self_serve", "enterprise", "partner"]
    default: "enterprise"
  channel_overrides:
    type: object
    properties:
      email:
        type: array
        items:
          type: string
    default:
      email: ["enterprise_newsletter"]

QUALITY CHECKS

Catch issues before production

Define data quality checks alongside your transformations. Use built-in validators or write custom SQL checks for business rules. Tests run automatically and fail fast when expectations aren't met.

Built-in checks.
Pre-built checks for common patterns: not_null, unique, accepted_values, and more.
Custom SQL validators.
Write your own validation logic in SQL for complex business rules.
Column-level tests.
Define checks at the column level to ensure data quality where it matters.
name: raw.users
type: ingestr

parameters:
  source_connection: postgres
  source_table: 'public.users'
  destination: bigquery

columns:

  # Define columns along with their quality checks
  - name: status
    checks:
      - name: not_null
      - name: accepted_values
        values:
          - active
          - inactive
          - deleted

# You can also define custom quality checks in SQL        
custom_checks:
  - name: new user count is greater than 1000
    query: |
      SELECT COUNT(*) > 1000 
      FROM raw.users 
      WHERE status = 'active' 
        AND created_at BETWEEN "{{ start_date }}" AND "{{ end_date }}"
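
The same column-level checks attach to SQL transformations, so a query and its tests live in one file. A minimal sketch reusing the dashboard.bookings asset from earlier, trimmed to a single column:

/* @bruin
name: dashboard.bookings
materialization:
  type: table
columns:
  - name: BookingId
    type: integer
    checks:
      - name: not_null
      - name: unique
@bruin */
SELECT
    bookings.Id AS BookingId
FROM raw.Bookings AS bookings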

GET STARTED

Install in seconds

Both tools are available on macOS, Linux, and Windows. Install via package managers or download directly from GitHub.

Bruin CLI

Bruin CLI is the tool for building data pipelines and transformations in SQL & Python.

curl:

curl -LsSf https://getbruin.com/install/cli | sh

wget:

wget -qO- https://getbruin.com/install/cli | sh

Ingestr

Ingestr is the tool for copying data between databases, apps, and warehouses.

uv is the recommended way to install Ingestr:

uv tool install ingestr

Don't have uv? Install it first:

pip install uv
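
From there, a first pipeline is a few commands away. A minimal sketch (the folder name is a placeholder, and it assumes Bruin's built-in default template):

bruin init default my-pipeline
bruin validate ./my-pipeline
bruin run ./my-pipeline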

Ready to get started?

Join our Slack community to connect with other users, get help, and stay updated with the latest developments.