Your last data platform.
Reliable data. 10x faster, 90% less complexity.
Bruin is open-core; you can view it on GitHub.
Trusted by forward-thinking teams

INGEST
Copy data from anywhere
A CLI-based ingestion tool built on open source. Batch and incremental loading. Connect any source to any destination with a single command.
- Multiple sources & destinations.
- Postgres, MySQL, BigQuery, Snowflake, S3, and more.
- Incremental loading.
- Efficient data syncing with snapshot and incremental modes.
- Schema evolution.
- Automatically update schemas in the destination to match the source.
name: raw.users
type: ingestr
parameters:
  source_connection: postgres
  source_table: 'public.users'
  destination: bigquery
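
The incremental loading and schema evolution bullets above map onto the same parameters block. A minimal sketch, assuming the incremental_strategy and incremental_key parameter names from Bruin's ingestr asset type:

name: raw.users
type: ingestr
parameters:
  source_connection: postgres
  source_table: 'public.users'
  destination: bigquery
  incremental_strategy: merge    # assumption: merge changed rows instead of a full refresh
  incremental_key: updated_at    # assumption: column that marks new or updated rows
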
TRANSFORM
Pipelines you can trust
Build data pipelines directly from SQL and Python files. Git-native workflows with dependency graphs and environment-aware runs.
- SQL & Python native.
- Write transformations in the language you prefer.
- Column-level lineage.
- Automatically extract the lineage of your transformations.
- Jinja templating.
- Parameterize your pipelines with Jinja templating.
/* @bruin
name: dashboard.bookings
owner: [email protected]
materialization:
  type: table
@bruin */

SELECT
  bookings.Id AS BookingId,
  sessions.Name AS SessionName,
  bookings.SessionType AS SessionType
FROM raw.Bookings AS bookings
INNER JOIN raw.Sessions AS sessions
  ON bookings.SessionId = sessions.Id
WHERE bookings.updated_at BETWEEN '{{ start_date }}' AND '{{ end_date }}'
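
At run time, Bruin renders {{ start_date }} and {{ end_date }} from the run's interval, so the query above serves scheduled runs and backfills alike. A minimal sketch of pairing that window with an incremental materialization, assuming the strategy and incremental_key keys of Bruin's materialization block:

materialization:
  type: table
  strategy: delete+insert      # assumption: delete the run's window, then insert its fresh rows
  incremental_key: updated_at  # assumption: column that scopes which rows belong to the window
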
QUALITY
Data quality built-in
Define quality checks alongside your pipelines. Catch issues before they reach production with comprehensive validation rules.
- Schema validation.
- Column types, primary keys, and accepted values.
- Automated alerts.
- Get automated alerts if quality checks fail via Slack, email, and more.
- Custom SQL checks.
- Define business rules with SQL assertions.
columns:
  - name: SessionType
    type: STRING
    description: Type of training session
    checks:
      - name: not_null
      - name: accepted_values
        value:
          - 'Group'
          - 'Private'
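
The custom SQL checks bullet above lives in the same file. A minimal sketch, assuming Bruin's custom_checks block, where the query's single result must equal the expected value:

custom_checks:
  - name: no_unknown_session_types
    query: |
      SELECT count(*)
      FROM dashboard.bookings
      WHERE SessionType NOT IN ('Group', 'Private')
    value: 0   # assumption: zero violating rows means the check passes
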
TRUSTED BY DATA TEAMS
Ship faster with Bruin
Bruin solved many pain points for my team, making work not just more enjoyable but significantly more productive as well.

With Bruin, what previously took hours can now be accomplished in just 15 minutes.

Thanks to Bruin, we have been able to automate all the manual parts of our data pipelines. We are able to focus on our business users' needs while delivering insights faster than ever before.

Bruin strengthens our data infrastructure, boosting user acquisition accuracy and efficiency.

Bruin's product has effectively addressed all the challenges my team faced in developing, orchestrating, and monitoring our pipelines.

Partnering with Bruin has significantly increased our productivity, making it easy for our data team to manage everything seamlessly.

Bruin's platform is incredibly intuitive. As a new team member, I was able to contribute effectively from day one.

OBSERVABILITY
Cost & performance insights
Track cloud warehouse costs at the asset level. Monitor performance metrics and optimize resource usage across your data stack.
- Asset-level tracking.
- Cost attribution per table, query, and pipeline.
- Usage analytics.
- Understand data access patterns and optimization opportunities.
- Budget alerts.
- Set thresholds and get notified before overruns.

PYTHON
ML & AI workloads
Run machine learning and AI workloads directly in your pipelines. Native Python support within isolated, serverless environments.
- Native Python support.
- Use pandas, scikit-learn, TensorFlow, and any Python library.
- Automated dependencies.
- All your execution dependencies are automatically installed in isolated environments.
- Run in your pipelines.
- Mix and match ML models with your data pipelines.
""" @bruin
name: dashboard.ltv_prediction
image: python:3.12
instance: b2.xlarge
@bruin """
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from src.utils import fetch_customers_df
df = fetch_customers_df()
# Feature engineering
X = df[['historical_spend', 'tenure_months', 'purchases']]
y = df['lifetime_value']
# Train model
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
model = LinearRegression()
model.fit(X_train, y_train)
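
# A hedged continuation, not part of the original example: score the held-out
# split so a degraded fit is visible in the run logs.
print(f"Test R^2: {model.score(X_test, y_test):.3f}")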
GOVERNANCE
Catalog, lineage, and control
Automatic lineage generation across platforms. Data catalog with glossary and metadata. Track dependencies from source to dashboard.

Lineage
End-to-end visibility
Track data flow from source systems through transformations to final dashboards. Understand impact before making changes, down to the column level.
Catalog
Central metadata registry
Document tables, columns, and pipelines. Business glossary with ownership and descriptions for every asset.
Impact
Identify risks
Automated impact analysis before changes. Know what breaks before deployment.
Control
Role-based access
Fine-grained permissions. Control who can view, edit, and execute.
Multi-repo
Distributed workflows
Connect pipelines across multiple repositories. Works with GitHub, GitLab, Bitbucket, and more.
PLATFORMS
Bring your own cloud
Use our fully managed cloud, or bring your own. Run in your VPC or on-premises.
Fully managed
Private VPC
On-premises
Ready to ship reliable data?
Production-ready pipelines without the complexity. Deploy today.