Step 1 · Beginner · 4 min

Introduction to Bruin

Learn what Bruin is, how it replaces five separate tools with one platform, and get an overview of the NYC taxi pipeline we'll build.

Bruin CLI
Learning paths: Data Engineer


What is Bruin?

Bruin is an end-to-end data platform that combines ingestion, transformation, orchestration, data quality checks, metadata management, and lineage into a single tool. Instead of configuring five or six different tools (an ingestion tool, dbt, Airflow, Great Expectations, etc.), you use one CLI and one project structure.
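In practice, that "one CLI" looks something like the following. This is a minimal sketch, not a definitive reference: the subcommand names (`init`, `validate`, `run`) reflect the Bruin CLI as commonly documented, and `my-project` is a placeholder name.

```shell
# Scaffold a new Bruin project from a template
bruin init my-project

# Parse assets and check the pipeline configuration before running
bruin validate ./my-project

# Execute the pipeline end to end
bruin run ./my-project
```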

The modern data stack problem

A typical data workflow involves:

  • Extraction/Ingestion - pulling data from APIs, databases, and third-party sources
  • Transformation - cleaning, joining, and aggregating data with SQL or Python
  • Orchestration - scheduling and coordinating when each step runs
  • Quality checks - validating data accuracy, completeness, and consistency
  • Metadata and lineage - tracking what data exists, where it came from, and how it flows

Each of these traditionally requires a separate tool with its own configuration, deployment, and maintenance. Bruin brings them all into a single project.
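As a rough sketch of what "a single project" means, a Bruin project typically keeps connection config at the root and groups transformations as asset files inside a pipeline folder. The exact file names below (`.bruin.yml`, `pipeline.yml`, the `assets/` directory) follow Bruin's conventions as I understand them; treat the specific asset file names as illustrative.

```
my-project/
├── .bruin.yml          # connections and credentials for the whole project
└── pipeline/
    ├── pipeline.yml    # pipeline name, schedule, default connection
    └── assets/
        ├── trips.sql   # one asset per file: SQL or Python
        └── zones.py
```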

What we'll build

In this tutorial series, we'll extract data from the NYC taxi trip database and build a three-layer pipeline:

  1. Ingestion layer - pull raw trip data and lookup tables into DuckDB
  2. Staging layer - clean, deduplicate, and join the data
  3. Reports layer - aggregate into final analytical tables
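To make the layers concrete, here is a hedged sketch of what a staging-layer asset might look like. Bruin defines asset metadata in an embedded YAML block inside the SQL file; the asset name, dependency, and column names here are invented for illustration and will differ in the actual tutorial.

```sql
/* @bruin
name: staging.trips
type: duckdb.sql
materialization:
  type: table
depends:
  - ingestion.trips
@bruin */

-- Deduplicate raw trips on the way into the staging layer
select distinct
    pickup_datetime,
    dropoff_datetime,
    pickup_location_id,
    total_amount
from ingestion.trips
```

The header is what lets Bruin infer orchestration order and lineage: because this asset declares `depends: ingestion.trips`, it runs only after the ingestion asset succeeds.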

Learning goals

  • Bruin project structure (projects, pipelines, assets)
  • Materialization strategies (append, table, time interval)
  • Asset dependencies and lineage
  • Metadata and quality checks
  • Custom variables for parameterized runs
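As a preview of the metadata and quality-check goals above, column definitions and checks live in the same asset header. This is a sketch of the general shape; the column names are placeholders, and `not_null`/`positive` are examples of the built-in check names Bruin provides.

```yaml
columns:
  - name: pickup_datetime
    type: timestamp
    description: "When the trip started"
    checks:
      - name: not_null
  - name: total_amount
    type: float
    checks:
      - name: positive
```

Declaring columns this way serves double duty: the descriptions feed Bruin's metadata and lineage views, and the checks run automatically after each materialization.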