Step 1 · Beginner · 4 min

Introduction to Bruin

Learn what Bruin is, how it replaces five separate tools with one platform, and get an overview of the NYC taxi pipeline we'll build.

Bruin CLI
Learning paths: Data Engineer


What is Bruin?

Bruin is an end-to-end data platform that combines ingestion, transformation, orchestration, data quality checks, metadata management, and lineage into a single tool. Instead of configuring five or six different tools (an ingestion tool, dbt, Airflow, Great Expectations, etc.), you use one CLI and one project structure.
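In practice, that "one CLI" looks something like the following. This is a minimal sketch, not a definitive reference: the subcommand names (`init`, `validate`, `run`) reflect the Bruin CLI as commonly documented, and `my-project` is a placeholder name.

```shell
# Scaffold a new Bruin project from a template
bruin init my-project

# Parse assets and check the pipeline configuration before running
bruin validate ./my-project

# Execute the pipeline end to end
bruin run ./my-project
```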

The modern data stack problem

A typical data workflow involves:

  • Extraction/Ingestion - pulling data from APIs, databases, and third-party sources
  • Transformation - cleaning, joining, and aggregating data with SQL or Python
  • Orchestration - scheduling and coordinating when each step runs
  • Quality checks - validating data accuracy, completeness, and consistency
  • Metadata and lineage - tracking what data exists, where it came from, and how it flows

Each of these traditionally requires a separate tool with its own configuration, deployment, and maintenance. Bruin brings them all into a single project.
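As a rough sketch of what "a single project" means, a Bruin project typically keeps connection config at the root and groups transformations as asset files inside a pipeline folder. The exact file names below (`.bruin.yml`, `pipeline.yml`, the `assets/` directory) follow Bruin's conventions as I understand them; treat the specific asset file names as illustrative.

```
my-project/
├── .bruin.yml          # connections and credentials for the whole project
└── pipeline/
    ├── pipeline.yml    # pipeline name, schedule, default connection
    └── assets/
        ├── trips.sql   # one asset per file: SQL or Python
        └── zones.py
```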

What we'll build

In this tutorial series, we'll extract data from the NYC taxi trip database and build a three-layer pipeline:

  1. Ingestion layer - pull raw trip data and lookup tables into DuckDB
  2. Staging layer - clean, deduplicate, and join the data
  3. Reports layer - aggregate into final analytical tables
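To make the layers concrete, here is a hedged sketch of what a staging-layer asset might look like. Bruin defines asset metadata in an embedded YAML block inside the SQL file; the asset name, dependency, and column names here are invented for illustration and will differ in the actual tutorial.

```sql
/* @bruin
name: staging.trips
type: duckdb.sql
materialization:
  type: table
depends:
  - ingestion.trips
@bruin */

-- Deduplicate raw trips on the way into the staging layer
select distinct
    pickup_datetime,
    dropoff_datetime,
    pickup_location_id,
    total_amount
from ingestion.trips
```

The header is what lets Bruin infer orchestration order and lineage: because this asset declares `depends: ingestion.trips`, it runs only after the ingestion asset succeeds.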

Learning goals

  • Bruin project structure (projects, pipelines, assets)
  • Materialization strategies (append, table, time interval)
  • Asset dependencies and lineage
  • Metadata and quality checks
  • Custom variables for parameterized runs
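As a preview of the metadata and quality-check goals above, column definitions and checks live in the same asset header. This is a sketch of the general shape; the column names are placeholders, and `not_null`/`positive` are examples of the built-in check names Bruin provides.

```yaml
columns:
  - name: pickup_datetime
    type: timestamp
    description: "When the trip started"
    checks:
      - name: not_null
  - name: total_amount
    type: float
    checks:
      - name: positive
```

Declaring columns this way serves double duty: the descriptions feed Bruin's metadata and lineage views, and the checks run automatically after each materialization.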