Bruin Academy

End-to-End Pipeline: NYC Taxi

Build a complete data pipeline from scratch using real NYC taxi data - from ingestion to staging to reports, all orchestrated with Bruin and DuckDB.

What is this?

A hands-on tutorial where you build a real data pipeline end-to-end using NYC taxi trip data. You'll go from raw API data to clean, aggregated reports - learning ingestion, transformation, quality checks, and AI-assisted development along the way.

What you'll use: Bruin CLI for pipeline orchestration, DuckDB as a local data warehouse, Python for ingestion, SQL for transformations, and the Bruin MCP with an AI agent to accelerate development.

What you'll build

A three-layer pipeline:

  1. Ingestion - Python asset that pulls taxi trip data from the NYC TLC API, plus seed files for lookup tables
  2. Staging - SQL asset that cleans, deduplicates, and joins the raw data with lookup tables
  3. Reports - SQL asset that aggregates trips by date, taxi type, and payment method
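To make the three layers concrete, here is a minimal sketch of what the staging and reports logic does, using a handful of synthetic rows and Python's stdlib `sqlite3` as a stand-in for DuckDB so it runs anywhere. The table and column names (`raw_trips`, `payment_lookup`, `stg_trips`, `fare`) are illustrative, not the actual TLC schema or the assets you'll build in the tutorial.

```python
import sqlite3

# Synthetic raw trips (stand-in for data pulled in the ingestion layer).
# Note the intentional duplicate row, which staging will remove.
raw_trips = [
    ("t1", "2024-01-01", 1, 12.50),
    ("t1", "2024-01-01", 1, 12.50),  # duplicate
    ("t2", "2024-01-01", 2, 8.00),
    ("t3", "2024-01-02", 1, 20.00),
]
payment_lookup = [(1, "credit_card"), (2, "cash")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_trips (trip_id TEXT, trip_date TEXT, payment_type INTEGER, fare REAL)")
con.executemany("INSERT INTO raw_trips VALUES (?, ?, ?, ?)", raw_trips)
con.execute("CREATE TABLE payment_lookup (payment_type INTEGER, payment_method TEXT)")
con.executemany("INSERT INTO payment_lookup VALUES (?, ?)", payment_lookup)

# Staging layer: deduplicate the raw data and join it with the lookup table.
con.execute("""
    CREATE TABLE stg_trips AS
    SELECT DISTINCT t.trip_id, t.trip_date, p.payment_method, t.fare
    FROM raw_trips t
    JOIN payment_lookup p USING (payment_type)
""")

# Reports layer: aggregate trips by date and payment method.
report = con.execute("""
    SELECT trip_date, payment_method, COUNT(*) AS trips, SUM(fare) AS total_fare
    FROM stg_trips
    GROUP BY trip_date, payment_method
    ORDER BY trip_date, payment_method
""").fetchall()
print(report)
```

In the tutorial itself, each layer lives in its own Bruin asset and DuckDB executes the SQL; the shape of the transformations is the same.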

By the end, you'll have a fully orchestrated pipeline with dependencies, quality checks, and materialization strategies - and you'll know how to use an AI agent to build it faster.
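The quality checks mentioned above boil down to queries that count offending rows: zero means the check passes. A hedged sketch of two common checks (not-null and uniqueness), again on synthetic data with stdlib `sqlite3` standing in for DuckDB; in the tutorial you'll declare such checks in Bruin asset definitions rather than writing them by hand.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stg_trips (trip_id TEXT, fare REAL)")
con.executemany("INSERT INTO stg_trips VALUES (?, ?)",
                [("t1", 12.5), ("t2", 8.0), ("t3", None)])

# not_null check on fare: count rows that violate it.
null_fares = con.execute(
    "SELECT COUNT(*) FROM stg_trips WHERE fare IS NULL").fetchone()[0]

# unique check on trip_id: total count must equal distinct count.
total, distinct = con.execute(
    "SELECT COUNT(trip_id), COUNT(DISTINCT trip_id) FROM stg_trips").fetchone()

print(null_fares, total == distinct)  # one null fare; trip_id is unique
```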

Before you start