Bruin Academy

Tutorial module

End-to-End Pipeline: NYC Taxi

Build a complete data pipeline from scratch using real NYC taxi trip data, orchestrated with Bruin and DuckDB. Go from raw API data through staging to clean, aggregated reports, learning ingestion, transformation, quality checks, and AI-assisted development along the way.

What

  • End-to-end ELT pipeline: Python ingestion, SQL staging, reporting layers with quality checks
  • Full orchestration with dependency management, execution order, and visual lineage
  • AI integration via bruin ai enhance and Bruin MCP
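
To make "dependency management" and "execution order" concrete, here is a minimal sketch in plain Python of how a run order can be derived from declared dependencies. It is not Bruin's implementation: the asset names are hypothetical, and the standard-library graphlib module stands in for whatever the orchestrator does internally.

```python
from graphlib import TopologicalSorter

# Hypothetical asset names (not taken from the tutorial).
# Each asset maps to the set of assets it depends on.
dependencies = {
    "ingestion.trips": set(),
    "staging.trips": {"ingestion.trips"},
    "reports.trips_report": {"staging.trips"},
}


def execution_order(deps):
    """Return asset names so that every asset comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

With this three-layer chain, the ingestion asset is scheduled first, then staging, then the report. A circular dependency would raise graphlib.CycleError instead of producing an order.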

How

  • Bruin CLI orchestrates the pipeline; DuckDB serves as the local data warehouse
  • Python assets ingest from the NYC TLC API; SQL assets handle transformations
  • Bruin MCP connects an AI agent for pipeline development and data analysis

Tutorial steps

  1. Introduction to Bruin (4 min)
  2. Install Bruin & Create Your First Pipeline (10 min)
  3. Build the NYC Taxi Pipeline (16 min)
  4. AI-Assisted Development and Analysis (12 min)
