Bruin Academy

End-to-End Pipeline: NYC Taxi

Build a complete data pipeline from scratch using real NYC taxi data - from ingestion to staging to reports, all orchestrated with Bruin and DuckDB.

What is this?

A hands-on tutorial where you build a real data pipeline end-to-end using NYC taxi trip data. You'll go from raw API data to clean, aggregated reports - learning ingestion, transformation, quality checks, and AI-assisted development along the way.

What you'll use: Bruin CLI for pipeline orchestration, DuckDB as a local data warehouse, Python for ingestion, SQL for transformations, and the Bruin MCP with an AI agent to accelerate development.
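To preview the raw-to-staging-to-report layering you'll build, here is a minimal sketch in plain Python. It uses the stdlib sqlite3 module as a stand-in for DuckDB (the SQL pattern is the same), and all table and column names are illustrative, not the tutorial's actual schema:

```python
import sqlite3

# In-memory database standing in for the DuckDB warehouse file.
con = sqlite3.connect(":memory:")

# "Ingestion": load a few raw trip records, as the Python asset would.
con.execute("CREATE TABLE raw_trips (pickup_zone TEXT, fare REAL, tip REAL)")
con.executemany(
    "INSERT INTO raw_trips VALUES (?, ?, ?)",
    [("Midtown", 12.5, 2.0), ("Midtown", 8.0, 1.5), ("JFK", 52.0, 10.0)],
)

# "Staging": filter out bad records and derive columns.
con.execute("""
    CREATE TABLE stg_trips AS
    SELECT pickup_zone, fare, tip, fare + tip AS total
    FROM raw_trips
    WHERE fare > 0
""")

# "Report": aggregate for downstream consumers.
rows = con.execute("""
    SELECT pickup_zone, COUNT(*) AS trips, ROUND(SUM(total), 2) AS revenue
    FROM stg_trips
    GROUP BY pickup_zone
    ORDER BY revenue DESC
""").fetchall()
print(rows)  # [('JFK', 1, 62.0), ('Midtown', 2, 24.0)]
```

In the tutorial, each of these layers becomes a separate Bruin asset, and Bruin runs them in dependency order instead of you sequencing the statements by hand.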

What you'll build

  1. End-to-end ELT pipeline - Python ingestion from the NYC TLC API, seed files for lookup tables, SQL staging and reporting layers with quality checks and materialization
  2. Orchestration and lineage - A fully orchestrated pipeline with dependencies between assets, automatic execution order, and visual lineage
  3. AI integration - Use bruin ai enhance to build a data context layer, set up the Bruin MCP, and let an AI agent assist with pipeline development and data analysis
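As a taste of item 1, a Bruin SQL asset carries its own configuration in an embedded YAML block. The sketch below is illustrative only: the asset name, dependency, and column are invented, and the exact fields available are best checked against the Bruin documentation for your version:

```sql
/* @bruin
name: reports.trips_by_zone
type: duckdb.sql
materialization:
  type: table
depends:
  - staging.trips
columns:
  - name: pickup_zone
    type: varchar
    checks:
      - name: not_null
@bruin */

SELECT pickup_zone, COUNT(*) AS trip_count
FROM staging.trips
GROUP BY pickup_zone
```

The depends list is what gives Bruin the dependency graph for execution order and lineage, and the checks run automatically after the asset materializes.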

Before you start