Bruin Academy
End-to-End Pipeline: NYC Taxi
Build a complete data pipeline from scratch using real NYC taxi data - from ingestion to staging to reports, all orchestrated with Bruin and DuckDB.
What is this?
A hands-on tutorial where you build a real data pipeline end-to-end using NYC taxi trip data. You'll go from raw API data to clean, aggregated reports - learning ingestion, transformation, quality checks, and AI-assisted development along the way.
What you'll use: Bruin CLI for pipeline orchestration, DuckDB as a local data warehouse, Python for ingestion, SQL for transformations, and the Bruin MCP with an AI agent to accelerate development.
What you'll build
- End-to-end ELT pipeline - Python ingestion from the NYC TLC API, seed files for lookup tables, SQL staging and reporting layers with quality checks and materialization
- Orchestration and lineage - A fully orchestrated pipeline with dependencies between assets, automatic execution order, and visual lineage
- AI integration - Use `bruin ai enhance` to build a data context layer, set up the Bruin MCP, and let an AI agent assist with pipeline development and data analysis
Before you start
- Bruin CLI installed
- VS Code or Cursor with the Bruin extension
- Familiarity with Bruin Core Concepts (recommended)