Bruin Academy
Tutorial module
End-to-End Pipeline: NYC Taxi
Build a complete data pipeline from scratch using real NYC taxi trip data. Go from raw API data to clean, aggregated reports, all orchestrated with Bruin and DuckDB, learning ingestion, transformation, quality checks, and AI-assisted development along the way.
What
- End-to-end ELT pipeline: Python ingestion, SQL staging, reporting layers with quality checks
- Full orchestration with dependency management, execution order, and visual lineage
- AI integration via `bruin ai enhance` and Bruin MCP
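The dependency management and execution order mentioned above can be pictured as a topological sort over an asset graph. The following is a minimal sketch of that general idea, not Bruin's actual implementation; the asset names (`ingest_trips`, `staging_trips`, `report_daily_trips`) are hypothetical and chosen only to mirror the ingestion, staging, and reporting layers.

```python
from graphlib import TopologicalSorter

# Hypothetical asset names; each key maps to the set of assets it depends on.
dependencies = {
    "ingest_trips": set(),
    "staging_trips": {"ingest_trips"},
    "report_daily_trips": {"staging_trips"},
}


def execution_order(deps):
    """Return a run order in which every asset comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because each asset here depends only on the one before it, there is exactly one valid order: ingestion, then staging, then the report. With a branching graph, several orders could be valid, and an orchestrator is free to pick any of them or run independent assets in parallel.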
How
- Bruin CLI orchestrates the pipeline; DuckDB serves as the local data warehouse
- Python assets ingest from the NYC TLC API; SQL assets handle transformations
- Bruin MCP connects an AI agent for pipeline development and data analysis
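To make the division of labor above concrete, here is a rough sketch of the ingest, stage, and report flow in plain Python. It makes several assumptions: it uses the standard-library `sqlite3` module as a stand-in for DuckDB so that it runs without extra installs, the rows in `RAW_TRIPS` are made-up illustrative values rather than real TLC data, and the table names and the `run_pipeline` helper are hypothetical. It does not use Bruin itself.

```python
import sqlite3

# Made-up sample rows standing in for ingested trip records (not real TLC data).
# Columns: pickup_date, distance, fare
RAW_TRIPS = [
    ("2024-01-01", 2.5, 12.0),
    ("2024-01-01", -1.0, 5.0),  # invalid: negative distance
    ("2024-01-02", 4.0, 18.5),
]


def run_pipeline(rows):
    """Load raw rows, stage the valid ones, check quality, and build a report."""
    con = sqlite3.connect(":memory:")  # stand-in for a local DuckDB database
    try:
        # Ingestion: load the raw rows unchanged.
        con.execute(
            "CREATE TABLE raw_trips (pickup_date TEXT, distance REAL, fare REAL)"
        )
        con.executemany("INSERT INTO raw_trips VALUES (?, ?, ?)", rows)

        # Staging: keep only rows with positive distance and fare.
        con.execute(
            "CREATE TABLE stg_trips AS "
            "SELECT * FROM raw_trips WHERE distance > 0 AND fare > 0"
        )

        # Quality check: fail loudly if an invalid row reached staging.
        bad = con.execute(
            "SELECT COUNT(*) FROM stg_trips WHERE distance <= 0 OR fare <= 0"
        ).fetchone()[0]
        if bad:
            raise ValueError(f"{bad} invalid rows in stg_trips")

        # Reporting: aggregate trips and total fare per day.
        return con.execute(
            "SELECT pickup_date, COUNT(*), SUM(fare) "
            "FROM stg_trips GROUP BY pickup_date ORDER BY pickup_date"
        ).fetchall()
    finally:
        con.close()


report = run_pipeline(RAW_TRIPS)
print(report)
```

In the tutorial's setup, each of these stages would be its own asset (a Python asset for ingestion, SQL assets for staging and reporting), with the quality check attached to the asset rather than written inline as it is here.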
Before you start
- Bruin CLI installed
- VS Code or Cursor with the Bruin extension
- Familiarity with Bruin Core Concepts (recommended)
Tutorial steps
- 1. Introduction to Bruin (4 min)
- 2. Install Bruin & Create Your First Pipeline (10 min)
- 3. Build the NYC Taxi Pipeline (16 min)
- 4. AI-Assisted Development and Analysis (12 min)