Bruin Academy

End-to-End Pipeline: NYC Taxi

Build a complete data pipeline from scratch using real NYC taxi data - from ingestion to staging to reports, all orchestrated with Bruin and DuckDB.

What is this?

A hands-on tutorial where you build a real data pipeline end-to-end using NYC taxi trip data. You'll go from raw API data to clean, aggregated reports - learning ingestion, transformation, quality checks, and AI-assisted development along the way.

What you'll use: Bruin CLI for pipeline orchestration, DuckDB as a local data warehouse, Python for ingestion, SQL for transformations, and the Bruin MCP with an AI agent to accelerate development.
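To preview the raw-to-staging-to-report layering you'll build, here is a minimal sketch in plain Python. It uses the stdlib sqlite3 module as a stand-in for DuckDB (the SQL pattern is the same), and all table and column names are illustrative, not the tutorial's actual schema:

```python
import sqlite3

# In-memory database standing in for the DuckDB warehouse file.
con = sqlite3.connect(":memory:")

# "Ingestion": load a few raw trip records, as the Python asset would.
con.execute("CREATE TABLE raw_trips (pickup_zone TEXT, fare REAL, tip REAL)")
con.executemany(
    "INSERT INTO raw_trips VALUES (?, ?, ?)",
    [("Midtown", 12.5, 2.0), ("Midtown", 8.0, 1.5), ("JFK", 52.0, 10.0)],
)

# "Staging": filter out bad records and derive columns.
con.execute("""
    CREATE TABLE stg_trips AS
    SELECT pickup_zone, fare, tip, fare + tip AS total
    FROM raw_trips
    WHERE fare > 0
""")

# "Report": aggregate for downstream consumers.
rows = con.execute("""
    SELECT pickup_zone, COUNT(*) AS trips, ROUND(SUM(total), 2) AS revenue
    FROM stg_trips
    GROUP BY pickup_zone
    ORDER BY revenue DESC
""").fetchall()
print(rows)  # [('JFK', 1, 62.0), ('Midtown', 2, 24.0)]
```

In the tutorial, each of these layers becomes a separate Bruin asset, and Bruin runs them in dependency order instead of you sequencing the statements by hand.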

What you'll build

  1. End-to-end ELT pipeline - Python ingestion from the NYC TLC API, seed files for lookup tables, SQL staging and reporting layers with quality checks and materialization
  2. Orchestration and lineage - A fully orchestrated pipeline with dependencies between assets, automatic execution order, and visual lineage
  3. AI integration - Use bruin ai enhance to build a data context layer, set up the Bruin MCP, and let an AI agent assist with pipeline development and data analysis
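As a taste of item 1, a Bruin SQL asset carries its own configuration in an embedded YAML block. The sketch below is illustrative only: the asset name, dependency, and column are invented, and the exact fields available are best checked against the Bruin documentation for your version:

```sql
/* @bruin
name: reports.trips_by_zone
type: duckdb.sql
materialization:
  type: table
depends:
  - staging.trips
columns:
  - name: pickup_zone
    type: varchar
    checks:
      - name: not_null
@bruin */

SELECT pickup_zone, COUNT(*) AS trip_count
FROM staging.trips
GROUP BY pickup_zone
```

The depends list is what gives Bruin the dependency graph for execution order and lineage, and the checks run automatically after the asset materializes.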

Before you start