NYC Citi Bike Data Pipeline

Haowei Ting

This project is an end-to-end data engineering pipeline built around NYC Citi Bike trip data. It ingests monthly trip records from the public source, stores them in Google Cloud Storage, transforms them in BigQuery, and publishes the results through an interactive Plotly Dash dashboard. The goal is to turn raw trip-level data into clean, analysis-ready tables and visual insights about ridership patterns, bike type usage, trip duration, distance, and seasonality.

The project uses several Bruin features to manage the workflow:

- Asset definitions and dependencies to organize the raw, staging, and report layers.
- Connection configuration for BigQuery through a named Bruin connection.
- Materialization strategies including incremental `delete+insert` and full rebuild `create+replace`.
- Built-in data quality checks such as `not_null` and `accepted_values` to improve reliability and consistency.
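
To make the `delete+insert` strategy and the `not_null` / `accepted_values` checks more concrete, here is a minimal Python sketch of the general idea. It is not the project's code and does not use Bruin or BigQuery: it runs against an in-memory SQLite database, and the table name `trips`, the columns `ride_id`, `trip_month`, and `rideable_type`, and all helper functions are hypothetical names chosen for illustration.

```python
import sqlite3


def delete_insert(conn, table, rows, key):
    """Replace every row whose `key` value appears in the incoming batch.

    Illustrates the general delete+insert idea: reloading the same month
    removes the old copy first, so a rerun does not duplicate rows.
    """
    keys = sorted({row[key] for row in rows})
    if not keys:
        return
    columns = list(rows[0].keys())
    key_marks = ",".join("?" for _ in keys)
    col_marks = ",".join("?" for _ in columns)
    with conn:  # run the delete and the insert in one transaction
        conn.execute(f"DELETE FROM {table} WHERE {key} IN ({key_marks})", keys)
        conn.executemany(
            f"INSERT INTO {table} ({','.join(columns)}) VALUES ({col_marks})",
            [tuple(row[c] for c in columns) for row in rows],
        )


def not_null_failures(conn, table, column):
    """Count rows where `column` is NULL."""
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(query).fetchone()[0]


def accepted_values_failures(conn, table, column, accepted):
    """Count non-NULL rows whose value is outside `accepted`.

    Assumption for this sketch: NULLs are left to the not_null check.
    """
    marks = ",".join("?" for _ in accepted)
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} NOT IN ({marks})"
    return conn.execute(query, list(accepted)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (ride_id TEXT, trip_month TEXT, rideable_type TEXT)"
)

# Hypothetical sample batch for one month.
january = [
    {"ride_id": "a1", "trip_month": "2024-01", "rideable_type": "classic_bike"},
    {"ride_id": "a2", "trip_month": "2024-01", "rideable_type": "electric_bike"},
]

delete_insert(conn, "trips", january, key="trip_month")
delete_insert(conn, "trips", january, key="trip_month")  # rerun the same month

row_count = conn.execute("SELECT COUNT(*) FROM trips").fetchone()[0]
null_ids = not_null_failures(conn, "trips", "ride_id")
bad_types = accepted_values_failures(
    conn, "trips", "rideable_type", ["classic_bike", "electric_bike"]
)
```

Because the batch is keyed on the month, loading January twice leaves the table with the same two rows, which is what makes an incremental run safe to repeat. A full rebuild in the style of `create+replace` would instead drop and recreate the whole table on every run.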

