GitHub Tech Trends Pipeline

Amar Agrawal
python, sql, bruin, bigquery, looker-studio, github-actions

An end-to-end batch pipeline that tracks GitHub developer activity trends across 16 months of data (~350M events), from ingestion through to a live Looker Studio dashboard. It features Python assets that download hourly GitHub Archive data to GCS, an idempotent delete-insert loader into BigQuery, SQL transformation assets that use a merge strategy with partitioning and clustering, built-in quality checks on the staging layer, full asset lineage out of the box, and a daily schedule via the Bruin pipeline. The project also includes a parallel implementation with Kestra + dbt for comparison, demonstrating how Bruin consolidates multiple tools into one CLI.
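The idempotent delete-insert idea can be illustrated with a minimal sketch. This is not the project's code: it uses an in-memory SQLite table as a stand-in for BigQuery, and the `events` table, its columns, and the helper names `hourly_archive_urls` and `delete_insert` are hypothetical. The URL pattern assumes GH Archive's published hourly file naming, in which the hour is not zero-padded.

```python
import sqlite3
from datetime import date


def hourly_archive_urls(day: date) -> list[str]:
    """Return the 24 hourly GH Archive URLs for one day.

    Assumes the public naming scheme YYYY-MM-DD-H.json.gz (hour not zero-padded).
    """
    return [
        f"https://data.gharchive.org/{day:%Y-%m-%d}-{hour}.json.gz"
        for hour in range(24)
    ]


def delete_insert(conn: sqlite3.Connection, day: date, rows: list[tuple[str, str]]) -> None:
    """Replace every row for `day` in one transaction, so a rerun adds no duplicates."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("DELETE FROM events WHERE event_date = ?", (day.isoformat(),))
        conn.executemany(
            "INSERT INTO events (event_date, event_type, repo) VALUES (?, ?, ?)",
            [(day.isoformat(), event_type, repo) for event_type, repo in rows],
        )


def demo() -> int:
    """Load the same day twice and return the resulting row count."""
    conn = sqlite3.connect(":memory:")  # stand-in for the warehouse (hypothetical schema)
    conn.execute("CREATE TABLE events (event_date TEXT, event_type TEXT, repo TEXT)")
    day = date(2024, 1, 1)
    rows = [("PushEvent", "octo/demo"), ("WatchEvent", "octo/demo")]
    delete_insert(conn, day, rows)
    delete_insert(conn, day, rows)  # rerun of the same partition
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Because each run first deletes the rows for the day it is about to load, a retried or backfilled run leaves the table in the same state as a single successful run. A plain append would instead double the rows on every rerun.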