Community Project Showcase

Browse projects built with Bruin. Explore real-world pipelines, ingestion workflows, and analytics solutions from the community.

5 Projects

NZ Electricity Generation Pipeline

Xinglin Gao

This project tracks New Zealand's electricity generation mix across 8 years (2018–2026), pulling monthly CSVs from the Electricity Authority's public API through a three-layer transformation pipeline (staging, core, mart) that feeds a Looker Studio dashboard. The result reveals an ~85% renewable grid driven mostly by hydro. The pipeline is built entirely with Bruin, an open-source CLI tool that replaces the usual Airflow + dbt + Great Expectations stack with a single binary: SQL and Python assets coexist in the same pipeline with automatic dependency resolution, incremental materialisation, and quality checks embedded directly in asset definitions rather than maintained as a separate test suite. That "one tool, one config format" design meant I could focus on the data logic (unpivoting 50 trading-period columns, deduplicating records, and building partitioned and clustered fact tables) rather than writing glue code between tools.
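As a sketch of the unpivot step described above: the source CSVs put each half-hour trading period in its own column, while the fact tables need one row per period. The column names below are illustrative, not the actual Electricity Authority schema.

```python
import pandas as pd

# Wide input: one column per trading period (hypothetical names TP1, TP2, ...).
wide = pd.DataFrame({
    "site_code": ["HLY", "HLY"],
    "fuel_code": ["Gas", "Hydro"],
    "trading_date": ["2024-01-01", "2024-01-01"],
    "TP1": [10.5, 80.0],
    "TP2": [11.0, 79.5],
})

# Unpivot: every TP column becomes a (trading_period, generation) row.
long = wide.melt(
    id_vars=["site_code", "fuel_code", "trading_date"],
    var_name="trading_period",
    value_name="generation",
)
# Strip the "TP" prefix so the period is a sortable integer.
long["trading_period"] = long["trading_period"].str[2:].astype(int)

print(long.shape)  # (4, 5): 2 source rows x 2 period columns
```

With the real files this runs once per monthly CSV before deduplication, so the staging layer only ever deals with the long, one-row-per-period shape.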

python · sql · bruin

EDGAR Daily Hub

Kate Chen

EDGAR Daily Hub is a full-stack data platform that brings transparency to SEC filing activity. By automatically ingesting the EDGAR daily index every business day, it tracks filing volumes across all form types and flags unusual spikes, such as surges in insider ownership disclosures, through a clean, interactive dashboard. Users can build a personal watchlist of stock tickers to monitor filings for companies they care about, turning a tedious manual research process into a seamless daily workflow. The platform is built with React/TypeScript, Python FastAPI, and MotherDuck as the analytical data warehouse, with Supabase handling auth and user data. The pipeline runs on a daily automated schedule via GitHub Actions and is deployed on Fly.io with Docker.
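A daily GitHub Actions schedule like the one described can be sketched as a small workflow file. This is a generic illustration, not the project's actual workflow: the file name, cron expression, and entry-point script are all assumptions.

```yaml
# .github/workflows/daily-ingest.yml (illustrative sketch)
name: daily-edgar-ingest
on:
  schedule:
    - cron: "0 6 * * 1-5"   # weekday mornings UTC; EDGAR updates on business days
  workflow_dispatch: {}      # allow manual re-runs from the Actions tab
jobs:
  ingest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python ingest_daily_index.py   # hypothetical entry point
```

Scheduling in CI rather than in the app keeps the FastAPI service stateless; the warehouse (MotherDuck) is the only thing the ingest job and the dashboard share.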

python · typescript · react

GitHub Activity Analytics Dashboard

Rui Pinto

GitHub generates millions of public events every day (pushes, pull requests, issues, forks, stars) across thousands of repositories and contributors worldwide. This raw activity stream is publicly available via gharchive.org, but it is not pre-aggregated or directly queryable in a useful analytical form. This project builds an end-to-end batch data pipeline that answers:

- Which event types dominate GitHub activity on any given day or hour?
- Which repositories attract the most contributors and drive the most events?
- How does activity vary across the day (UTC), and what are the peak hours?
- What is the daily mix of event types: is it push-heavy, or driven by issues and PRs?
- Which programming language ecosystems (inferred from repo naming patterns) are most active?
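The core aggregations behind these questions can be sketched in a few lines, assuming the newline-delimited JSON shape gharchive.org serves (each event carries a `type`, a `created_at` timestamp, and a `repo` object). The sample events below are made up.

```python
import json
from collections import Counter

# Three fake events in gharchive's one-JSON-object-per-line format.
raw_lines = [
    '{"type": "PushEvent", "created_at": "2024-05-01T09:12:00Z", "repo": {"name": "octo/hello"}}',
    '{"type": "PushEvent", "created_at": "2024-05-01T09:45:00Z", "repo": {"name": "octo/hello"}}',
    '{"type": "IssuesEvent", "created_at": "2024-05-01T17:03:00Z", "repo": {"name": "octo/docs"}}',
]
events = [json.loads(line) for line in raw_lines]

# Which event types dominate, at which UTC hours, in which repos?
by_type = Counter(e["type"] for e in events)
by_hour = Counter(e["created_at"][11:13] for e in events)   # "09", "17", ...
by_repo = Counter(e["repo"]["name"] for e in events)

print(by_type.most_common(1))  # [('PushEvent', 2)]
```

The real pipeline does the same grouping at batch scale; the interesting engineering is in the download, staging, and scheduling layers around it, not in the counts themselves.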

python · data-pipeline · github-api

CryptoFlow Analytics

Ousmane CISSE

Crypto markets generate massive amounts of data across hundreds of exchanges, thousands of tokens, and multiple sentiment indicators. Individual investors and analysts face three core challenges:

- Data fragmentation: prices, volumes, sentiment, and trending data live in separate APIs with different formats
- Signal noise: raw price changes alone are misleading without context (volume confirmation, market breadth, sentiment)
- Regime blindness: most dashboards show what happened, but fail to classify where we are in the market cycle

CryptoFlow Analytics solves this by building a unified intelligence layer that ingests, cleans, enriches, and analyzes crypto data to produce actionable signals, not just charts.

Bruin Features Used:

- Python Assets: 5 ingestion scripts fetching from CoinGecko, Alternative.me APIs, and CSV seed
- SQL Assets: 9 BigQuery SQL transformations across staging (3) and analytics (6) layers
- Seed Assets: CSV-based reference data for coin categories
- Materialization: table strategy for all assets; merge for incremental ingestion
- Dependencies: explicit depends declarations creating a proper DAG
- Quality Checks: built-in (not_null, unique, positive, accepted_values) on every asset
- Custom Checks: business logic validations (e.g., "Bitcoin must exist in data", "dominances sum to ~100%")
- Glossary: structured business term definitions for crypto concepts
- Pipeline Schedule: daily schedule via pipeline.yml
- Bruin Cloud: deployment, monitoring, and AI analyst
- AI Data Analyst: conversational analysis on all analytics tables
- Lineage: full column-level lineage via bruin lineage
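Several of the Bruin features listed (depends, materialization, built-in checks, custom checks) live together in a single asset file. The sketch below shows that shape for a hypothetical BigQuery asset; the table and column names are invented, and the exact keys should be checked against Bruin's docs.

```sql
/* @bruin
name: analytics.market_overview
type: bq.sql
materialization:
  type: table
depends:
  - staging.prices
columns:
  - name: coin_id
    type: string
    checks:
      - name: not_null
      - name: unique
  - name: price_usd
    type: float64
    checks:
      - name: positive
custom_checks:
  - name: bitcoin_present
    query: select count(*) from analytics.market_overview where coin_id = 'bitcoin'
    value: 1
@bruin */

select coin_id, price_usd
from staging.prices
</* illustrative query body */>
```

Because the checks sit in the asset header rather than a separate test suite, every run of the asset also runs its validations, which is the "quality checks on every asset" guarantee the project relies on.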

python · sql · bruin

GitHub Repository Insights

Avanishchandra Yadav

A production-style data pipeline built with Bruin for the Bruin Zoomcamp challenge. This pipeline ingests GitHub repository metadata from the GitHub API, transforms the data through staging, and produces an analytics report — all orchestrated locally using DuckDB.
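Orchestrating locally with DuckDB needs very little configuration: a pipeline.yml at the project root names the pipeline and its schedule, and the bruin CLI runs the DAG from there. This is a minimal sketch with an invented pipeline name, not the project's actual file.

```yaml
# pipeline.yml (illustrative sketch)
name: github-repo-insights
schedule: daily
```

From the pipeline root, `bruin validate` checks the asset definitions and `bruin run` executes the whole DAG; with DuckDB as the backend, everything runs against a local database file, so no warehouse credentials are needed for development.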

python · bruin · duckdb