Community Project Showcase

Browse projects built with Bruin. Explore real-world pipelines, ingestion workflows, and analytics solutions from the community.

New: live project reviews on May 20 & June 3

Submit Your Project View Leaderboard

59 Projects

CryptoFlow Analytics

Ousmane CISSE

Crypto markets generate massive amounts of data across hundreds of exchanges, thousands of tokens, and multiple sentiment indicators. Individual investors and analysts face three core challenges: - Data fragmentation — Prices, volumes, sentiment, and trending data live in separate APIs with different formats - Signal noise — Raw price changes alone are misleading without context (volume confirmation, market breadth, sentiment) - Regime blindness — Most dashboards show what happened, but fail to classify where we are in the market cycle CryptoFlow Analytics solves this by building a unified intelligence layer that ingests, cleans, enriches, and analyzes crypto data to produce actionable signals - not just charts. :hammer_and_wrench: Bruin Features Used - Python Assets: 5 ingestion scripts fetching from CoinGecko, <http://Alternative.me|Alternative.me> APIs, and CSV seed - SQL Assets: 9 BigQuery SQL transformations across staging (3) and analytics (6) layers - Seed Assets: CSV-based reference data for coin categories - Materialization: table strategy for all assets; merge for incremental ingestion - Dependencies: Explicit depends declarations creating a proper DAG - Quality Checks: Built-in (not_null, unique, positive, accepted_values) on every asset - Custom Checks: Business logic validations (e.g., "Bitcoin must exist in data", "dominances sum to ~100%") - Glossary: Structured business term definitions for crypto concepts - Pipeline Schedule: Daily schedule via pipeline.yml - Bruin Cloud: Deployment, monitoring, and AI analyst - AI Data Analyst: Conversational analysis on all analytics tables - Lineage: Full column-level lineage via bruin lineage

GitHub Repository Insights

Avanishchandra Yadav

A production-style data pipeline built with Bruin for the Bruin Zoomcamp challenge. This pipeline ingests GitHub repository metadata from the GitHub API, transforms the data through staging, and produces an analytics report — all orchestrated locally using DuckDB.

GitHub Activity Analytics Dashboard

Rui Pinto

GitHub generates millions of public events every day — pushes, pull requests, issues, forks, stars — across thousands of repositories and contributors worldwide. This raw activity stream is publicly available via <http://gharchive.org|gharchive.org>, but it is not pre-aggregated or directly queryable in a useful analytical form. This project builds an end-to-end batch data pipeline that answers: - Which event types dominate GitHub activity on any given day or hour? - Which repositories attract the most contributors and drive the most events? - How does activity vary across the day (UTC), and what are the peak hours? - What is the daily mix of event types — is it push-heavy, or driven by issues and PRs? - Which programming language ecosystems (inferred from repo naming patterns) are most active?

edgar daily hub

Kate Chen

EDGAR Daily Hub is a full-stack data platform that brings transparency to SEC filing activity. By automatically ingesting the EDGAR daily index every business day, it tracks filing volumes across all form types and flags unusual spikes — like surges in insider ownership disclosures — through a clean, interactive dashboard. Users can build a personal watchlist of stock tickers to monitor filings for companies they care about, turning a tedious manual research process into a seamless daily workflow. Built with React/TypeScript, Python FastAPI, and MotherDuck as the analytical data warehouse, with Supabase handling auth and user data. The pipeline runs on a daily automated schedule via GitHub Actions and is deployed on <http://Fly.io|Fly.io> with Docker.

NZ Electricity Generation Pipeline

Xinglin Gao

This project tracks New Zealand's electricity generation mix across 8 years (2018–2026), pulling monthly CSVs from the Electricity Authority's public API through a three-layer transformation pipeline (staging, core, mart) that feeds a Looker Studio dashboard, revealing an ~85% renewable grid driven mostly by hydro. The pipeline is built entirely with Bruin, an open-source CLI tool that replaces the usual Airflow + dbt + Great Expectations stack with a single binary: SQL and Python assets coexist in the same pipeline with automatic dependency resolution, incremental materialisation, and quality checks embedded directly in asset definitions rather than maintained as a separate test suite. That "one tool, one config format" design meant I could focus on the data logic, unpivoting 50 trading-period columns, deduplicating records, and building partitioned and clustered fact tables, rather than writing glue code between tools.

Kickstarter Campaign Analytics Pipeline

Cedan Murat Zeynepli

Kickstarter campaign analytics pipeline that answers which factors (category, goal, country, staff pick, duration) best predict Kickstarter success. It ingests ~203k campaigns from Hugging Face (Parquet), lands data in Google Cloud Storage, loads and models it in BigQuery (raw → staging → analytics marts), and feeds a Looker Studio dashboard. Infrastructure (GCS, BigQuery) is provisioned with Terraform. Bruin orchestrates the workflow: GCP connection and environments in .bruin.yml, pipeline definition in pipeline.yml (variables), Python assets for download/load to GCS and BigQuery, BigQuery SQL assets with dependencies (depends), materialization (views/tables), column metadata and data quality checks in asset headers, and Jinja variables (var.*) in SQL.

Kenya Renewable Energy Data Pipeline

GEORGE OMONDI ODERO

This project builds a fully automated data pipeline that ingests Kenya's renewable energy data from four open sources — Ember Climate, IRENA, Our World in Data, and <http://EnergyData.info|EnergyData.info> — transforms it through a Bronze → Silver → Gold medallion architecture stored in a single DuckDB file, and serves it as a live Evidence.dev dashboard deployed via GitHub Actions to GitHub Pages. The pipeline tracks Kenya's progress toward its 2030 target of 100% renewable electricity across five analytical dimensions: generation mix, installed capacity, carbon intensity, electricity access, and geospatial grid infrastructure. It rebuilds automatically every day, requiring zero manual intervention to keep the data current. Bruin features used include; python assets, duckdb SQL assets, asset dependency graph, materialization strategies, built-in data quality checks, multi-language pipeline, CLI validation, single connection config, downstream execution, and checks only mode.

Project_Customer_Churn_Bank

Anastasija

Overview: An end-to-end ELT pipeline built to analyze ABC Bank's customer retention by integrating internal demographics with external 2022 Eurostat market benchmarks. The project focuses on identifying "Premium Segment" churn, discovering that 80% of churned customers are high-earners. How I used Bruin: I leveraged Bruin to move away from traditional script-based workflows to a modern, declarative asset-based architecture: Infrastructure as Code (IaC): Every BigQuery table and view was defined as a Bruin asset, including physical layer optimizations like Clustering on country and gender to boost query performance. Data Lineage & Dependencies: Using Bruin's DAG capabilities, I ensured a clean Medallion-like flow: Staging (cleaning) :arrow_right: Reference (Eurostat data) :arrow_right: Fact (Salary benchmarking logic). Automated Data Quality: I integrated built-in quality checks (not_null, unique) directly into the asset definitions, ensuring that only validated data reached my Looker Studio dashboard. Seamless Deployment: Bruin managed the entire lifecycle from Kaggle ingestion through GCS to BigQuery materialization with a single --force execution command. Key Findings: The pipeline revealed that churned customers earn an average of 5,048 EUR MORE than the national benchmark, and identified a 70.45% churn rate among the 46-60 age group in Germany, providing the bank with clear targets for retention programs.

Github Trends

Amar Agrawal

End-to-end data pipeline tracking GitHub developer activity trends using GCP, Bruin, dbt and Looker Studio

Jobs Analytics Project

Alexander Panas

Belarus IT Job Market Analytics — a batch pipeline that tracks Data Engineer, Data Analyst and Data Scientist vacancies on <http://rabota.by|rabota.by> (powered by <http://HH.ru|HH.ru> API). Monitors vacancy dynamics over time, required experience levels, and top-5 in-demand skills per role.

Closer-Every-Year

Luca Barsottini

A batch pipeline tracking **gender gap indicators and relationship trends** (marriage, divorce, age at first marriage, pay gap) across European countries from 2005 to 2024, all sourced from the Eurostat API. **Bruin features I used:** • `type: python` assets for ingestion (Eurostat API → Parquet on GCS) • `type: bq.sql` and `type: duckdb.sql` assets for staging + analytics SQL transformations • `strategy: merge` on `(country, year)` — fully idempotent, no duplicate runs • Dependency resolution via asset references — no manual DAG wiring • Dual-environment setup: `local-pipeline` runs on DuckDB, `gcp-pipeline` runs on BigQuery — **same asset code, different connections** • `bruin run --environment cloud` for the GCP pipeline, `--workers 1` for local (DuckDB doesn't support concurrent writes) • Docker-based setup with the Bruin container + Terraform container side by side

Thalassa-Analytics

Dimitris Zacharenakis

Thalassa is a production-style batch data engineering project for Greek maritime traffic analytics. It uses Bruin to orchestrate a fully scheduled pipeline that ingests public sailing traffic data from the <http://data.gov.gr|data.gov.gr> sailing_traffic API, lands raw records in BigQuery, transforms them into curated analytics tables, and serves the results through a Streamlit dashboard covering operational KPIs, route patterns, and port analysis.

GitHub Activity Dashboard

Alik Grigoryan

GitHub Activity Dashboard transforms raw GitHub event data into structured tables and generates interactive dashboards showing repository engagement and hourly activity. Using Bruin, the pipeline demonstrates end-to-end orchestration, incremental staging, and daily scheduling.

Automated Ads Reporting Suite

Fajar Aji Pamungkas

A fully automated data pipeline that consolidates, transforms, and visualizes digital advertising performance data across multiple platforms (Google Ads, Meta Ads, TikTok Ads, LinkedIn Ads).

Trading Helper Pipeline

Łukasz Kwaśniewski

A daily ETL pipeline that fetches end-of-day OHLCV data for QQQ, NQ futures, VIX, and VVIX from yfinance, stores it in a local DuckDB database, and serves an interactive Streamlit dashboard for pre-market analysis.

Olist Ecommerce Analytics Pipeline

Daniswara Aditya Putra

This cloud-native ELT pipeline transforms 100k+ Brazilian e-commerce orders into actionable logistics and customer satisfaction insights by migrating raw data from Kaggle into Google BigQuery using a multi-layered Medallion architecture (Raw → Staging → Mart). Powered by Bruin, the project utilizes Python-to-Cloud Ingestion for automated downloads, Native SQL Modeling for scalable transformations, and Automated Data Quality Checks (e.g., unique, not_null, positive_value) to ensure end-to-end data integrity for its interactive visualization dashboard.

Olist E-Commerce Analytics Pipeline

Daniswara Aditya Putra

min-fin

Alex Benasutti

A minimalist, metadata-based on-budget finance tracker using Databricks and bruin. Bruin handles the ETL pipelines while Databricks hosts the data/schema/dashboard.

US Flights Data Engineering Project (2015)

Adilbek Bulatov

:memo: Problem Description The aviation industry generates massive amounts of data daily. This project analyzes a dataset of 5.8 million flights in the US from 2015 to identify patterns in delays and cancellations. The goal is to provide actionable insights for operational management through a robust data pipeline and interactive dashboards. Source: <https://www.kaggle.com/datasets/usdot/flight-delays> Key Questions Addressed: Punctuality: Which airlines and airports are the most/least punctual (OTP)? Correlation: How do flight distance and time of day affect the probability of delay? Seasonality: What are the seasonal trends in flight reliability? :building_construction: Project Architecture Since the official DOT Bureau of Transportation Statistics does not provide a public API, the data is sourced from Kaggle. The project follows a modern ELT (Extract, Load, Transform) approach using the Medallion Architecture (Bronze, Silver, Gold layers), moving data from raw CSVs to structured analytical reports. :hammer_and_wrench: Technologies & Infrastructure Cloud: Storage (Google Cloud Storage, GCS), Data Warehouse (Google BigQuery) Infrastructure: Batch Processing Architecture Workflow Orchestration & Transformation: Bruin Language: Python (Ingestion), SQL (Transformations) Data Visualization: Power BI (Desktop & Service)

Chicago Crash data analysis

Vijay Agnihotri

This project is an end-to-end data pipeline that ingests, transforms, and stores Chicago traffic crash data for analysis. It pulls from three public datasets on the City of Chicago Data Portal — Crashes, People, and Vehicles — via the SODA2 REST API, loads the raw records into a local DuckDB database, and then joins them into a single analytics-ready table. The pipeline runs on a daily schedule, fetching only new records each run by filtering the API on the Bruin-provided start and end dates, so it avoids re-downloading the entire dataset (~1M+ crash records) every time. The pipeline is built with Bruin, an open-source data pipeline framework. It uses Python assets with automatic dependency management via uv and pyproject.toml to handle API ingestion, materialization with the merge strategy on the People and Vehicles assets to upsert records by primary key and append on Crashes, and DuckDB as a first-class connection for serverless local analytics. Asset dependencies (depends) enforce sequential execution to avoid DuckDB write-lock conflicts, while the SQL asset leverages Bruin's create+replace materialization to rebuild the merged table from the three raw sources on each run. The daily schedule and built-in environment variables (BRUIN_START_DATE, BRUIN_END_DATE) enable incremental loading out of the box.

Skypulse Streaming Pipeline

Alexander Daniel Rios

SkyPulse is an end-to-end real-time data pipeline that integrates heterogeneous global streams—flight positions, weather conditions, and seismic events—into a unified operational view of airspace. It continuously ingests data from OpenSky, Open-Meteo, and USGS, processes it through Redpanda and Apache Flink using 5-minute tumbling windows, and stores it in a Supabase landing zone. The core strength of the system lies in its use of Bruin, which structures and transforms raw data into a layered analytical model (staging → intermediate → marts), enabling consistent, cross-stream enrichment. This allows the generation of composite geospatial risk scores per grid cell, combining aircraft density, seismic activity, and weather conditions, all visualized in near real-time through an interactive Streamlit dashboard

Swiss Traffic Pipeline

Daniel Alugwe

Swiss Traffic Pipeline is a Bruin-orchestrated data pipeline that ingests, stages, and transforms the Swiss Federal Roads Office (FEDRO) 2025 annual traffic bulletin into a Looker Studio dashboard

Chat Analysis

Zoltan Nyiri

The goal of this project was to understand how users interact with a chatbot—analyzing patterns, behaviors, and engagement to uncover meaningful insights.

Global Urbanization & Mobility Intelligence Pipeline

HariKesava Perumal Bramadevan

Production-grade Bruin pipeline that models 126 years of urbanization, density, land expansion, and mobility pressure across 217 countries and areas. The project uses Bruin CLI for orchestration, PostgreSQL for storage, Grafana for visualization, and GitHub Actions for CI/CD.

Airflow-Studio

Pratik Makarand Kanade

A visual DAG builder for Apache Airflow. Drag, drop, and connect operators on a canvas generate valid, idiomatic Python DAG files in both Traditional and Taskflow API syntax.

Global Energy Transition Pipeline

Joseph Obi

This pipeline ingests, transforms, and analyses 125 years of global energy data (1900–2026) across 200+ countries to track the worldwide transition from fossil fuels to renewables. The project uncovered surprising stories like Denmark jumping from 15% to 91% renewable electricity since 2000, DR Congo and Ethiopia already running on 100% renewable power. The pipeline is built entirely with Bruin on DuckDB, with an interactive Evidence.dev dashboard for visualization and GitHub Actions for CI/CD. Bruin Features Used: - Seed assets: ingested the OWID(Our World in Data) CSV dataset directly into DuckDB - SQL assets: built 4 mart tables (global trends, country trends, renewables leaderboard, Africa energy spotlight) - Data quality checks: 54 automated checks across all assets — column-level (not_null, non_negative) and custom SQL checks - bruin ai enhance: AI-powered documentation, column descriptions, domain tags, and data-driven quality checks on all mart assets - pipeline.yml orchestration: daily schedule with dependency management - bruin lineage: full DAG visualisation of asset dependencies - bruin validate: integrated into GitHub Actions CI/CD workflow

Return Analysis

Yadhukrishna V S

Returns are expensive for ecommerce businesses because they reduce revenue, increase operational costs, and can reveal product or customer behavior issues. The goal of this project is to build an end-to-end data pipeline that answers questions such as: How do return rates change over time? Which product categories have the highest return rates? Which SKUs drive the most revenue loss from returns? Which customers return the most items? The output is a set of analytics tables in BigQuery and a dashboard in Looker Studio for reporting.

audio-trend-data-project

Osi Osman

This project aimed to explore content consumption patterns across music and podcasts on streaming platforms, and see if there were any trends amongst them.

research-intel-pipeline

Xi Feng

Real-time scientific research analytics pipeline for AI, ML, and Computational Biology. Ingests papers from arXiv and OpenAlex, transforms them with Bruin into a layered analytical model, and serves insights through an interactive Streamlit dashboard.

EV Market Intelligence Pipeline

Bigboy Mutichakwa

Built an end-to-end serverless data lakehouse on AWS to analyze global electric vehicle (EV) sales trends—all at near $0 cost. I used the Bruin CLI for orchestration to tie the stack together. Bruin allowed me to: 1. Unify Ingestion & Transformation: Sequence the Kaggle-to-S3 Python asset and the dbt-run asset within a single pipeline. 2. Environment Management: Seamlessly handle local DuckDB processing while targeting AWS S3 for the final "Gold" mart tables. 3. Speed: Achieve rapid development within a GitHub Codespaces environment using a lightweight CLI.

Afrofinance Pulse

Ayoade Abel Adegbite

A Bruin + DuckDB pipeline tracking Africa's fintech ecosystem, currency volatility, startup funding flows, and fintech readiness across 12 African countries.

UK Retail Analytics Pipeline

Alper Kocer

An end-to-end data engineering pipeline built with Bruin and DuckDB for the UK Online Retail II dataset (1,067,371 real transactions). The pipeline ingests raw Excel data through a Python asset, cleans it in a staging layer, and produces 5 analytical mart tables covering monthly revenue trends, product performance with return rate analysis, RFM customer segmentation, country analysis, and cancellation tracking. Features 66 automated quality checks across all 7 assets, GitHub Actions CI/CD, and AI analysis via the Bruin AI Data Analyst. Key finding: one product ranks 4th by revenue at £168K but has a 100% return rate.

Air Quality Around Italy SCAN

federico spatola

End-to-end batch pipeline to monitor air quality across Italy. Pulls real-time data from [AQICN](<https://aqicn.org>), stores it on **GCS**, transforms it with **dbt + DuckDB**, and visualizes it in an interactive **Streamlit** dashboard.

Hospital Facility Management BI Learning Project

Melek Ergin

Built a scenario-based BI app using NHS ERIC data + synthetic data to model hospital facility operations. Stack: DuckDB, Bruin, Streamlit Focus: • Maintenance backlog • Compliance • Sterile services • Patient flow Goal: bridge real-world operations with BI modeling.

Mexico Biomass Analytics: End-to-End Data Pipeline

Moisés Ariel Gutiérrez Espriella

This project solves a critical data logistics problem in Mexico's renewable energy sector. It combines fragmented agricultural waste data (SIAP) and existing infrastructure records (SEMARNAT) to identify the national biomass "Opportunity Gap." Designed with an EtLT architecture, the pipeline uses Terraform for IaC deployment on Google Cloud Platform. Bruin serves as the unified orchestrator handling data ingestion via Python, and heavy business transformations using SQL natively in BigQuery. The optimized data warehouse powers an interactive Looker Studio dashboard and integrates with Bruin Cloud's AI Data Analyst to provide actionable, natural-language insights for infrastructure investment. Tech Stack: Bruin, GCP (Cloud Storage, BigQuery), Terraform, Python, SQL, Looker Studio.

Global "Net Zero" Energy Transition Tracker

Wisnu anugrah pratama

An end-to-end, production-grade data platform designed to monitor and analyze the structural decoupling of economic growth from carbon intensity. The system processes 25 years of historical data (2000–2024) across 231 countries to track progress toward global Net Zero targets, with a specialized strategic deep dive into the ASEAN region and Indonesia.

:mag: OtakuLens

Varun Nayyar

A fully cloud-native, end-to-end data engineering pipeline that ingests metadata for 500+ anime titles from MyAnimeList, transforms it through a production-grade ELT process, and serves interactive analytics through a live Streamlit dashboard, including semantic anime recommendations powered by sentence embeddings.

U.S. Crude Oil Production Analytics Pipeline

Arsi Amallah Binhaq

The primary goal of this project is to build a robust and scalable data pipeline that tracks crude oil production across different U.S. states using publicly available data from the Energy Information Administration (EIA). The pipeline handles end-to-end data processing—from ingestion of raw datasets to transformation into analytics-ready tables—enabling analysis of production trends across regions and over time to support data-driven insights in the energy sector. Bruin Features Used This project leverages Bruin to simplify and manage the data pipeline using: Data Ingestion: Integrating raw EIA datasets directly into the data warehouse as part of the pipeline Data Transformation: SQL-based processing to clean, structure, and aggregate crude oil production data Pipeline Execution (bruin run): Running the full pipeline with automatic dependency handling Bruin AI: Assisting in SQL development and accelerating pipeline implementation

NYC Citi Bike Data Pipeline

Haowei Ting

This project is an end-to-end data engineering pipeline built around NYC Citi Bike trip data. It ingests monthly trip records from the public source, stores them in Google Cloud Storage, transforms them in BigQuery, and publishes the results through an interactive Plotly Dash dashboard. The goal is to turn raw trip-level data into clean, analysis-ready tables and visual insights about ridership patterns, bike type usage, trip duration, distance, and seasonality. The project uses several Bruin features to manage the workflow: - Asset definitions and dependencies to organize the raw, staging, and report layers. - Connection configuration for BigQuery through a named Bruin connection. - Materialization strategies including incremental `delete+insert` and full rebuild `create+replace`. - Built-in data quality checks such as `not_null` and `accepted_values` to improve reliability and consistency.

Hong Kong Transit Pulse — 香港交通脈搏

Muhammad Farizal Afkar

Hong Kong Transit Pulse is an end-to-end batch data engineering pipeline that ingests raw GTFS feeds and MTR open data, transforms them into analytics-ready models, and surfaces insights via an interactive Streamlit dashboard. The pipeline runs daily, pulling from two open data sources — HK Transport (GTFS) and MTR Corporation — loading them into Google Cloud Storage, transforming through BigQuery layers (raw → staging → marts), and visualising in a 4-tab dashboard with a real-time streaming layer on top.

Innovation Networks Pipelines

João Melga

A data engineering project that builds reproducible pipelines for analyzing venture capital syndication networks, applying the methodology from my research on nested investor syndication structures (Melga, 2025).

European Weather Analytics Pipeline

Alexey Kononets

European Weather Analytics Pipeline - an end-to-end batch pipeline that loads hourly weather for 95 European cities from 2010 onwards from the Open-Meteo Historical API into PostgreSQL 16, transforms it through a Kimball star schema (stage → core → mart), drives a live Power BI dashboard, and produces a 7-day Facebook Prophet forecast every night. Bruin features used: • Python assets - API ingestion with quota-aware resumable backfill, SCD1 dimension loaders, a gap-filling asset between stage/core/mart layers, and a Prophet 7-day forecast asset. • SQL assets - fact tables (fact_weather_hourly partitioned by year, fact_weather_daily) and the denormalised mart.daily_weather table for Power BI. • Mixed Python + SQL in a single DAG - dependencies declared via @bruin depends headers, so the whole chain (ingest → dimensions → facts → mart → backfill → forecast) runs in the correct order with one `bruin run /app`. • Pipeline scheduling - `schedule: "0 8 * * *"` in pipeline.yml, triggered daily by system cron via a small wrapper script. • Bruin connections - PostgreSQL connection defined in .bruin.yml and reused by every asset, keeping credentials out of the code. • `bruin validate` - used before every deployment to catch broken dependencies or misconfigured assets early.

TidingsIQ: Positive News Intelligence Pipeline

Shashank Srivastava

Data engineering project that builds a positive-news intelligence pipeline with GDELT, BigQuery, Bruin, Terraform, and Streamlit.

global-trade-intelligence-pipeline

Tumi Modiba

Built an end-to-end global trade intelligence pipeline focused on geopolitical risk, maritime chokepoints, and structural country vulnerability. The goal was simple: take messy, high-volume global trade and shipping data and turn it into something decision-makers can actually use. The project tracks how disruptions like Suez congestion, Black Sea instability, Panama drought pressure, and Brent oil volatility affect trade exposure, energy dependency, and national vulnerability across multiple countries.

Polymarket Pipeline

Jonathan Ramírez Quijada

Daily batch data pipeline that ingests ~15 GB of hourly Parquet files (~750 million rows) from a public archive, lands them in Google Cloud Storage, loads them into BigQuery and enriches each market with category metadata fetched from the Polymarket Gamma API. Bruin was used for the entire data lifecycle: ingestion, transformation, orchestration, data quality and analysis via the AI Data Analyst. The result powers a Looker Studio dashboard tracking platform liquidity, spread quality, and category activity over time.

MMA stats

Arnaud FRANCOIS

This project automates the ingestion, transformation, and analysis of UFC statistics to provide performance indicators and fight predictions based on historical data.

Buenos Aires Half Marathon (21K) Analytics

Sofia Florencia Rodriguez

This project analyzes how performance varies by age group and gender in the Buenos Aires Half Marathon, and showcases a full end-to-end data workflow — from ingestion and transformation to warehousing and interactive analysis. I also used Bruin for pipeline orchestration, scheduling, and data quality checks.

The Bus Factor

Joseph Wibowo

A Bruin-powered weekly data product that scores the 100 most-depended-on npm packages (ranked by distinct direct dependents) and the 100 most-downloaded PyPI packages (ranked by 90-day downloads) on importance × continuity fragility, publishes a static leaderboard, and lets you interrogate the dataset with AI.

Lassa-Watch

Joseph Akintola

LassaWatch is an end-to-end Lassa fever surveillance pipeline tracking outbreak data across 8 high-burden Nigerian states (Ondo, Edo, Bauchi, Taraba, Ebonyi, Plateau, Benue, Kogi) from 2021–2025. It ingests NCDC epidemiological data, Open-Meteo climate data, and healthcare infrastructure metrics, then transforms and joins them into analytical mart tables that reveal CFR trends, attack rates, and seasonal outbreak patterns. Bruin features used: Python assets for multi-source ingestion, DuckDB SQL assets for staging and mart layer transformations, asset dependencies (full DAG), data quality checks (not_null, positive, accepted_values), custom checks, table materialization, and Bruin MCP connected to Claude Code for AI-powered pipeline analysis.

B3 Options

Plínio Rodrigues de Oliveira Zanini

B3 Options Intelligence is an end-to-end data pipeline for Brazilian listed options monitoring, built with Bruin, Python, and DuckDB. B3 distributes daily and historical market data as compressed fixed-width files — one per session, with exchange-specific record layouts for equities, options, futures, and forwards mixed in the same file. There is no API. The starting point is a ZIP, the format is a 245-character-wide flat file, and the option classification requires cross-referencing market type codes with ticker suffix patterns to isolate standard equity options. This pipeline handles all of that and builds a monitoring layer on top. Pipeline: Python assets download, extract, and parse COTAHIST files using pandas.read_fwf with exchange-defined column specs Idempotent ingestion with deterministic row hashing across both daily and annual historical files Staging layer merges and deduplicates yearly and daily sources, classifies option type and underlying ticker, and serves as the single clean foundation for all marts 8 mart tables power the full monitoring workflow: unusual activity (volume and trade count Z-scores), liquidity surface by expiry and moneyness bucket, put/call flow regime, expiry concentration, dominant call/put walls per maturity, strike ladder, contract-level daily, and underlying-level daily with composite volume spike Streamlit dashboard for visual exploration with OHLC overlays from the underlying equity data GitHub Actions CI running bruin validate . on every push Numbers: backfilled from 2015, ~15K rows per daily session, 8 mart assets + 2 staging views, all checks passing. Why Bruin: ingestion is procedural Python, modeling is declarative SQL. Forcing either into the wrong layer would make the pipeline worse. Bruin handles both in a single DAG with dependency resolution, data quality checks, and column-level metadata — without a separate orchestrator.

Olist E-commerse

seyedehsara hashemi

This project is an end-to-end data engineering pipeline built to analyze Brazilian e-commerce trends using the Olist public dataset, covering approximately 100,000 orders placed between 2016 and 2018. The pipeline follows a medallion architecture ingesting raw CSV files into AWS S3, creating external Athena tables, and using Bruin to orchestrate staging transformations (type casting, null handling, data quality checks) and mart aggregations, all materialized as Iceberg tables. The final mart layer feeds a Looker Studio dashboard answering three core business questions: how daily revenue evolved over time, which Brazilian states drive the most e-commerce value, and which product categories are the top revenue generators. The goal of the project is to demonstrate how Bruin can replace a traditional multi-tool stack consolidating orchestration, transformation, and data quality into a single framework while delivering production-grade pipeline outputs backed by 24 automated quality checks across 15 assets.

climate decoupling pipeline

Muhammad Qasim

Analytics for a Greener Future — Built with Bruin

Lithium Lake

Barış Aslan

Lithium Lake is an end-to-end data pipeline that filters open-source material databases stage-by-stage using a Medallion architecture to discover the most promising solid-state EV battery candidate materials. This bridges the gap between software and physical science, helping researchers save time and energy by instantly filtering out non-viable options and surfacing rare candidates they might not have considered before. Bruin is used in orchestration of Bronze, Silver, and Gold layers of ingestion & transfer and data quality checks.

Data-Driven-Healthcare-Analytics

SOVANPANHA CHHEANG

This project builds an end-to-end data engineering pipeline for healthcare operations analytics. The goal is to unify batch and streaming data into a clean, queryable warehouse and deliver actionable insights through a dashboard.

EU Economic Monitor

Sharad Kumar Gupta

A production-grade data engineering pipeline that ingests European economic statistics from Eurostat, processes them using Bruin CLI, stores data in DuckDB, and visualizes everything in a Streamlit dashboard.

Suara-ID

Ana Nurkaromah

I built Suara-ID, a cloud-native pipeline that batch-ingests 10GB+ of Indonesian regional audio data, extracts metadata into BigQuery, and runs a Hugging Face faster-whisper AI model to automatically transcribe the speech to text. How I used Bruin: Instead of juggling Airflow and dbt, I used Bruin to orchestrate the entire thing! Python Assets: Handled the Kaggle API extraction, GCS Data Lake uploads, and the heavy AI/ML inference. SQL Assets: Handled the BigQuery staging, partitioning (by date), and clustering (by filename). Data Quality: Leveraged Bruin's native checks (Unique/Not Null) right on top of my SQL models. Having Python and SQL side-by-side in a single DAG made building this AI pipeline incredibly smooth.

Reddit Pulse — What Is the Internet Talking About?

Anisha Poojary

A fully automated data pipeline using Bruin that collects Reddit posts daily, cleans them, analyzes sentiment, tracks trending topics and viral posts, and visualizes everything in a Streamlit dashboard.

NairaBits Pipeline

Chukwuemeka Francisco Meneke

NairaBits is a production-grade data pipeline that puts Nigeria at the centre. It ingests live data from three public sources, transforms it through a clean layered architecture, and surfaces it in an interactive dashboard that tells the story numbers alone cannot. Bruin Features Used: Python Assets DuckDB SQL Assets Asset Dependencies (DAG orchestration) Materialization Quality Checks Column Descriptions Asset tagging

Kenya Civil Liberties Intelligence Observatory

Samwel Njogu Mwaniki

An open-source Bruin-powered intelligence platform that tracks internet censorship, government takedown requests, and political pressure signals in Kenya from 2023 to 2025. The pipeline integrates OONI (protocol-level blocking), ACLED (conflict events), Google Transparency, and Lumen-like data into a governed, layered architecture (Staging → Features → Intelligence → Reporting). It includes statistical guardrails, ASN behavioral profiling, temporal lag analysis, and interactive Streamlit dashboards. Built to be extensible across Africa.