Data Engineering Zoomcamp

Adilbek Bulatov

Hi everyone! :wave: I’ve just finished my *Course Project* for the Data Engineering Zoomcamp and wanted to share the results with you. *It took me almost 11 hours to complete*, but it was a great experience to see the entire architecture come together!

For this project, I analyzed *5.8 million flight records* from 2015 to identify operational bottlenecks and seasonal delay patterns in US aviation.

:building_construction: *The Architecture (Medallion Principle):*
• *Bronze (Raw):* Ingested 5.8M records from local CSVs into *Google Cloud Storage (GCS)* using Python. Created *External Tables* in BigQuery so the files can be queried directly from GCS.
• *Silver (Staging):* Used *Bruin* for transformation and orchestration. Implemented *Partitioning (by Month)* and *Clustering (by Airline)* in BigQuery for performance and cost optimization.
• *Gold (Analytics):* Built a `fct_flights` fact table and specialized data marts to track On-Time Performance (OTP) and "snowball effect" delays.

:bar_chart: *Serving & Insights:*
I built two interactive *Power BI* dashboards to visualize the results:
1. *Executive Overview:* High-level KPIs and geospatial distribution of delays.
2. *Operational Deep Dive:* Shows how delays peak between 6 PM and 9 PM as schedule deviations accumulate throughout the day.

:hammer_and_wrench: *Tech Stack:* `Python` | `GCP (GCS, BigQuery)` | `Bruin` | `Power BI` | `SQL`
• Cloud: Storage (Google Cloud Storage, GCS), Data Warehouse (Google BigQuery)
• Infrastructure: Batch Processing Architecture
• Workflow Orchestration & Transformation: Bruin
• Language: Python (Ingestion), SQL (Transformations)
• Data Visualization: Power BI (Desktop & Service)

*Check out the full project repository here:* :link:
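
To make the Gold-layer metrics above a bit more concrete, here is a minimal sketch in plain Python of how OTP and average delay by scheduled departure hour could be computed. This is not the project's actual SQL. The field names (`sched_dep_hour`, `arr_delay_min`), the sample rows, and the 15-minute on-time threshold are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumption: a flight counts as "on time" if it arrives less than 15 minutes late.
ON_TIME_THRESHOLD_MIN = 15


def on_time_performance(flights):
    """Return the share of flights arriving under the threshold (0.0 to 1.0)."""
    if not flights:
        return 0.0
    on_time = sum(1 for f in flights if f["arr_delay_min"] < ON_TIME_THRESHOLD_MIN)
    return on_time / len(flights)


def avg_delay_by_hour(flights):
    """Return {scheduled departure hour: mean arrival delay in minutes}."""
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum of delays, flight count]
    for f in flights:
        bucket = totals[f["sched_dep_hour"]]
        bucket[0] += f["arr_delay_min"]
        bucket[1] += 1
    return {hour: total / count for hour, (total, count) in sorted(totals.items())}


# Hypothetical sample rows, not taken from the real dataset.
sample = [
    {"sched_dep_hour": 7, "arr_delay_min": -3},
    {"sched_dep_hour": 7, "arr_delay_min": 5},
    {"sched_dep_hour": 19, "arr_delay_min": 40},
    {"sched_dep_hour": 19, "arr_delay_min": 20},
]

print(on_time_performance(sample))
print(avg_delay_by_hour(sample))
```

Grouping by scheduled hour is one simple way to surface the "snowball effect": if evening buckets show a higher mean delay than morning ones, delays are accumulating over the day. In a warehouse this would more likely be a `GROUP BY` over the fact table than a Python loop.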
