# US Flights Data Engineering Project (2015)

## :memo: Problem Description

The aviation industry generates massive amounts of data daily. This project analyzes a dataset of 5.8 million flights in the US from 2015 to identify patterns in delays and cancellations. The goal is to provide actionable insights for operational management through a robust data pipeline and interactive dashboards.

Source: <https://www.kaggle.com/datasets/usdot/flight-delays>

Key Questions Addressed:

- Punctuality: Which airlines and airports are the most/least punctual (OTP)?
- Correlation: How do flight distance and time of day affect the probability of delay?
- Seasonality: What are the seasonal trends in flight reliability?

## :building_construction: Project Architecture

Since the official DOT Bureau of Transportation Statistics does not provide a public API, the data is sourced from Kaggle. The project follows a modern ELT (Extract, Load, Transform) approach using the Medallion Architecture (Bronze, Silver, Gold layers), moving data from raw CSVs to structured analytical reports.

## :hammer_and_wrench: Technologies & Infrastructure

- Cloud: Storage (Google Cloud Storage, GCS), Data Warehouse (Google BigQuery)
- Infrastructure: Batch Processing Architecture
- Workflow Orchestration & Transformation: Bruin
- Language: Python (Ingestion), SQL (Transformations)
- Data Visualization: Power BI (Desktop & Service)
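
To make the punctuality question above concrete, here is a minimal Python sketch of how an on-time performance (OTP) share per airline could be computed. It is an illustration only, not the project's actual transformation logic, which lives in SQL. The column names `AIRLINE`, `ARRIVAL_DELAY`, and `CANCELLED` are assumed to match the Kaggle `flights.csv` file. The function name `on_time_performance`, the 15-minute threshold, and the choice to count cancelled flights as not on time are assumptions made for this example. The sample rows are invented.

```python
import csv
import io
from collections import defaultdict


def on_time_performance(rows, threshold_minutes=15.0):
    """Return {airline: share of flights arriving less than threshold_minutes late}.

    Hypothetical helper for illustration; column names are assumed.
    """
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for row in rows:
        airline = row["AIRLINE"]
        totals[airline] += 1
        delay = row.get("ARRIVAL_DELAY", "")
        # Assumption: cancelled flights and rows with no recorded
        # arrival delay are counted as not on time.
        if row.get("CANCELLED") == "1" or delay in ("", None):
            continue
        if float(delay) < threshold_minutes:
            on_time[airline] += 1
    return {airline: on_time[airline] / totals[airline] for airline in totals}


# Invented sample rows (not real data) in the assumed CSV layout.
SAMPLE_CSV = """AIRLINE,ARRIVAL_DELAY,CANCELLED
AA,-5,0
AA,30,0
AA,,1
DL,10,0
DL,14,0
"""

otp = on_time_performance(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for airline, share in sorted(otp.items()):
    print(f"{airline}: {share:.1%}")
```

In the pipeline itself, the same aggregation would more naturally be a `GROUP BY` query in a Gold-layer SQL model. The Python version is shown here only because it runs without any warehouse access.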