GitHub Activity Analytics Dashboard
GitHub generates millions of public events every day (pushes, pull requests, issues, forks, stars) across thousands of repositories and contributors worldwide. This raw activity stream is publicly available via gharchive.org, but it is not pre-aggregated or directly queryable in a useful analytical form.

This project builds an end-to-end batch data pipeline that answers the following questions:

- Which event types dominate GitHub activity on any given day or hour?
- Which repositories attract the most contributors and drive the most events?
- How does activity vary across the day (UTC), and what are the peak hours?
- What is the daily mix of event types: is it push-heavy, or driven by issues and PRs?
- Which programming language ecosystems (inferred from repo naming patterns) are most active?
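Most of these questions reduce to group-by counts over the event stream. The sketch below is illustrative only and is not the pipeline's actual implementation: it assumes each record is one JSON object per line carrying the `type`, `actor.login`, `repo.name`, and `created_at` fields of the public GitHub event format, and it runs on a handful of made-up sample events rather than a real GH Archive file. The `summarize` helper and `SAMPLE_LINES` are hypothetical names introduced for this example.

```python
import json
from collections import Counter, defaultdict
from datetime import datetime

# Made-up sample events for illustration; real GH Archive records carry many more fields.
SAMPLE_LINES = [
    '{"type": "PushEvent", "actor": {"login": "alice"}, "repo": {"name": "org/api"}, "created_at": "2024-01-01T10:05:00Z"}',
    '{"type": "PushEvent", "actor": {"login": "bob"}, "repo": {"name": "org/api"}, "created_at": "2024-01-01T10:40:00Z"}',
    '{"type": "IssuesEvent", "actor": {"login": "alice"}, "repo": {"name": "org/api"}, "created_at": "2024-01-01T14:00:00Z"}',
    '{"type": "WatchEvent", "actor": {"login": "carol"}, "repo": {"name": "org/web"}, "created_at": "2024-01-01T14:30:00Z"}',
]


def summarize(lines):
    """Aggregate newline-delimited JSON events (hypothetical helper).

    Returns (events per type, events per UTC hour, distinct contributors per repo).
    """
    events_by_type = Counter()
    events_by_hour = Counter()
    actors_by_repo = defaultdict(set)

    for line in lines:
        event = json.loads(line)
        events_by_type[event["type"]] += 1

        # Timestamps are assumed to be UTC in the form 2024-01-01T10:05:00Z.
        created = datetime.strptime(event["created_at"], "%Y-%m-%dT%H:%M:%SZ")
        events_by_hour[created.hour] += 1

        # A set per repo de-duplicates contributors who trigger several events.
        actors_by_repo[event["repo"]["name"]].add(event["actor"]["login"])

    contributors_by_repo = {repo: len(actors) for repo, actors in actors_by_repo.items()}
    return events_by_type, events_by_hour, contributors_by_repo


by_type, by_hour, contributors = summarize(SAMPLE_LINES)
print(by_type.most_common())
print(sorted(by_hour.items()))
print(contributors)
```

At real data volumes these counts would more plausibly be computed in a warehouse or a distributed engine than in a single Python process; the sketch only shows the shape of the aggregations behind the questions.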