Suara-ID
I built Suara-ID, a cloud-native pipeline that batch-ingests 10GB+ of Indonesian regional audio data, extracts its metadata into BigQuery, and runs a faster-whisper speech-recognition model (weights from Hugging Face) to automatically transcribe the speech to text.

How I used Bruin: instead of juggling Airflow and dbt, I used Bruin to orchestrate the entire thing!
Python Assets: handled the Kaggle API extraction, the uploads to the GCS data lake, and the heavy AI/ML inference.
SQL Assets: handled the BigQuery staging tables, partitioned by date and clustered by filename.
Data Quality: leveraged Bruin's native checks (unique, not null) directly on top of my SQL models.
Having Python and SQL side by side in a single DAG made building this AI pipeline incredibly smooth.
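The transcription step could look roughly like the minimal Python sketch below, assuming the `faster-whisper` package is installed. It loads the model once per batch, lets Whisper auto-detect the language unless one is passed, and joins each file's segments into a single transcript row. `TranscriptRow`, `segments_to_row`, `transcribe_files`, the `small` model size, and the row fields are hypothetical illustrations, not the project's actual code or table schema.

```python
import os
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Protocol


class SegmentLike(Protocol):
    """Anything exposing start/end/text, such as a faster-whisper Segment."""
    start: float
    end: float
    text: str


@dataclass
class TranscriptRow:
    """Hypothetical per-file output row, not the project's real schema."""
    filename: str
    transcript: str
    language: Optional[str]
    duration_s: float


def segments_to_row(filename: str, segments: Iterable[SegmentLike],
                    language: Optional[str] = None) -> TranscriptRow:
    """Join non-empty segment texts and record where the last segment ends."""
    texts = []
    end = 0.0
    for seg in segments:  # consumes faster-whisper's lazy segment generator
        text = seg.text.strip()
        if text:
            texts.append(text)
        end = max(end, seg.end)
    return TranscriptRow(filename, " ".join(texts), language, end)


def transcribe_files(paths: Iterable[str], model_size: str = "small",
                     language: Optional[str] = None) -> Iterator[TranscriptRow]:
    """Transcribe audio files one by one, loading the model only once."""
    # Imported lazily so segments_to_row works without faster-whisper installed.
    from faster_whisper import WhisperModel

    # int8 on CPU keeps memory low; device="cuda" is the usual GPU choice.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    for path in paths:
        # language=None lets Whisper auto-detect, which may suit regional languages.
        segments, info = model.transcribe(path, language=language, beam_size=5)
        yield segments_to_row(os.path.basename(path), segments, info.language)
```

Yielding one row per file keeps memory flat across a large batch; a downstream asset could then load the rows into BigQuery.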
Certificate of Completion