Suara-ID

Ana Nurkaromah

I built Suara-ID, a cloud-native pipeline that batch-ingests more than 10 GB of Indonesian regional audio, extracts its metadata into BigQuery, and runs a faster-whisper model from Hugging Face to transcribe the speech to text automatically.

How I used Bruin: instead of juggling Airflow and dbt, I used Bruin to orchestrate the entire pipeline.

Python assets: handled the Kaggle API extraction, the uploads to the GCS data lake, and the heavy AI/ML inference.

SQL assets: handled the BigQuery staging tables, partitioned by date and clustered by filename.

Data quality: used Bruin's built-in checks (unique, not null) directly on top of my SQL models.

Having Python and SQL side by side in a single DAG made building this AI pipeline incredibly smooth.
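To make the metadata and data-quality steps more concrete, here is a minimal Python sketch of what they boil down to. It is not the project's code. The local directory layout, the audio extensions, the column names (`filename`, `size_bytes`, `ingestion_date`), and the helper names `extract_metadata` and `check_unique_not_null` are all hypothetical. Bruin runs its own built-in unique/not-null checks, so the second function only mirrors their intent in plain Python. In the real pipeline, rows like these would then be loaded into a BigQuery table partitioned by date and clustered by filename.

```python
"""Hypothetical sketch of a Suara-ID-style metadata step (not the author's code).

Assumptions: audio files sit under one local root directory, each metadata row
has `filename`, `size_bytes`, and `ingestion_date`, and the unique/not-null
checks target `filename`.
"""
from __future__ import annotations

import datetime as dt
from pathlib import Path

# Assumed audio formats; the real dataset may use others.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def extract_metadata(audio_dir: Path, ingestion_date: dt.date) -> list[dict]:
    """Build one metadata row per audio file found under `audio_dir`."""
    rows = []
    for path in sorted(audio_dir.rglob("*")):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            rows.append({
                # Path relative to the root, so files in different regional
                # subfolders never collide (this column would be the clustering key).
                "filename": path.relative_to(audio_dir).as_posix(),
                "size_bytes": path.stat().st_size,
                # ISO date string (this column would be the partitioning key).
                "ingestion_date": ingestion_date.isoformat(),
            })
    return rows


def check_unique_not_null(rows: list[dict], column: str) -> list[str]:
    """Mirror the intent of unique + not-null checks; return failure messages."""
    failures = []
    seen = set()
    for i, row in enumerate(rows):
        value = row.get(column)
        if value is None:
            failures.append(f"row {i}: {column} is null")
        elif value in seen:
            failures.append(f"row {i}: duplicate {column} {value!r}")
        else:
            seen.add(value)
    return failures
```

Using the path relative to the audio root as `filename` keeps names distinct even when two regional subfolders each contain a file called `clip1.wav`, so a uniqueness failure points to a file that was genuinely ingested twice rather than to a naming clash.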
