Suara-ID

Ana Nurkaromah

I built Suara-ID, a cloud-native pipeline that batch-ingests 10GB+ of Indonesian regional audio data, extracts metadata into BigQuery, and runs a faster-whisper model from Hugging Face to automatically transcribe the speech to text.

How I used Bruin: Instead of juggling Airflow and dbt, I used Bruin to orchestrate the entire thing!

Python Assets: Handled the Kaggle API extraction, GCS Data Lake uploads, and the heavy AI/ML inference.
SQL Assets: Handled the BigQuery staging, partitioning (by date), and clustering (by filename).
Data Quality: Leveraged Bruin's native checks (Unique/Not Null) right on top of my SQL models.

Having Python and SQL side-by-side in a single DAG made building this AI pipeline incredibly smooth.
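To make the "single DAG" idea and the Unique/Not Null checks concrete, here is a minimal plain-Python sketch. It is not Bruin's API and not the project's actual code: the asset names, the dependency edges (in particular, that transcription runs after staging), and the sample rows are all assumptions made for illustration. It only shows how mixed Python and SQL steps can be resolved into one run order, and what the two checks assert about a column.

```python
from graphlib import TopologicalSorter

# Hypothetical asset names; each maps to the set of assets it depends on.
# The edges are assumed for illustration, not taken from the real project.
ASSETS = {
    "raw.kaggle_extract": set(),                    # Python: Kaggle API download
    "raw.gcs_upload": {"raw.kaggle_extract"},       # Python: upload to the GCS data lake
    "staging.audio_metadata": {"raw.gcs_upload"},   # SQL: BigQuery staging table
    "ml.transcripts": {"staging.audio_metadata"},   # Python: speech-to-text inference
}


def run_order(assets):
    """Return asset names in an order that respects every dependency."""
    return list(TopologicalSorter(assets).static_order())


def not_null(rows, column):
    """True when every row has a non-None value in `column`."""
    return all(row.get(column) is not None for row in rows)


def unique(rows, column):
    """True when no non-None value of `column` appears more than once."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return len(values) == len(set(values))


if __name__ == "__main__":
    print(run_order(ASSETS))

    # Made-up sample rows standing in for the staged metadata table.
    rows = [
        {"filename": "clip_001.wav", "recorded_on": "2026-01-01"},
        {"filename": "clip_002.wav", "recorded_on": "2026-01-01"},
    ]
    print(not_null(rows, "filename"), unique(rows, "filename"))
```

Because the assumed dependencies form a straight chain, the run order is simply extraction, upload, staging, then transcription; a real pipeline could have branches that run in parallel.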

Certificate of Completion

BRUIN ACADEMY
Verified Certificate of Completion

This is to certify that Ana Nurkaromah has successfully completed the project End-to-End Data Pipeline with Bruin.

Project: Suara-ID · Issued May 2026

Sabri Karagonen, Co-Founder, Bruin
Arsalan Noorafkan, Developer Advocate, Bruin

This certificate has been verified. Confirm authenticity at https://getbruin.com/project-showcase/suara-id/
