Step 1 · 4 min

Start From an Existing dbt Pipeline

A quick recap of the dbt setup this module assumes — staging models, mart models, and a warehouse already loaded with data.

What you'll do

Confirm your dbt project is built and that the resulting tables exist in your warehouse. Everything in this module assumes those tables are already there — Bruin won't run dbt for you; it only describes what's in the warehouse afterward.

Why this step matters

The Bruin context layer is documentation-only for your dbt pipeline. It introspects the warehouse, not your dbt repo. So if your models haven't materialized yet, Bruin has nothing to import. Get dbt build green first, then come back.

If you don't already have a project to follow along with, the contoso-dbt reference repo is a complete worked example: dlt loads raw Contoso retail data into BigQuery, dbt builds 10 staging models and 7 report models on top, and the context/ directory holds the Bruin layer we'll build in the next steps.

If you already have a dbt project building cleanly into a warehouse, skim this page and skip ahead to Step 2: Create an Isolated Bruin Context.

Instructions

Reference project layout

The reference project ends up looking like this. The context/ directory is what this module produces — everything else is dbt prerequisites:

contoso-dbt/
├── ingest/pipeline.py         # dlt — loads raw tables to BigQuery
├── models/                    # dbt — staging + reports
│   ├── sources.yml
│   ├── staging/stg_*.sql      (10)
│   └── reports/rpt_*.sql      (7)
├── context/                   # ← the Bruin context layer (this module)
│   ├── .bruin.yml
│   ├── pipeline.yml
│   └── assets/
│       ├── contoso_dbt_raw/*.asset.yml      (23)
│       ├── contoso_dbt_staging/*.asset.yml  (10)
│       └── contoso_dbt_reports/*.asset.yml  (7)
├── AGENTS.md                  # how agents should use the context
└── run_pipeline.sh            # dlt ingest + dbt build

The end state is 40 asset YAML files, one per materialized table, covering raw ingest, staging, and reports.
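Once the context layer exists, a quick tally confirms the counts match the layout above. This is a hedged sketch — it assumes you run it from the repo root and that the asset files follow the *.asset.yml naming shown in the tree:

```shell
# Count asset YAMLs per schema directory; expect 23 + 10 + 7 = 40 total.
ASSETS_DIR="${ASSETS_DIR:-context/assets}"
for d in "$ASSETS_DIR"/*/; do
  [ -d "$d" ] || continue
  printf '%3d  %s\n' "$(find "$d" -maxdepth 1 -name '*.asset.yml' | wc -l)" "$d"
done
```

If any directory's count is off, compare it against the tables dbt actually built before moving on.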

What the dbt half looks like

The contoso-dbt project uses a fairly standard dbt-on-BigQuery setup. If yours looks similar, you're ready:

  • A dbt_project.yml and a profiles.yml (the reference uses OAuth via gcloud auth application-default login, location EU)
  • A models/sources.yml declaring raw tables loaded by dlt into a *_raw schema
  • Staging models (stg_*.sql) materialized into a *_staging schema
  • Report / mart models (rpt_*.sql) materialized into a *_reports schema
  • A generate_schema_name.sql macro override so the +schema config lands as contoso_dbt_staging, not the doubled-up contoso_dbt_contoso_dbt_staging that dbt's default concatenation would produce
  • dbt build runs cleanly and the tables exist in the warehouse
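For orientation, a minimal profiles.yml matching that setup might look like the sketch below. The profile name, GCP project, and dataset here are placeholders, not the reference repo's actual values:

```yaml
# profiles.yml — hedged sketch; names are assumptions, not the repo's values
contoso_dbt:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth          # uses gcloud auth application-default login
      project: your-gcp-project
      dataset: contoso_dbt   # target schema; +schema configs build on this
      location: EU
      threads: 4
```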

Heads up on schema naming. dbt's default generate_schema_name macro concatenates the target schema with the +schema config, which is rarely what you want. The reference project overrides it to use the +schema value directly. If your final schema names look doubled-up, this is usually why.
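The override itself is small. A sketch, assuming the reference project follows the standard pattern from dbt's custom-schema documentation:

```sql
-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```

With this in place, whatever you set in +schema becomes the final schema name verbatim, instead of being appended to the target schema.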

Verify the warehouse has data

Before moving to the Bruin steps, make sure your dbt-built tables actually exist. From your dbt project root:

dbt build

Then spot-check the warehouse. For BigQuery (substitute your own GCP project ID for the reference project's):

bq ls bruin-playground-arsalan:contoso_dbt_staging
bq ls bruin-playground-arsalan:contoso_dbt_reports

For Postgres / Redshift:

SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema IN ('contoso_dbt_staging', 'contoso_dbt_reports')
ORDER BY 1, 2;

You should see one row per materialized model. If a model is missing, fix dbt before continuing — Bruin will simply skip what isn't there.
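If you'd rather check counts than eyeball rows, the same information_schema query can be aggregated (schema names assumed to match the reference project):

```sql
SELECT table_schema, COUNT(*) AS n_tables
FROM information_schema.tables
WHERE table_schema IN ('contoso_dbt_staging', 'contoso_dbt_reports')
GROUP BY table_schema;
```

Against the reference project, you'd expect 10 staging and 7 report tables.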

What just happened

You now have a working dbt project with materialized staging and report tables in your warehouse. That's the canonical input for the Bruin context layer: a set of real schemas Bruin can introspect. From here on out, no more dbt commands — everything is bruin.