Step 4 · 6 min

AI-Enhance and Validate the Context

Use bruin ai enhance to fill every asset with descriptions, tags, and quality checks — then bruin validate to make sure nothing got corrupted along the way.

What you'll do

  1. Run bruin ai enhance over context/assets/ so every asset gets a description, semantic tags, per-column docs, and quality checks
  2. Run bruin validate to confirm none of the YAMLs ended up malformed

Why this step matters

Without descriptions, an AI agent can read your schema but doesn't know what it means. It sees gmv and guesses; it sees status = 3 and queries blindly; it sees a created_at column and assumes UTC. Enhancement is what turns the structural skeleton from the previous step into something an agent can actually reason about.

Validation matters because ai enhance writes to YAML files at scale, and rare edge cases can produce malformed files. A 30-second bruin validate is cheap insurance that catches them immediately, before they confuse an agent at query time.
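
Because the two commands pair naturally, you can also chain them so corruption is caught the moment it's written. A minimal one-liner, using the same invocations detailed in the instructions below:

bruin ai enhance --claude context/assets && bruin validate --config-file context/.bruin.yml context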

Instructions

1. Run the AI enhancement

From the dbt project root:

bruin ai enhance --claude context/assets

For each asset, Bruin sends the column list + a sample of the data to Claude and fills in:

  • A multi-paragraph description covering purpose, grain, lineage, and typical use
  • Semantic tags like domain:retail, layer:staging, sensitivity:pii
  • Per-column descriptions with business meaning
  • Quality checks: not_null on keys, unique on identifiers, accepted_values on enums

The command auto-detects which AI CLI you have installed. If you have several, pass an explicit flag — --claude, --opencode, --codex, or --cursor.

Time estimate. Each asset takes a few minutes of Claude time. For ~40 assets (the contoso reference), expect 30–60 minutes of wall-clock time. Bruin runs up to 5 enhancements in parallel by default; raise that with --concurrency 10 if you want it faster and your AI quota tolerates it.
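
For example, a faster run that pins the CLI choice and raises the parallelism, using only the flags mentioned above:

bruin ai enhance --claude --concurrency 10 context/assets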

Gotcha — ai enhance doesn't always honor --config-file. It can fall back to your global ~/.bruin.yml for connection lookup, and if that has a broken connection you'll see "fill columns failed" warnings. The warnings are cosmetic — column types were already filled by the import step. The enhancement still writes correctly.

Gotcha — rare YAML corruption. On a small fraction of assets, ai enhance has been known to mangle the columns: block. Always run bruin validate afterward (next step). If a single asset breaks, regenerate just that file: bruin ai enhance --claude context/assets/<schema>/<table>.asset.yml.

2. Spot-check a single asset

Open one of the report assets — these benefit most from enrichment because the column names alone don't tell the full story:

cat context/assets/contoso_dbt_reports/rpt_revenue_by_segment.asset.yml

You should now see something like:

name: contoso_dbt_reports.rpt_revenue_by_segment
type: bq.source
description: |
  Yearly revenue rolled up by product segment and category. Built from the
  staging order-line table joined with the product dimension. One row per
  (segment_id, category_name, year). Used by retail merchandising and
  finance for category-level reporting.
tags:
  - domain:retail
  - layer:reports
  - grain:segment_category_year
columns:
  - name: segment_id
    type: STRING
    description: "Identifier for the product segment (joins to dim_segment)."
    checks:
      - name: not_null
  - name: category_name
    type: STRING
    description: "Human-readable category label, e.g. 'Bikes', 'Components'."
    checks:
      - name: not_null
  - name: year
    type: INT64
    description: "Calendar year of the order date, in UTC."
  - name: revenue_usd
    type: NUMERIC
    description: "Sum of order_line.gross_amount in USD, post-discount."

This is what the agent will read before it queries. The richer this gets, the better its SQL gets.

Watch for incorrect unique checks. AI enhancement sometimes adds unique to columns that look like keys but aren't unique per row (e.g., segment_id in a yearly fact table appears once per year, not once total). Skim the generated checks and remove any that don't match how the data actually works.
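
A quick way to skim them is a plain grep over the generated files (this is ordinary shell, not a Bruin feature); each hit is a generated unique check worth eyeballing:

grep -rn "name: unique" context/assets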

3. Validate the whole pipeline

bruin validate --config-file context/.bruin.yml context

Expected output:

✓ Successfully validated 40 assets across 1 pipeline, all good.

If anything fails, the message will name the file and the line. Open it, fix or regenerate that single asset, and re-run validate until it's green.
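
For instance, if validate flags one file, the repair loop is two commands. The staging table name here is hypothetical:

bruin ai enhance --claude context/assets/contoso_dbt_staging/stg_orders.asset.yml
bruin validate --config-file context/.bruin.yml context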

4. Wrap it in a regenerator script (optional)

The whole sequence — config, import, filter, enhance, validate — is idempotent and worth wrapping in a script so you can refresh the context layer whenever your dbt models change. The reference project has generate_context.sh with --skip-import (re-enhance only) and --skip-enhance (fast structure refresh) flags. A minimal version:

#!/usr/bin/env bash
set -euo pipefail

CONFIG="context/.bruin.yml"
PIPELINE="context"

# Re-import table structures from the warehouse (idempotent).
bruin import database \
  --config-file "$CONFIG" \
  --connection contoso_dbt_bq \
  --schemas contoso_dbt_raw \
  --schemas contoso_dbt_staging \
  --schemas contoso_dbt_reports \
  "$PIPELINE"

# Remove dlt-internal bookkeeping assets that the import picks up.
find "$PIPELINE/assets" -name "_dlt_*.asset.yml" -delete

# Enrich every asset, then confirm nothing came out malformed.
bruin ai enhance --claude "$PIPELINE/assets"
bruin validate --config-file "$CONFIG" "$PIPELINE"

Save it as generate_context.sh next to your dbt project. Run it after meaningful schema changes — column renames, new models, dropped tables — to keep the context layer in sync.
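
If you want the --skip-import and --skip-enhance behavior from the reference project's script, a minimal sketch of the flag handling looks like this. The flag names come from above; the parsing itself is an assumption about how the reference script works:

#!/usr/bin/env bash
set -euo pipefail

SKIP_IMPORT=false
SKIP_ENHANCE=false
for arg in "$@"; do
  case "$arg" in
    --skip-import)  SKIP_IMPORT=true ;;   # re-enhance only
    --skip-enhance) SKIP_ENHANCE=true ;;  # fast structure refresh
    *) echo "unknown flag: $arg" >&2; exit 1 ;;
  esac
done

CONFIG="context/.bruin.yml"
PIPELINE="context"

if [ "$SKIP_IMPORT" = false ]; then
  bruin import database --config-file "$CONFIG" --connection contoso_dbt_bq \
    --schemas contoso_dbt_raw --schemas contoso_dbt_staging \
    --schemas contoso_dbt_reports "$PIPELINE"
  find "$PIPELINE/assets" -name "_dlt_*.asset.yml" -delete
fi

if [ "$SKIP_ENHANCE" = false ]; then
  bruin ai enhance --claude "$PIPELINE/assets"
fi

bruin validate --config-file "$CONFIG" "$PIPELINE"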

Don't hand-edit generated YAMLs. They're regenerable artifacts. If a description is consistently wrong, fix it upstream — in the dbt model's schema.yml or description: block — and the next import + enhance will pick the change up.
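
In dbt, that upstream fix lives in the model's schema.yml. A sketch using the report model from the spot-check, with illustrative wording and an assumed file path:

# models/reports/schema.yml  (path is illustrative)
version: 2
models:
  - name: rpt_revenue_by_segment
    description: "Yearly revenue rolled up by product segment and category."
    columns:
      - name: revenue_usd
        description: "Sum of order_line.gross_amount in USD, post-discount."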

What just happened

Your context/assets/ is now a 40-file knowledge base: every dbt-materialized table is documented with descriptions, tags, and checks that an AI agent can read before writing a single query. Combined with the warehouse connection from step 2, you have everything an agent needs except the wiring that lets it actually call out to all of this. That's the next step.