Build Staging + Report Layers
Add a clean middle layer on top of the raw data, then build a final report table that joins fundamentals, prices, and macro - the analysis-ready table your AI agent will query.
What you'll do
Ask the agent to add two new layers of SQL assets on top of the raw tables from Step 3:
- `staging.*` views - one per raw source, cleaning and typing the data, computing basic derived columns
- `reports.*` table - the analysis-ready joined surface that the AI will query in Step 6
Why this step matters
Your raw tables are the source of truth - but they're messy: units in micros, timezones in UTC, fundamentals reported 30–60 days after fiscal quarter end, FRED series coming in at three different frequencies. Asking analytical questions directly against raw means re-deriving the same transforms every time.
The layered pattern fixes that once:
- `raw.*` - untouched copy of what the source sent (Step 3 output)
- `staging.*` - cleaned, typed, deduplicated. One view per source.
- `reports.*` - business-ready joins with pre-computed metrics (FCF margin, TTM, quarterly return, yield curve regime, etc.)
Once this layer exists, you ask questions against `reports.*` and the AI doesn't have to re-derive FCF margin or re-align fundamentals to prices every session. Faster, cheaper, less room for mistakes.
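To make that concrete, here's the kind of question that becomes a two-line query once the report layer exists - a sketch using the `reports.ticker_quarterly` columns defined later in this step:

```sql
-- TTM free-cash-flow margin trend for one ticker, straight off the report layer.
-- No pivoting, no window functions, no fundamentals-to-prices alignment needed.
SELECT fiscal_quarter_end, ttm_revenue, ttm_fcf_margin, regime_at_qend
FROM reports.ticker_quarterly
WHERE symbol = 'AAPL'
ORDER BY fiscal_quarter_end;
```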
1. Scaffold the staging views
Prompt the agent:
In `stock-analyst-101/assets/`, create three Bruin SQL assets for the staging layer. All three should materialize as views (cheap, always fresh) under the `staging` schema.
`staging/fundamentals.sql` → `staging.fundamentals`
- Source: `raw.fmp_fundamentals`
- Uppercase `symbol`, cast `date` to DATE
- Dedupe on (symbol, date) using `QUALIFY ROW_NUMBER() OVER (PARTITION BY symbol, date ORDER BY date DESC) = 1`
- Pivot income-statement / balance-sheet / cash-flow rows into one row per (symbol, fiscal_quarter_end)
- Compute `free_cash_flow = operating_cash_flow - capital_expenditures`
- Keep: symbol, fiscal_quarter_end, revenue, gross_profit, operating_income, net_income, eps_diluted, operating_cash_flow, capital_expenditures, free_cash_flow, total_assets, total_liabilities, shareholders_equity
`staging/prices.sql` → `staging.prices`
- Source: `raw.yahoo_prices`
- Uppercase `symbol`, cast `date` to DATE
- Dedupe on (symbol, date)
- Compute `daily_return = (close - LAG(close) OVER (PARTITION BY symbol ORDER BY date)) / LAG(close) OVER (...)`
- Compute 20-day and 50-day simple moving averages of close
- Keep: symbol, date, open, high, low, close, adj_close, volume, daily_return, sma_20, sma_50
`staging/macro.sql` → `staging.macro`
- Source: `raw.fred_macro`
- Pivot so each date has columns: `fed_funds_rate` (FEDFUNDS), `cpi` (CPIAUCSL), `unemployment_rate` (UNRATE), `yield_spread_10y2y` (T10Y2Y), `real_gdp` (GDPC1)
- Forward-fill missing values (monthly/quarterly series need daily granularity) using `LAST_VALUE(... IGNORE NULLS) OVER (ORDER BY date)` or equivalent
- Add a `regime` column: `CASE WHEN yield_spread_10y2y < 0 THEN 'inverted' ELSE 'normal' END`
For each asset, add a top-level `description:`, column descriptions, and at least one quality check (e.g. `not_null` on symbol/date, `unique` on the grain columns).
Show me the SQL for each asset before writing the files. Don't run anything yet - I'll approve first.
Read each SQL file the agent produces. Look specifically for the `QUALIFY`, the dedup logic, and the pivot structure. If anything looks off, push back - it's cheaper to correct here than downstream. The sketches below show roughly the shape to expect for each view.
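A healthy `staging/fundamentals.sql` might look roughly like this. It's a sketch, not the agent's exact output: it assumes Bruin's embedded `/* @bruin ... @bruin */` YAML header with a `duckdb.sql` asset type (column descriptions and checks omitted for brevity), and it assumes `raw.fmp_fundamentals` arrives in long format - one row per (symbol, date, metric) - so adjust the pivot to your actual raw schema:

```sql
/* @bruin
name: staging.fundamentals
type: duckdb.sql
materialization:
  type: view
@bruin */

-- Dedupe first so the pivot sees exactly one value per (symbol, date, metric).
-- The extra `metric` in the partition reflects the long-format assumption above.
WITH deduped AS (
    SELECT
        UPPER(symbol)      AS symbol,
        CAST(date AS DATE) AS fiscal_quarter_end,
        metric,
        value
    FROM raw.fmp_fundamentals
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY symbol, date, metric ORDER BY date DESC
    ) = 1
)
SELECT
    symbol,
    fiscal_quarter_end,
    MAX(CASE WHEN metric = 'revenue'              THEN value END) AS revenue,
    MAX(CASE WHEN metric = 'operating_cash_flow'  THEN value END) AS operating_cash_flow,
    MAX(CASE WHEN metric = 'capital_expenditures' THEN value END) AS capital_expenditures,
    -- ...the remaining kept columns follow the same conditional-aggregation pattern...
    MAX(CASE WHEN metric = 'operating_cash_flow'  THEN value END)
      - MAX(CASE WHEN metric = 'capital_expenditures' THEN value END) AS free_cash_flow
FROM deduped
GROUP BY symbol, fiscal_quarter_end;
```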
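The prices view is mostly window functions. A sketch of the SELECT body (same header pattern as above; the `ROWS BETWEEN n PRECEDING` frames are one way to express the moving averages):

```sql
-- Dedupe before computing windows, otherwise duplicate rows skew LAG and the SMAs.
WITH deduped AS (
    SELECT
        UPPER(symbol)      AS symbol,
        CAST(date AS DATE) AS date,
        open, high, low, close, adj_close, volume
    FROM raw.yahoo_prices
    QUALIFY ROW_NUMBER() OVER (PARTITION BY symbol, date ORDER BY date) = 1
)
SELECT
    *,
    (close - LAG(close) OVER w) / LAG(close) OVER w AS daily_return,
    AVG(close) OVER w20 AS sma_20,
    AVG(close) OVER w50 AS sma_50
FROM deduped
WINDOW
    w   AS (PARTITION BY symbol ORDER BY date),
    w20 AS (PARTITION BY symbol ORDER BY date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW),
    w50 AS (PARTITION BY symbol ORDER BY date ROWS BETWEEN 49 PRECEDING AND CURRENT ROW);
```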
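For the macro view, the two things worth checking are the pivot and the forward fill. A sketch, assuming `raw.fred_macro` comes in as (series_id, date, value) rows; DuckDB supports `IGNORE NULLS` on window functions, which makes the fill a one-liner per column:

```sql
WITH pivoted AS (
    -- one row per date, one column per FRED series
    SELECT
        CAST(date AS DATE) AS date,
        MAX(CASE WHEN series_id = 'FEDFUNDS' THEN value END) AS fed_funds_rate,
        MAX(CASE WHEN series_id = 'CPIAUCSL' THEN value END) AS cpi,
        MAX(CASE WHEN series_id = 'UNRATE'   THEN value END) AS unemployment_rate,
        MAX(CASE WHEN series_id = 'T10Y2Y'   THEN value END) AS yield_spread_10y2y,
        MAX(CASE WHEN series_id = 'GDPC1'    THEN value END) AS real_gdp
    FROM raw.fred_macro
    GROUP BY 1
),
filled AS (
    -- carry the last known value forward so monthly/quarterly series cover every date
    SELECT
        date,
        LAST_VALUE(fed_funds_rate     IGNORE NULLS) OVER w AS fed_funds_rate,
        LAST_VALUE(cpi                IGNORE NULLS) OVER w AS cpi,
        LAST_VALUE(unemployment_rate  IGNORE NULLS) OVER w AS unemployment_rate,
        LAST_VALUE(yield_spread_10y2y IGNORE NULLS) OVER w AS yield_spread_10y2y,
        LAST_VALUE(real_gdp           IGNORE NULLS) OVER w AS real_gdp
    FROM pivoted
    WINDOW w AS (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
)
SELECT
    *,
    CASE WHEN yield_spread_10y2y < 0 THEN 'inverted' ELSE 'normal' END AS regime
FROM filled;
```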
2. Scaffold the reports table
Prompt:
Now create one Bruin SQL asset for the reports layer, materialized as a table under the `reports` schema.
`reports/ticker_quarterly.sql` → `reports.ticker_quarterly`
- One row per (symbol, fiscal_quarter_end)
- Use `staging.fundamentals` as the grain
- Compute:
  - `ttm_revenue` = sum of the last 4 quarters of revenue (rolling window over symbol, ordered by fiscal_quarter_end)
  - `ttm_fcf` = same rolling 4-quarter sum of free_cash_flow
  - `ttm_fcf_margin` = `ttm_fcf / NULLIF(ttm_revenue, 0)`
  - `revenue_yoy_growth` = current quarter's revenue / revenue 4 quarters ago - 1
  - `fcf_margin_qoq_change` = (free_cash_flow / NULLIF(revenue, 0)) - (previous quarter's free_cash_flow / revenue)
- Join `staging.prices` using the next available trading day at or after the fiscal_quarter_end (fundamentals lag the quarter end - price "as of reporting" is an approximation here): `price_at_quarter_end`, `volume_at_quarter_end`, and `quarterly_return` = price at this quarter end / price at previous quarter end - 1
- Join `staging.macro` on the quarter-end date (backward-fill if that date has no macro row): `fed_funds_rate_at_qend`, `yield_spread_at_qend`, `regime_at_qend`
Add an asset-level description, column descriptions, tags (`tier:reports`, `domain:finance`), and quality checks (`not_null` on symbol + fiscal_quarter_end, `unique` on the pair).
Show me the SQL before writing. I'll approve before you run anything.
This is the analysis-ready table. Every question in Step 6 will start here.
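The TTM logic is the part most worth reviewing before you approve. A minimal sketch of the fundamentals-side computations, assuming `staging.fundamentals` from step 1; the `COUNT(*) OVER` guard is an addition beyond the prompt, keeping incomplete windows NULL instead of silently summing fewer than 4 quarters:

```sql
SELECT
    symbol,
    fiscal_quarter_end,
    -- only emit TTM values once a full 4-quarter window exists
    CASE WHEN COUNT(*) OVER w4 = 4 THEN SUM(revenue)        OVER w4 END AS ttm_revenue,
    CASE WHEN COUNT(*) OVER w4 = 4 THEN SUM(free_cash_flow) OVER w4 END AS ttm_fcf,
    CASE WHEN COUNT(*) OVER w4 = 4
         THEN SUM(free_cash_flow) OVER w4 / NULLIF(SUM(revenue) OVER w4, 0)
    END                                                 AS ttm_fcf_margin,
    revenue / NULLIF(LAG(revenue, 4) OVER w, 0) - 1     AS revenue_yoy_growth,
    free_cash_flow / NULLIF(revenue, 0)
      - LAG(free_cash_flow / NULLIF(revenue, 0)) OVER w AS fcf_margin_qoq_change
FROM staging.fundamentals
WINDOW
    w  AS (PARTITION BY symbol ORDER BY fiscal_quarter_end),
    w4 AS (PARTITION BY symbol ORDER BY fiscal_quarter_end
           ROWS BETWEEN 3 PRECEDING AND CURRENT ROW);
```

For the price join, DuckDB's `ASOF JOIN` is one candidate for the "next trading day at or after quarter end" lookup; whichever construct the agent picks, check that the inequality points forward to the next trading day, not backward.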
3. Run and spot-check
Prompt:
Run `bruin validate stock-analyst-101/` first. If it passes, run `bruin run stock-analyst-101/` and stream the output. Tell me which assets succeeded and which failed.
After it completes, do spot checks across all three layers using `bruin query --connection duckdb-default`:
- Row counts: `raw.fmp_fundamentals` vs `staging.fundamentals` vs `reports.ticker_quarterly` - any unexpected drops?
- For one ticker (e.g. AAPL), show the last 4 rows of `reports.ticker_quarterly` - do `ttm_revenue`, `ttm_fcf_margin`, `regime_at_qend` look sensible?
- Null audit: `SELECT COUNT(*) FILTER (WHERE ttm_fcf_margin IS NULL) FROM reports.ticker_quarterly` - some nulls are expected for early quarters (TTM needs 4 quarters of history); flag if there are too many.
Show me the results before moving on.
You should see ~4–8 quarters per ticker in the final table, depending on how much fundamentals history FMP returned.
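If you'd rather eyeball the counts yourself, a one-shot comparison along these lines works through the same `bruin query --connection duckdb-default` path:

```sql
-- Row counts per layer; staging should roughly match raw (minus dupes),
-- and reports collapses to one row per (symbol, fiscal_quarter_end).
SELECT 'raw.fmp_fundamentals' AS layer, COUNT(*) AS row_count FROM raw.fmp_fundamentals
UNION ALL
SELECT 'staging.fundamentals',     COUNT(*) FROM staging.fundamentals
UNION ALL
SELECT 'reports.ticker_quarterly', COUNT(*) FROM reports.ticker_quarterly;
```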
4. Why the layering matters
Now you have a clean three-tier pipeline:
```
raw.*                staging.*             reports.*
──────────────────   ──────────────────    ──────────────────
fmp_fundamentals  →  fundamentals  ┐
yahoo_prices      →  prices        ├──→    ticker_quarterly
fred_macro        →  macro         ┘
(source truth)       (cleaned + typed)     (analysis-ready)
```
Every question you ask in Step 6 goes against `reports.ticker_quarterly` first. The AI agent doesn't have to re-derive TTM FCF margin or re-align fundamentals to prices every time. Raw and staging exist for validation and troubleshooting, not for everyday analysis.
In the next step, you'll enhance each layer with column-level descriptions and update your pipeline-scoped `AGENTS.md` so the agent knows to default to `reports.*`.
What just happened
You built a proper layered pipeline - raw → staging → reports - without writing any SQL by hand. The agent wrote every asset, you validated the logic, Bruin ran the whole pipeline. Your data is now cleaned, typed, deduped, joined, and enriched with the metrics an analyst actually wants. Next step: make sure the AI knows which layer to query when.