Build Staging + Report Layers
Add a clean middle layer on top of the raw data, then build a final report table that joins fundamentals, prices, and macro - the analysis-ready table your AI agent will query.
What you'll do
Ask the agent to add two new layers of SQL assets on top of the raw tables from Step 3:
- `staging.*` views - one per raw source, cleaning and typing the data, computing basic derived columns
- `reports.*` table - the analysis-ready joined surface that the AI will query in Step 6
Why this step matters
Your raw tables are the source of truth - but they're messy: units in micros, timezones in UTC, fundamentals reported 30–60 days after fiscal quarter end, FRED series coming in at three different frequencies. Asking analytical questions directly against raw means re-deriving the same transforms every time.
The layered pattern fixes that once:
- `raw.*` - untouched copy of what the source sent (Step 3 output)
- `staging.*` - cleaned, typed, deduplicated. One view per source.
- `reports.*` - business-ready joins with pre-computed metrics (FCF margin, TTM, quarterly return, yield curve regime, etc.)
Once this layer exists, you ask questions against `reports.*` and the AI doesn't have to re-derive FCF margin or re-align fundamentals to prices every session. Faster, cheaper, less room for mistakes.
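To make that concrete, here's the kind of question that becomes a two-line query once the report layer exists - a sketch using the `reports.ticker_quarterly` columns defined later in this step:

```sql
-- TTM free-cash-flow margin trend for one ticker, straight off the report layer.
-- No pivoting, no window functions, no fundamentals-to-prices alignment needed.
SELECT fiscal_quarter_end, ttm_revenue, ttm_fcf_margin, regime_at_qend
FROM reports.ticker_quarterly
WHERE symbol = 'AAPL'
ORDER BY fiscal_quarter_end;
```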
1. Scaffold the staging views
Prompt the agent:
In `stock-analyst-101/assets/`, create three Bruin SQL assets for the staging layer. All three should materialize as views (cheap, always fresh) under the `staging` schema.
`staging/fundamentals.sql` → `staging.fundamentals`
- Source: `raw.fmp_fundamentals`
- Uppercase `symbol`, cast `date` to DATE
- Dedupe on (symbol, date) using `QUALIFY ROW_NUMBER() OVER (PARTITION BY symbol, date ORDER BY date DESC) = 1`
- Pivot income-statement / balance-sheet / cash-flow rows into one row per (symbol, fiscal_quarter_end)
- Compute `free_cash_flow = operating_cash_flow - capital_expenditures`
- Keep: symbol, fiscal_quarter_end, revenue, gross_profit, operating_income, net_income, eps_diluted, operating_cash_flow, capital_expenditures, free_cash_flow, total_assets, total_liabilities, shareholders_equity
`staging/prices.sql` → `staging.prices`
- Source: `raw.yahoo_prices`
- Uppercase `symbol`, cast `date` to DATE
- Dedupe on (symbol, date)
- Compute `daily_return = (close - LAG(close) OVER (PARTITION BY symbol ORDER BY date)) / LAG(close) OVER (...)`
- Compute 20-day and 50-day simple moving averages of close
- Keep: symbol, date, open, high, low, close, adj_close, volume, daily_return, sma_20, sma_50
`staging/macro.sql` → `staging.macro`
- Source: `raw.fred_macro`
- Pivot so each date has columns: `fed_funds_rate` (FEDFUNDS), `cpi` (CPIAUCSL), `unemployment_rate` (UNRATE), `yield_spread_10y2y` (T10Y2Y), `real_gdp` (GDPC1)
- Forward-fill missing values (monthly/quarterly series need daily granularity) using `LAST_VALUE(... IGNORE NULLS) OVER (ORDER BY date)` or equivalent
- Add a `regime` column: `CASE WHEN yield_spread_10y2y < 0 THEN 'inverted' ELSE 'normal' END`
For each asset, add a top-level `description:`, column descriptions, and at least one quality check (e.g. `not_null` on symbol/date, `unique` on the grain columns).
Show me the SQL for each asset before writing the files. Don't run anything yet - I'll approve first.
Read each SQL file the agent produces. Look specifically for the `QUALIFY`, the dedup logic, and the pivot structure. If anything looks off, push back - it's cheaper to correct here than downstream. The sketches below show roughly the shape to expect for each view.
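A healthy `staging/fundamentals.sql` might look roughly like this. It's a sketch, not the agent's exact output: it assumes Bruin's embedded `/* @bruin ... @bruin */` YAML header with a `duckdb.sql` asset type (column descriptions and checks omitted for brevity), and it assumes `raw.fmp_fundamentals` arrives in long format - one row per (symbol, date, metric) - so adjust the pivot to your actual raw schema:

```sql
/* @bruin
name: staging.fundamentals
type: duckdb.sql
materialization:
  type: view
@bruin */

-- Dedupe first so the pivot sees exactly one value per (symbol, date, metric).
-- The extra `metric` in the partition reflects the long-format assumption above.
WITH deduped AS (
    SELECT
        UPPER(symbol)      AS symbol,
        CAST(date AS DATE) AS fiscal_quarter_end,
        metric,
        value
    FROM raw.fmp_fundamentals
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY symbol, date, metric ORDER BY date DESC
    ) = 1
)
SELECT
    symbol,
    fiscal_quarter_end,
    MAX(CASE WHEN metric = 'revenue'              THEN value END) AS revenue,
    MAX(CASE WHEN metric = 'operating_cash_flow'  THEN value END) AS operating_cash_flow,
    MAX(CASE WHEN metric = 'capital_expenditures' THEN value END) AS capital_expenditures,
    -- ...the remaining kept columns follow the same conditional-aggregation pattern...
    MAX(CASE WHEN metric = 'operating_cash_flow'  THEN value END)
      - MAX(CASE WHEN metric = 'capital_expenditures' THEN value END) AS free_cash_flow
FROM deduped
GROUP BY symbol, fiscal_quarter_end;
```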
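The prices view is mostly window functions. A sketch of the SELECT body (same header pattern as above; the `ROWS BETWEEN n PRECEDING` frames are one way to express the moving averages):

```sql
-- Dedupe before computing windows, otherwise duplicate rows skew LAG and the SMAs.
WITH deduped AS (
    SELECT
        UPPER(symbol)      AS symbol,
        CAST(date AS DATE) AS date,
        open, high, low, close, adj_close, volume
    FROM raw.yahoo_prices
    QUALIFY ROW_NUMBER() OVER (PARTITION BY symbol, date ORDER BY date) = 1
)
SELECT
    *,
    (close - LAG(close) OVER w) / LAG(close) OVER w AS daily_return,
    AVG(close) OVER w20 AS sma_20,
    AVG(close) OVER w50 AS sma_50
FROM deduped
WINDOW
    w   AS (PARTITION BY symbol ORDER BY date),
    w20 AS (PARTITION BY symbol ORDER BY date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW),
    w50 AS (PARTITION BY symbol ORDER BY date ROWS BETWEEN 49 PRECEDING AND CURRENT ROW);
```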
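For the macro view, the two things worth checking are the pivot and the forward fill. A sketch, assuming `raw.fred_macro` comes in as (series_id, date, value) rows; DuckDB supports `IGNORE NULLS` on window functions, which makes the fill a one-liner per column:

```sql
WITH pivoted AS (
    -- one row per date, one column per FRED series
    SELECT
        CAST(date AS DATE) AS date,
        MAX(CASE WHEN series_id = 'FEDFUNDS' THEN value END) AS fed_funds_rate,
        MAX(CASE WHEN series_id = 'CPIAUCSL' THEN value END) AS cpi,
        MAX(CASE WHEN series_id = 'UNRATE'   THEN value END) AS unemployment_rate,
        MAX(CASE WHEN series_id = 'T10Y2Y'   THEN value END) AS yield_spread_10y2y,
        MAX(CASE WHEN series_id = 'GDPC1'    THEN value END) AS real_gdp
    FROM raw.fred_macro
    GROUP BY 1
),
filled AS (
    -- carry the last known value forward so monthly/quarterly series cover every date
    SELECT
        date,
        LAST_VALUE(fed_funds_rate     IGNORE NULLS) OVER w AS fed_funds_rate,
        LAST_VALUE(cpi                IGNORE NULLS) OVER w AS cpi,
        LAST_VALUE(unemployment_rate  IGNORE NULLS) OVER w AS unemployment_rate,
        LAST_VALUE(yield_spread_10y2y IGNORE NULLS) OVER w AS yield_spread_10y2y,
        LAST_VALUE(real_gdp           IGNORE NULLS) OVER w AS real_gdp
    FROM pivoted
    WINDOW w AS (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
)
SELECT
    *,
    CASE WHEN yield_spread_10y2y < 0 THEN 'inverted' ELSE 'normal' END AS regime
FROM filled;
```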
2. Scaffold the reports table
Prompt:
Now create one Bruin SQL asset for the reports layer, materialized as a table under the `reports` schema.
`reports/ticker_quarterly.sql` → `reports.ticker_quarterly`
- One row per (symbol, fiscal_quarter_end)
- Use `staging.fundamentals` as the grain
- Compute:
  - `ttm_revenue` = sum of the last 4 quarters of revenue (rolling window over symbol, ordered by fiscal_quarter_end)
  - `ttm_fcf` = same rolling 4-quarter sum of free_cash_flow
  - `ttm_fcf_margin` = `ttm_fcf / NULLIF(ttm_revenue, 0)`
  - `revenue_yoy_growth` = current quarter's revenue / revenue 4 quarters ago - 1
  - `fcf_margin_qoq_change` = (free_cash_flow / NULLIF(revenue, 0)) - (previous quarter's free_cash_flow / revenue)
- Join `staging.prices` using the next available trading day at or after the fiscal_quarter_end (fundamentals lag the quarter end - price "as of reporting" is an approximation here): `price_at_quarter_end`, `volume_at_quarter_end`, and `quarterly_return` = price at this quarter end / price at previous quarter end - 1
- Join `staging.macro` on the quarter-end date (backward-fill if that date has no macro row): `fed_funds_rate_at_qend`, `yield_spread_at_qend`, `regime_at_qend`
Add an asset-level description, column descriptions, tags (`tier:reports`, `domain:finance`), and quality checks (`not_null` on symbol + fiscal_quarter_end, `unique` on the pair).
Show me the SQL before writing. I'll approve before you run anything.
This is the analysis-ready table. Every question in Step 6 will start here.
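The TTM logic is the part most worth reviewing before you approve. A minimal sketch of the fundamentals-side computations, assuming `staging.fundamentals` from step 1; the `COUNT(*) OVER` guard is an addition beyond the prompt, keeping incomplete windows NULL instead of silently summing fewer than 4 quarters:

```sql
SELECT
    symbol,
    fiscal_quarter_end,
    -- only emit TTM values once a full 4-quarter window exists
    CASE WHEN COUNT(*) OVER w4 = 4 THEN SUM(revenue)        OVER w4 END AS ttm_revenue,
    CASE WHEN COUNT(*) OVER w4 = 4 THEN SUM(free_cash_flow) OVER w4 END AS ttm_fcf,
    CASE WHEN COUNT(*) OVER w4 = 4
         THEN SUM(free_cash_flow) OVER w4 / NULLIF(SUM(revenue) OVER w4, 0)
    END                                                 AS ttm_fcf_margin,
    revenue / NULLIF(LAG(revenue, 4) OVER w, 0) - 1     AS revenue_yoy_growth,
    free_cash_flow / NULLIF(revenue, 0)
      - LAG(free_cash_flow / NULLIF(revenue, 0)) OVER w AS fcf_margin_qoq_change
FROM staging.fundamentals
WINDOW
    w  AS (PARTITION BY symbol ORDER BY fiscal_quarter_end),
    w4 AS (PARTITION BY symbol ORDER BY fiscal_quarter_end
           ROWS BETWEEN 3 PRECEDING AND CURRENT ROW);
```

For the price join, DuckDB's `ASOF JOIN` is one candidate for the "next trading day at or after quarter end" lookup; whichever construct the agent picks, check that the inequality points forward to the next trading day, not backward.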
3. Run and spot-check
Prompt:
Run `bruin validate stock-analyst-101/` first. If it passes, run `bruin run stock-analyst-101/` and stream the output. Tell me which assets succeeded and which failed.
After it completes, do spot checks across all three layers using `bruin query --connection duckdb-default`:
- Row counts: `raw.fmp_fundamentals` vs `staging.fundamentals` vs `reports.ticker_quarterly` - any unexpected drops?
- For one ticker (e.g. AAPL), show the last 4 rows of `reports.ticker_quarterly` - do `ttm_revenue`, `ttm_fcf_margin`, `regime_at_qend` look sensible?
- Null audit: `SELECT COUNT(*) FILTER (WHERE ttm_fcf_margin IS NULL) FROM reports.ticker_quarterly` - some nulls are expected for early quarters (TTM needs 4 quarters of history); flag if there are too many.
Show me the results before moving on.
You should see ~4–8 quarters per ticker in the final table, depending on how much fundamentals history FMP returned.
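If you'd rather eyeball the counts yourself, a one-shot comparison along these lines works through the same `bruin query --connection duckdb-default` path:

```sql
-- Row counts per layer; staging should roughly match raw (minus dupes),
-- and reports collapses to one row per (symbol, fiscal_quarter_end).
SELECT 'raw.fmp_fundamentals' AS layer, COUNT(*) AS row_count FROM raw.fmp_fundamentals
UNION ALL
SELECT 'staging.fundamentals',     COUNT(*) FROM staging.fundamentals
UNION ALL
SELECT 'reports.ticker_quarterly', COUNT(*) FROM reports.ticker_quarterly;
```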
4. Why the layering matters
Now you have a clean three-tier pipeline:
```
raw.*                staging.*             reports.*
──────────────────   ──────────────────    ──────────────────
fmp_fundamentals  →  fundamentals  ┐
yahoo_prices      →  prices        ├──→    ticker_quarterly
fred_macro        →  macro         ┘
(source truth)       (cleaned + typed)     (analysis-ready)
```
Every question you ask in Step 6 goes against `reports.ticker_quarterly` first. The AI agent doesn't have to re-derive TTM FCF margin or re-align fundamentals to prices every time. Raw and staging exist for validation and troubleshooting, not for everyday analysis.
In the next step, you'll enhance each layer with column-level descriptions and update your pipeline-scoped `AGENTS.md` so the agent knows to default to `reports.*`.
What just happened
You built a proper layered pipeline - raw → staging → reports - without writing any SQL by hand. The agent wrote every asset, you validated the logic, Bruin ran the whole pipeline. Your data is now cleaned, typed, deduped, joined, and enriched with the metrics an analyst actually wants. Next step: make sure the AI knows which layer to query when.