Analyze Your Data

What you'll do

Create an AGENTS.md file with domain-specific context for your industry
Ask the agent real business questions and see it work

Why this step matters

Your AI agent now has the tools - it can read your schema and query your warehouse. But tools alone aren't enough. A general-purpose AI doesn't know that "DAU" means Daily Active Users, that your timestamps are in UTC, or that joining two of your largest tables without a filter will time out.

The AGENTS.md file is where you give the agent this domain knowledge. It's a plain markdown file at the root of your project that the AI reads before doing anything. Think of it as onboarding documentation - the same kind you'd write for a new analyst joining your team, except the reader is an AI.

Once AGENTS.md is in place, the agent stops guessing and starts reasoning with real context. That's the difference between "this query probably works" and "this query is correct."

How AGENTS.md works

The AGENTS.md file sits at the root of your Bruin project. AI coding tools (Cursor, Claude Code, Codex) automatically read it when they start a session in the project directory. Here's what to include:

Project overview - what the data is about, who uses it
Data access rules - tell the agent how to query data. The Bruin CLI's bruin query command is the standard way for an agent to run SQL against your warehouse. Instruct the agent to always show you the SQL before executing it and to use --limit on large tables to avoid expensive queries. Also state explicitly that access is read-only: no INSERT, UPDATE, DELETE, or DROP on production tables.
Domain glossary - acronyms, KPIs, business-specific definitions the agent wouldn't know on its own
Data caveats - timezone handling, known NULL semantics, sync delays, data freshness
Query guidelines - restrictions on expensive operations (e.g. avoid full table scans on large tables), preferred SQL patterns
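
As a rough illustration of the checklist above, the sketch below scans an AGENTS.md draft and reports which of the recommended sections have no heading yet. The helper name `missing_sections` and the exact heading wording are assumptions made for this example; nothing in Bruin or in AI coding tools requires these names.

```python
import re

# Section names taken from the list above. The exact heading wording is an
# assumption; adjust it to whatever headings your own file uses.
RECOMMENDED_SECTIONS = [
    "Project overview",
    "Data access",
    "Domain glossary",
    "Data caveats",
    "Query guidelines",
]


def missing_sections(agents_md: str) -> list[str]:
    """Return the recommended sections that have no matching markdown heading."""
    # Collect the text of every heading line (one or more '#', then a title).
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^#+[ \t]+(.+)$", agents_md, flags=re.MULTILINE)
    }
    return [name for name in RECOMMENDED_SECTIONS if name.lower() not in headings]


draft = """# AGENTS.md

## Data access
- This is a read-only environment

## Domain glossary
- **DAU** - Daily Active Users
"""

# Lists the sections still to be written for this draft.
print(missing_sections(draft))
```

A check like this only confirms that the headings exist; whether the content under them is accurate is still up to you.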

Create your AGENTS.md

Create an AGENTS.md file at the root of your Bruin project - that's the directory where .bruin.yml lives (not the inner folder with pipeline.yml). This ensures your AI tool finds it when you open the project.
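
If you are unsure which directory counts as the project root, a small script can find it by walking upward until it sees .bruin.yml. This is a minimal sketch; `find_bruin_root` and `agents_md_path` are hypothetical helpers written for this tutorial, not part of the Bruin CLI.

```python
from pathlib import Path
from typing import Optional


def find_bruin_root(start: Path) -> Optional[Path]:
    """Walk upward from `start` until a directory containing .bruin.yml is found."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / ".bruin.yml").is_file():
            return candidate
    return None


def agents_md_path(start: Path) -> Optional[Path]:
    """Return where AGENTS.md should live, or None when not inside a Bruin project."""
    root = find_bruin_root(start)
    return root / "AGENTS.md" if root is not None else None


if __name__ == "__main__":
    # Run from anywhere inside the project, e.g. from the pipeline folder.
    print(agents_md_path(Path.cwd()))
```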

Start with the generic sections below - these apply regardless of your industry. Replace <connection-name> with the connection name you configured in Step 2. If you followed the tutorial defaults, it is one of:

BigQuery: gcp-default
Redshift: redshift-default
ClickHouse: clickhouse-default
Postgres: postgres-default

# AGENTS.md

## Data access
- Use `bruin query --connection <connection-name> --query "<SQL>"` for all data access
- Always show the SQL query and explain your reasoning before executing it
- Use `--limit 10` when exploring unfamiliar tables or testing complex queries
- Read the `assets/` directory to understand available tables and their schemas before querying
- This is a **read-only** environment - never run INSERT, UPDATE, DELETE, or DROP statements
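
Outside of AGENTS.md itself, the same rules can be mirrored in a thin wrapper if you script queries yourself. The sketch below only builds the argument list for the `bruin query` command shown above and refuses statements containing the forbidden keywords. `build_query_command` is a hypothetical helper, and the keyword check is deliberately crude: it would also reject a harmless query that merely mentions a column named `update`. It is no substitute for read-only database credentials.

```python
import re
from typing import Optional

# Keywords the data access rules above forbid. A plain keyword match is an
# assumption made for brevity; it is conservative and can give false positives.
FORBIDDEN = re.compile(r"\b(INSERT|UPDATE|DELETE|DROP)\b", re.IGNORECASE)


def build_query_command(connection: str, sql: str, limit: Optional[int] = None) -> list[str]:
    """Build the argv for `bruin query`, refusing statements that write data."""
    if FORBIDDEN.search(sql):
        raise ValueError("read-only environment: write statements are not allowed")
    command = ["bruin", "query", "--connection", connection, "--query", sql]
    if limit is not None:
        # Mirrors the `--limit 10` guideline for exploring unfamiliar tables.
        command += ["--limit", str(limit)]
    return command


# Show the command before running it, as the guidelines ask of the agent.
print(build_query_command("gcp-default", "SELECT * FROM orders", limit=10))
```

The returned list can be passed to `subprocess.run` once you have reviewed it.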

Add industry-specific context

Now add the sections that are specific to your domain - a project overview, glossary of terms, data caveats, and query guidelines. Pick the section below that is closest to your use case and append its content to the same AGENTS.md file.

Working with multiple domains? If your database contains data from different areas (e.g., e-commerce and stock market data), combine the relevant parts of several sections. Include all glossary terms, data caveats, and query guidelines that apply to your data. The AI benefits from having comprehensive context even if not every term applies to every query.

Finance / Stock Market

Append the following to your AGENTS.md. Adjust the details to match your actual data and schema names.

## Project overview
This project contains financial and stock market data for investment analysis.
The data includes company financials, stock prices, market indicators, and
fundamental metrics.

## Domain glossary
- **FCF** - Free Cash Flow: operating cash flow minus capital expenditures
- **EPS** - Earnings Per Share: net income divided by outstanding shares
- **P/E** - Price-to-Earnings ratio: stock price divided by EPS
- **EBITDA** - Earnings Before Interest, Taxes, Depreciation, and Amortization
- **TTM** - Trailing Twelve Months: sum of the last 4 quarterly values
- **Market Cap** - share price multiplied by total outstanding shares
- **YoY** - Year over Year comparison
- **QoQ** - Quarter over Quarter comparison
- **Beta** - measure of stock volatility relative to the overall market
- **Dividend Yield** - annual dividends per share divided by stock price

## Data caveats
- All timestamps are in **UTC**
- End-of-day (EOD) financial data is published with a **15-minute delay** after market close
- Quarterly financial reports use the company's fiscal quarter, which may not align with calendar quarters
- `NULL` values in financial columns typically mean the metric is not applicable (e.g., EPS for pre-revenue companies), not that data is missing
- Stock splits affect historical price comparisons - always use split-adjusted prices when comparing across time periods

## Query guidelines
- When comparing financial metrics across time, prefer **TTM** over single-quarter values to smooth seasonality
- Always filter by date range - full table scans on price history tables are expensive
- For market cap weighted calculations, use the most recent `shares_outstanding` value
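- Prefer `QUALIFY ROW_NUMBER() OVER (...) = 1` over subqueries for finding latest records where the warehouse supports `QUALIFY` (BigQuery does; PostgreSQL does not, so use a CTE or subquery with `ROW_NUMBER()` there)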

Example prompts to try:

"Which companies had their free cash flow margin improve in the past 4 quarters but saw their stock price decrease more than 10% during the same period?"
"What's the average P/E ratio by sector, and which companies are trading more than 2 standard deviations below their sector average?"
"Show me the top 10 companies by TTM revenue growth that also have positive free cash flow."
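
To make the glossary definitions concrete, here is a minimal sketch of three of them in plain Python. The function names and the sample numbers are invented for illustration; in practice the agent would express the same arithmetic in SQL against your own tables.

```python
from typing import Optional, Sequence


def ttm(quarterly_values: Sequence[float]) -> float:
    """Trailing Twelve Months: sum of the last 4 quarterly values (oldest first)."""
    if len(quarterly_values) < 4:
        raise ValueError("TTM needs at least 4 quarters of data")
    return sum(quarterly_values[-4:])


def free_cash_flow(operating_cash_flow: float, capital_expenditures: float) -> float:
    """FCF: operating cash flow minus capital expenditures."""
    return operating_cash_flow - capital_expenditures


def price_to_earnings(price: float, eps: Optional[float]) -> Optional[float]:
    """P/E: stock price divided by EPS.

    Returns None when EPS is NULL (metric not applicable, per the data
    caveats) or zero, instead of raising or dividing by zero.
    """
    if eps is None or eps == 0:
        return None
    return price / eps


# Hypothetical quarterly revenue in millions, oldest quarter first.
revenue = [100, 120, 110, 130, 140]
print(ttm(revenue))                     # uses only the last four quarters
print(free_cash_flow(80, 30))
print(price_to_earnings(50.0, 2.5))
print(price_to_earnings(50.0, None))    # pre-revenue company: no P/E
```

Note how the NULL caveat shapes the code: a missing EPS yields no P/E rather than a zero, which is the kind of behavior the glossary and caveats together let the agent infer.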

Gaming / Mobile Apps

Append the following to your AGENTS.md. Adjust the details to match your actual data and schema names.

## Project overview
This project contains data from a gaming application, including player
activity, in-app purchases, retention metrics, and engagement events.

## Domain glossary
- **DAU** - Daily Active Users: unique users with at least one session in a calendar day
- **MAU** - Monthly Active Users: unique users with at least one session in a calendar month
- **ARPU** - Average Revenue Per User: total revenue divided by total users in a period
- **ARPPU** - Average Revenue Per Paying User: total revenue divided by paying users only
- **D1/D7/D30** - Day-1, Day-7, Day-30 retention: percentage of users who return N days after install
- **LTV** - Lifetime Value: predicted total revenue a user will generate
- **Session** - a continuous period of user activity; a new session starts after 30 minutes of inactivity
- **Whales** - users in the top 1% of spending
- **Churn** - users who haven't had a session in the past 14 days
- **Conversion rate** - percentage of users who made at least one purchase

## Data caveats
- All timestamps are in **UTC** - game events arrive from multiple time zones
- Session-level metrics vs. user-level metrics: don't mix them in the same aggregation without explicit grouping
- A "user" is identified by `device_id` by default; some tables also have `account_id` which maps to logged-in users (can be NULL for guest players)
- In-app purchase amounts are in **USD cents**, not dollars - divide by 100 for dollar values
- Retention is calculated based on **install date**, not registration date
- Event data has up to **2-hour ingestion delay** from the client SDK

## Query guidelines
- Always normalize DAU/MAU calculations by timezone when comparing across regions
- For retention curves, use **cohort-based analysis** grouped by install week
- Avoid `COUNT(DISTINCT device_id)` on raw event tables with more than 100M rows - use pre-aggregated daily tables when available
- When computing ARPU, decide upfront whether to include non-paying users (ARPU) or only payers (ARPPU)

Example prompts to try:

"What's our D7 retention by install week for the last 3 months? Is it trending up or down?"
"Who are our top 50 spenders in the last 30 days, and what's their average session length compared to non-paying users?"
"What's the conversion rate from free to paying user by acquisition channel?"
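
The retention definition above can be sketched in a few lines. This example assumes one common reading of "return N days after install": a user counts as retained for day N if they have a session on exactly the Nth calendar day (UTC) after their install date. Your team may define it differently, for example "on or after day N", and that choice is precisely the kind of detail worth writing into AGENTS.md. The function name and sample data are hypothetical.

```python
from datetime import date, timedelta


def day_n_retention(
    installs: dict[str, date],
    sessions: set[tuple[str, date]],
    n: int,
) -> float:
    """Percentage of installed users with a session exactly n days after install.

    `installs` maps device_id -> install date (UTC); `sessions` holds
    (device_id, session date) pairs. Retention is anchored on the install
    date, not the registration date, as the data caveats require.
    """
    if not installs:
        return 0.0
    retained = sum(
        1
        for device_id, install_date in installs.items()
        if (device_id, install_date + timedelta(days=n)) in sessions
    )
    return 100.0 * retained / len(installs)


# Four hypothetical installs on the same day; only device "a" returns on day 7.
installs = {d: date(2025, 1, 1) for d in ("a", "b", "c", "d")}
sessions = {("a", date(2025, 1, 8)), ("b", date(2025, 1, 2)), ("c", date(2025, 1, 5))}
print(day_n_retention(installs, sessions, 7))
```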

E-commerce

Append the following to your AGENTS.md. Adjust the details to match your actual data and schema names.

## Project overview
This project contains e-commerce transaction data, including orders, products,
customers, and inventory. It powers analytics for revenue, conversion,
and customer behavior.

## Domain glossary
- **AOV** - Average Order Value: total revenue divided by number of orders
- **GMV** - Gross Merchandise Value: total value of goods sold before returns and discounts
- **NMV** - Net Merchandise Value: GMV minus returns, cancellations, and discounts
- **Conversion Rate** - percentage of sessions that result in a completed purchase
- **Cart Abandonment** - percentage of users who add items to cart but don't complete checkout
- **LTV** - Lifetime Value: total revenue generated by a customer across all orders
- **CAC** - Customer Acquisition Cost: total marketing spend divided by new customers acquired
- **Repeat Purchase Rate** - percentage of customers with more than one order
- **SKU** - Stock Keeping Unit: unique identifier for a product variant
- **Basket Size** - number of items in a single order

## Data caveats
- All timestamps are in **UTC**; the business operates across US time zones
- `customer_id` is `NULL` for guest checkouts - these represent ~15-20% of orders
- Refund window is **30 days** - revenue metrics for recent orders may be revised downward
- Product prices include regional pricing; always use `order_total` for revenue, not `SUM(product_price)`
- Inventory data syncs every **4 hours** - real-time stock checks are not reliable
- Discount codes are stored as negative line items, not as a separate discount field

## Query guidelines
- For revenue reporting, always use `NMV` (net of returns) unless explicitly asked for GMV
- Exclude test orders: `WHERE order_source != 'internal_test'`
- When computing LTV, use a **12-month lookback window** by default
- For conversion rate, define the funnel clearly: session → product view → add to cart → checkout → payment confirmed
- Avoid joining `orders` with `events` directly - use the pre-built `order_attribution` table for marketing attribution

Example prompts to try:

"What's our AOV trend by month for the last 12 months? How does it differ between first-time and returning customers?"
"Which product categories have the highest cart abandonment rate, and what's the average cart value at abandonment?"
"Calculate LTV by acquisition channel for customers acquired in Q1 2025."
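
The difference between GMV and NMV is easy to get wrong when discounts are stored as negative line items, so here is a small sketch of the arithmetic. The `LineItem` shape, the `returned` flag, and the use of integer cents are all assumptions for this example; cancellations are left out for brevity, and your schema will differ.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    order_id: str
    amount_cents: int            # negative for discount codes, per the caveat above
    returned: bool = False       # hypothetical flag marking a returned item
    order_source: str = "web"    # "internal_test" marks test orders


def gmv_and_nmv(items: list[LineItem]) -> tuple[int, int]:
    """Return (GMV, NMV) in cents, skipping internal test orders."""
    real = [i for i in items if i.order_source != "internal_test"]
    # GMV: value of goods sold before returns and discounts.
    gmv = sum(i.amount_cents for i in real if i.amount_cents > 0)
    # Discounts are the negative line items, flipped to a positive total.
    discounts = -sum(i.amount_cents for i in real if i.amount_cents < 0)
    returns = sum(i.amount_cents for i in real if i.amount_cents > 0 and i.returned)
    return gmv, gmv - discounts - returns


items = [
    LineItem("A", 5000),
    LineItem("A", -500),                              # discount code on order A
    LineItem("B", 3000, returned=True),
    LineItem("T", 9999, order_source="internal_test"),
]
print(gmv_and_nmv(items))
```

Summing every line item naively would mix discounts into GMV and count the test order, which is why both rules belong in the query guidelines.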

What just happened

You now have a working AI data analyst. The AGENTS.md file gives the agent the domain knowledge it needs to write accurate, context-aware queries. Combined with the schema metadata from Step 3 and the MCP connection from Step 4, the agent has everything it needs to start: it knows your tables, understands your business terms, and can query your warehouse directly.

Keep talking to it - ask follow-up questions, request visualizations, explore anomalies. The more you use it, the more you can refine the AGENTS.md - add new terms, tighten query guidelines, note edge cases you discover. This file is living documentation that gets better over time.

In the next step, we'll look at ways to push the agent's context even further - with a structured glossary and connections to your team's existing documentation in Notion or Confluence.

Tips for getting the most out of your AI analyst

Be specific with prompts - "What's the revenue trend?" is okay, but "What's the monthly NMV trend for the electronics category, excluding returns, for the past 12 months?" is much better
Ask to see the SQL first - Tell the agent to show you the query before running it, especially for complex analyses
Use --limit for exploration - When the agent is exploring unfamiliar tables, have it use --limit 10 to avoid expensive full-table scans
Iterate on AGENTS.md - When the agent gets something wrong, add a clarification to AGENTS.md so it doesn't repeat the mistake
