BigQuery TVFs for AI Agents: A Lightweight Semantic Layer Pattern
BigQuery table-valued functions can give AI data agents a small, governed query interface without asking them to write full SQL against raw warehouse tables.
A semantic layer defines business metrics and dimensions in one place, then lets tools generate SQL from those definitions.
A BigQuery table-valued function, or TVF, is a SQL function that returns a table and accepts parameters. For AI agents, TVFs can work like a small, controlled semantic interface: the data team writes the approved query logic once, and the agent only provides predefined inputs like dates, country, platform, or customer segment.
This is useful when you want an AI data analyst to answer common business questions without asking it to write full SQL from scratch every time.
A semantic layer is where you define the business meaning of your data. Things like revenue, active_user, monthly_recurring_revenue, customer_segment, and so on.
Instead of every dashboard, analyst, or agent writing their own version of the SQL, the definition lives in one place. Tools can then use that definition to generate the query.
So if someone asks for revenue by month, the tool knows what revenue means, which table to query, what timestamp to group by, and which filters should apply.
This is useful for dashboards, reports, and self-serve analytics. It is also relevant for AI agents, because agents need context before they can query data reliably.
The classic text-to-SQL failure mode is pretty simple: the agent sees tables, columns, maybe a few descriptions, and then tries to infer business logic that usually lives in someone's head, a dashboard, or an old dbt model. That works until it doesn't.
A TVF is basically a SQL function that returns a table. BigQuery's docs describe table functions as behaving a bit like views, except they can take parameters. You can define something like this:
create or replace table function analytics.revenue_by_period(
p_start_date date,
p_end_date date,
p_country string
)
as (
select
date_trunc(order_date, month) as month,
sum(order_amount - refunded_amount) as revenue
from analytics.orders
where order_date between p_start_date and p_end_date
and (p_country is null or shipping_country = p_country)
and status = 'paid'
group by 1
);
Then the agent does not need to write the full query. It only needs to call the approved function:
select *
from analytics.revenue_by_period('2026-01-01', '2026-03-31', 'Canada');
The useful part is the constraint.
The agent gets a small menu of approved functions. Each function has predefined inputs. The query logic lives inside BigQuery, where it can be reviewed and maintained like any other SQL object.
For example, the agent does not need to understand the full orders schema, refund logic, order statuses, or date handling. It only needs to understand that revenue_by_period accepts a start date, end date, and optional country, then returns revenue grouped by month.
For some workflows, a TVF covers the most important part of a semantic layer: one reviewed place for business logic.
revenue means whatever the TVF defines it to mean. If the definition changes, you update the function, review the change, and future agent calls use the new logic.
A TVF is usually narrower than a semantic layer. A semantic layer often supports many combinations of metrics, dimensions, filters, and granularities. A TVF can focus on one question or one family of questions:
This can be useful when the consumer is an AI agent. Agents are generally better at choosing parameters from a constrained interface than writing fresh SQL against a warehouse with limited context.
In practice, the agent prompt or tool description can say:
Use analytics.revenue_by_period when the user asks for revenue over time.
Required inputs:
- p_start_date: start date for the analysis
- p_end_date: end date for the analysis
- p_country: optional country filter, or null for all countries
That gives the agent enough structure to call the function without exposing every table and join behind it.
TVFs will not cover every semantic-layer use case. If you need flexible dashboard exploration, metric catalogues, lineage, access workflows, descriptions, ownership, and lots of reusable dimensions, then a semantic layer gives you a broader system.
If your goal is to let an agent answer common business questions with a limited set of approved patterns, TVFs are a practical starting point.
You can expose a handful of reviewed functions, document the inputs, give the agent examples, and stop asking it to generate complex SQL from scratch every time someone asks a question in Slack.
The query surface stays smaller, the data team keeps control over the logic, and the business still gets answers inside the conversation where the question came up.
For many teams, this is a reasonable first step before implementing a full semantic layer, especially if the first priority is defining and protecting the first few important metrics.
This pattern works best when the TVFs are part of a proper analytics workflow, not a few disconnected SQL snippets sitting in the warehouse.
With Bruin, teams can keep ingestion, transformations, checks, and metadata in code. Bruin Cloud then adds the managed layer around that work: orchestration, catalog, lineage, audit trails, access controls, and observability.
That matters for AI agents because the agent is only as good as the context it can trust. A TVF can define the query interface, but the surrounding pipeline still needs freshness checks, ownership, lineage, and review.
Sometimes, for narrow workflows. TVFs work well when the questions are predictable and the inputs can be controlled. A semantic layer is still better for broad metric exploration, dashboard development, governance, and reusable dimensions.
They reduce how much SQL the agent has to write. The agent chooses parameters, the TVF handles the query logic, and the data team keeps control over the metric definition.
They overlap, but the workflow is different. With text-to-SQL, the agent often generates a full query from natural language. With TVFs, the agent maps the request to an approved function and fills in the inputs.
Document the function name, when to use it, required parameters, optional parameters, output columns, and a few example calls. The goal is to make the agent choose from a clear interface, not infer hidden business logic.
Curious how other teams are handling this: do you give agents access to raw tables, semantic definitions, approved query templates, TVFs, or something else?