Skip to content

Macros

Macros are reusable pieces of Jinja code that allow you to define SQL patterns once and use them throughout your pipeline. Think of them as functions for your SQL queries—they help you write cleaner, more maintainable code by eliminating repetition.

Why Use Macros?

Macros are useful when you find yourself:

  • Writing the same SQL patterns repeatedly across multiple assets
  • Building complex queries that could be simplified with reusable components
  • Maintaining consistency in how certain operations are performed
  • Wanting to abstract away complexity from your main query logic

Getting Started

Directory Structure

To use macros in your pipeline, create a macros/ folder at the root of your pipeline directory:

text
my-pipeline/
├── pipeline.yml
├── macros/
│   ├── aggregations.sql
│   ├── filters.sql
│   └── transformations.sql
└── assets/
    ├── customers.sql
    └── orders.sql

File Naming

Macro files must end with .sql. You can organize your macros into multiple files based on their purpose (e.g., aggregations.sql, filters.sql, transformations.sql), or keep them all in a single file.

Automatic Loading

All macros defined in the macros/ folder are automatically loaded and available in all your assets. You don't need to import them explicitly—just define them once and use them anywhere.

Defining Macros

Basic Syntax

A macro is defined using the {% macro %} tag:

jinja
{% macro macro_name(parameter1, parameter2) -%}
    SQL code here
{%- endmacro %}

The {%- and -%} syntax strips whitespace, keeping your rendered output clean.

Simple Macro Without Parameters

Here's a basic macro that selects all columns from a table:

jinja
{% macro select_all(table_name) -%}
SELECT * FROM {{ table_name }}
{%- endmacro %}

Usage in an asset:

bruin-sql
/* @bruin
name: my_asset
type: duckdb.sql
@bruin */

{{ select_all('customers') }}

Macro With Multiple Parameters

jinja
{% macro date_range_filter(table, date_column, start_date, end_date) -%}
SELECT *
FROM {{ table }}
WHERE {{ date_column }} >= '{{ start_date }}'
  AND {{ date_column }} < '{{ end_date }}'
{%- endmacro %}

Macro With Default Parameters

jinja
{% macro top_n(table, column, n=10) -%}
SELECT *
FROM {{ table }}
ORDER BY {{ column }} DESC
LIMIT {{ n }}
{%- endmacro %}

Usage:

bruin-sql
-- Uses default n=10
{{ top_n('sales', 'revenue') }}

-- Override with n=5
{{ top_n('sales', 'revenue', 5) }}

Using Macros in Assets

Once defined in your macros/ folder, macros can be used in any SQL asset by calling them with the syntax.

Simple Usage

Example

macros/aggregations.sql

jinja
{% macro count_by(table, column) -%}
SELECT
    {{ column }},
    COUNT(*) as count
FROM {{ table }}
GROUP BY {{ column }}
ORDER BY count DESC
{%- endmacro %}

assets/country_summary.sql

bruin-sql
/* @bruin
name: country_summary
type: duckdb.sql

materialization:
  type: table
@bruin */

{{ count_by('customers', 'country') }}

This renders to:

sql
SELECT
    country,
    COUNT(*) as count
FROM customers
GROUP BY country
ORDER BY count DESC

Using Macros with Bruin Variables

Macros work seamlessly with Bruin's built-in variables like start_date and end_date:

bruin-sql
/* @bruin
name: daily_orders
type: duckdb.sql
@bruin */

{{ date_range_filter('orders', 'order_date', start_date, end_date) }}

Combining Macros with Jinja Features

You can use macros together with other Jinja features like loops and conditionals:

bruin-sql
/* @bruin
name: multi_country_report
type: duckdb.sql
@bruin */

{% for country in ['USA', 'UK', 'Canada'] %}
{{ count_by('orders', 'product_id') }}
WHERE country = '{{ country }}'
{% if not loop.last %}UNION ALL{% endif %}
{% endfor %}

Built-in Bruin SQL Helpers

Bruin also provides built-in SQL helpers under the bruin namespace. These are available in SQL assets without creating files in macros/, and many of them render platform-specific SQL for the asset type.

For example, the same helper can render different SQL in BigQuery, Snowflake, Postgres, DuckDB, SQL Server, and other supported platforms:

bruin-sql
select
    {{ bruin.generate_surrogate_key(['customer_id', 'order_id']) }} as order_key,
    {{ bruin.safe_divide('revenue', 'sessions') }} as revenue_per_session
from orders
{{ bruin.group_by(1) }}

bruin.group_by(n)

Generates a positional GROUP BY clause from 1 through n.

bruin-sql
select customer_id, order_date, count(*) as orders
from orders
{{ bruin.group_by(2) }}

Renders to:

sql
group by 1, 2

bruin.safe_divide(numerator, denominator)

Divides two expressions and returns null instead of failing on division by zero.

bruin-sql
{{ bruin.safe_divide('revenue', 'sessions') }}

bruin.safe_add(fields) and bruin.safe_subtract(fields)

Adds or subtracts expressions while treating null values as 0.

bruin-sql
{{ bruin.safe_add(['revenue', 'tax', 'shipping']) }}
{{ bruin.safe_subtract(['gross_revenue', 'discounts', 'refunds']) }}

bruin.generate_surrogate_key(fields)

Generates a stable MD5-based surrogate key expression from one or more fields. The rendered SQL uses the right cast, concatenation, and hash syntax for the asset platform where Bruin has a platform override.

bruin-sql
{{ bruin.generate_surrogate_key(['customer_id', 'order_id']) }}

bruin.pivot(column, values, ...)

Generates aggregate case expressions for pivoting known values into columns.

bruin-sql
select
    customer_id,
    {{ bruin.pivot('status', ['paid', 'refunded'], prefix='orders_') }}
from orders
group by customer_id

Optional keyword arguments:

  • alias: add aliases to generated expressions, defaults to true
  • agg: aggregate function, defaults to sum
  • cmp: comparison operator, defaults to =
  • prefix: alias prefix, defaults to empty
  • suffix: alias suffix, defaults to empty
  • then_value: value used when the comparison matches, defaults to 1
  • else_value: value used when the comparison does not match, defaults to 0
  • quote_identifiers: quote generated aliases, defaults to true
  • distinct: add distinct inside the aggregate, defaults to false

bruin.haversine_distance(lat1, lon1, lat2, lon2, unit='mi')

Generates a haversine distance expression. unit can be mi or km.

bruin-sql
{{ bruin.haversine_distance('pickup_lat', 'pickup_lon', 'dropoff_lat', 'dropoff_lon', unit='km') }}

bruin.degrees_to_radians(degrees)

Converts a degree expression to radians.

bruin-sql
{{ bruin.degrees_to_radians('angle_degrees') }}

bruin.width_bucket(expr, min_value, max_value, num_buckets)

Generates a bucket number expression for a numeric value. Platforms with native width_bucket support use the native function where applicable.

bruin-sql
{{ bruin.width_bucket('price', '0', '100', '10') }}

bruin.deduplicate(relation, partition_by, order_by)

Generates a query that keeps the first row per partition according to the given ordering.

bruin-sql
{{ bruin.deduplicate('events', 'user_id, account_id', 'updated_at desc') }}

Bruin uses platform-specific patterns such as qualify, distinct on, ordered array aggregation, or subqueries depending on the asset type.

bruin.generate_series(upper_bound)

Generates a single-column relation named generated_number with values from 1 through upper_bound.

bruin-sql
{{ bruin.generate_series(30) }}

bruin.date_spine(datepart, start_date, end_date)

Generates a single-column date or timestamp spine. The output column is named date_<datepart>.

bruin-sql
{{ bruin.date_spine('day', "'2024-01-01'", "'2024-02-01'") }}

Supported date parts include day, week, month, quarter, year, hour, minute, second, millisecond, and microsecond. Exact support depends on the target database.

bruin.slugify(value)

Converts text into a lowercase SQL-friendly identifier.

bruin-sql
select 1 as {{ bruin.slugify('9 Active Users!') }}

Renders to:

sql
select 1 as _9_active_users

URL Helpers

These helpers generate SQL expressions for parsing URL strings:

bruin-sql
{{ bruin.get_url_host('page_url') }}
{{ bruin.get_url_path('page_url') }}
{{ bruin.get_url_parameter('page_url', 'utm_source') }}
  • bruin.get_url_host(field) extracts the host
  • bruin.get_url_path(field) extracts the path without query parameters
  • bruin.get_url_parameter(field, parameter) extracts one query parameter value

Practical Examples

Aggregation Patterns

Count By Column

jinja
{% macro count_by(table, column, order_by='count') -%}
SELECT
    {{ column }},
    COUNT(*) as count
FROM {{ table }}
GROUP BY {{ column }}
ORDER BY {{ order_by }} DESC
{%- endmacro %}

Usage:

bruin-sql
{{ count_by('orders', 'customer_id') }}

Sum By Column

jinja
{% macro sum_by(table, group_column, sum_column) -%}
SELECT
    {{ group_column }},
    SUM({{ sum_column }}) as total
FROM {{ table }}
GROUP BY {{ group_column }}
ORDER BY total DESC
{%- endmacro %}

Usage:

bruin-sql
{{ sum_by('orders', 'customer_id', 'order_amount') }}

Filtering Patterns

Recent Records

jinja
{% macro recent_records(table, date_column, days=7) -%}
SELECT *
FROM {{ table }}
WHERE {{ date_column }} >= CURRENT_DATE - INTERVAL '{{ days }} days'
{%- endmacro %}

Usage:

bruin-sql
-- Get orders from last 7 days (default)
{{ recent_records('orders', 'order_date') }}

-- Get orders from last 30 days
{{ recent_records('orders', 'order_date', 30) }}

Filter Null Values

jinja
{% macro filter_null(table, columns) -%}
SELECT *
FROM {{ table }}
WHERE {% for col in columns %}
    {{- col }} IS NOT NULL
    {%- if not loop.last %} AND {% endif %}
{%- endfor %}
{%- endmacro %}

Usage:

bruin-sql
{{ filter_null('customers', ['email', 'phone', 'address']) }}

Renders to:

sql
SELECT *
FROM customers
WHERE email IS NOT NULL
  AND phone IS NOT NULL
  AND address IS NOT NULL

Filter by List of Values

jinja
{% macro in_list(table, column, values) -%}
SELECT *
FROM {{ table }}
WHERE {{ column }} IN (
    {%- for val in values %}
        '{{ val }}'
        {%- if not loop.last %}, {% endif %}
    {%- endfor %}
)
{%- endmacro %}

Usage:

bruin-sql
{{ in_list('orders', 'status', ['pending', 'processing', 'shipped']) }}

Transformation Patterns

Deduplicate Records

jinja
{% macro deduplicate(table, partition_column, order_column) -%}
SELECT * FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY {{ partition_column }}
               ORDER BY {{ order_column }} DESC
           ) as rn
    FROM {{ table }}
)
WHERE rn = 1
{%- endmacro %}

Usage:

bruin-sql
-- Keep most recent record for each customer
{{ deduplicate('customer_events', 'customer_id', 'event_timestamp') }}

Generate Surrogate Key

jinja
{% macro generate_surrogate_key(columns) -%}
MD5(CONCAT_WS('||',
    {%- for col in columns %}
        CAST({{ col }} AS VARCHAR)
        {%- if not loop.last %}, {% endif %}
    {%- endfor %}
)) as surrogate_key
{%- endmacro %}

Usage:

bruin-sql
SELECT
    {{ generate_surrogate_key(['customer_id', 'order_id']) }},
    *
FROM orders

Safe Division

jinja
{% macro safe_divide(numerator, denominator, default=0) -%}
CASE
    WHEN {{ denominator }} = 0 OR {{ denominator }} IS NULL
    THEN {{ default }}
    ELSE {{ numerator }}::DOUBLE / {{ denominator }}::DOUBLE
END
{%- endmacro %}

Usage:

bruin-sql
SELECT
    product_id,
    revenue,
    units_sold,
    {{ safe_divide('revenue', 'units_sold') }} as avg_price
FROM product_sales

Advanced Usage

Macro Composition

Macros can call other macros, allowing you to build more complex functionality:

Example

macros/base.sql

jinja
{% macro date_range_filter(table, date_column, start_date, end_date) -%}
SELECT *
FROM {{ table }}
WHERE {{ date_column }} >= '{{ start_date }}'
  AND {{ date_column }} < '{{ end_date }}'
{%- endmacro %}

macros/aggregations.sql

jinja
{% macro daily_summary(table, date_column, start_date, end_date) -%}
WITH filtered AS (
    {{ date_range_filter(table, date_column, start_date, end_date) }}
)
SELECT
    {{ date_column }}::DATE as day,
    COUNT(*) as total_count,
    COUNT(DISTINCT customer_id) as unique_customers
FROM filtered
GROUP BY {{ date_column }}::DATE
ORDER BY day
{%- endmacro %}

assets/daily_orders.sql

bruin-sql
/* @bruin
name: daily_orders
type: duckdb.sql
@bruin */

{{ daily_summary('orders', 'created_at', start_date, end_date) }}

Using Macros with Pipeline Variables

Combine macros with pipeline variables for powerful parameterization:

Example

pipeline.yml

yaml
name: sales_pipeline
variables:
  regions:
    type: array
    minItems: 1
    items:
      type: string
      enum: ['US', 'EU', 'APAC']
    default: ['US', 'EU']
  min_revenue:
    type: integer
    minimum: 1000
    maximum: 100000
    default: 5000

macros/filters.sql

jinja
{% macro revenue_filter(table, min_amount) -%}
SELECT *
FROM {{ table }}
WHERE revenue >= {{ min_amount }}
{%- endmacro %}

assets/regional_sales.sql

bruin-sql
/* @bruin
name: regional_sales
type: duckdb.sql
@bruin */

{% for region in var.regions %}
{{ revenue_filter('sales', var.min_revenue) }}
AND region = '{{ region }}'
{% if not loop.last %}UNION ALL{% endif %}
{% endfor %}

TIP

The pipeline snippet above showcases array enums and numeric bounds. For additional JSON Schema keywords you can mix into macros-driven workflows—including nested objects and nullable values—refer to the Variables keyword reference.

Dynamic Column Generation

Use macros to generate columns dynamically:

jinja
{% macro revenue_by_days(days_list) -%}
{% for day in days_list %}
SUM(CASE WHEN days_since_start <= {{ day }} THEN revenue ELSE 0 END) AS revenue_{{ day }}_days
{%- if not loop.last %},{% endif %}
{% endfor %}
{%- endmacro %}

Usage:

bruin-sql
SELECT
    cohort_date,
    {{ revenue_by_days([7, 14, 30, 90]) }}
FROM user_cohorts
GROUP BY cohort_date

Complete End-to-End Example

Here's a full example showing how to structure and use macros in a pipeline:

Complete Example

text
sales-pipeline/
├── pipeline.yml
├── macros/
│   ├── aggregations.sql
│   └── filters.sql
└── assets/
    ├── daily_sales.sql
    └── top_customers.sql

pipeline.yml

yaml
name: sales_pipeline
default_connections:
  duckdb: "duckdb-default"

macros/aggregations.sql

jinja
{% macro count_by(table, column) -%}
SELECT
    {{ column }},
    COUNT(*) as count
FROM {{ table }}
GROUP BY {{ column }}
ORDER BY count DESC
{%- endmacro %}

{% macro sum_by(table, group_column, sum_column) -%}
SELECT
    {{ group_column }},
    SUM({{ sum_column }}) as total
FROM {{ table }}
GROUP BY {{ group_column }}
ORDER BY total DESC
{%- endmacro %}

macros/filters.sql

jinja
{% macro date_range(table, date_column, start_date, end_date) -%}
SELECT *
FROM {{ table }}
WHERE {{ date_column }} >= '{{ start_date }}'
  AND {{ date_column }} < '{{ end_date }}'
{%- endmacro %}

assets/daily_sales.sql

bruin-sql
/* @bruin
name: daily_sales
type: duckdb.sql

materialization:
  type: table
@bruin */

WITH filtered_orders AS (
    {{ date_range('raw_orders', 'order_date', start_date, end_date) }}
)
{{ sum_by('filtered_orders', 'order_date::DATE', 'order_amount') }}

assets/top_customers.sql

bruin-sql
/* @bruin
name: top_customers
type: duckdb.sql

materialization:
  type: table

depends:
  - daily_sales
@bruin */

{{ sum_by('raw_orders', 'customer_id', 'order_amount') }}
LIMIT 100

To see the rendered output:

bash
bruin render assets/daily_sales.sql

Best Practices

Organization

  • Group related macros: Keep aggregations, filters, and transformations in separate files
  • Use descriptive names: calculate_revenue_by_region is better than calc1
  • Document complex macros: Add comments explaining parameters and behavior
jinja
{#
  Deduplicates records keeping the most recent one per partition.

  Args:
    table: The source table name
    partition_column: Column to partition by (e.g., customer_id)
    order_column: Column to order by (e.g., updated_at)
#}
{% macro deduplicate(table, partition_column, order_column) -%}
...
{%- endmacro %}

Commenting Macros

Important

SQL comments (--) don't prevent Jinja from processing macros. To show example macro code without executing it, use {% raw %} blocks:

sql
-- ❌ Wrong: This will still execute the macro!
-- {{ top_n('orders', 'amount', 10) }}

-- ✅ Correct: Wrap in {% raw %} to show the code without executing
{% raw %}
-- {{ top_n('orders', 'amount', 10) }}
{% endraw %}

Everything inside {% raw %} ... {% endraw %} is treated as literal text, allowing you to document macro examples without executing them.

Design

  • Keep macros focused: Each macro should do one thing well
  • Use default parameters: Make macros easier to use with sensible defaults
  • Make macros reusable: Avoid hardcoding table names or column names

Testing

  • Test with bruin render: Always preview the rendered output
bash
bruin render assets/my_asset.sql
  • Start simple: Build complex macros incrementally
  • Verify rendered SQL: Ensure the output is valid SQL for your platform

Performance

  • Avoid over-abstraction: Don't create macros for everything
  • Consider inline code: Simple patterns might not need a macro
  • Test performance: Some complex macros might impact query performance

Debugging Tips

Preview Macro Output

Use bruin render to see exactly what your macros generate:

bash
# Render a specific asset
bruin render assets/my_asset.sql

# See the full SQL that will be executed
bruin render assets/my_asset.sql --output json

Check Macro Loading

If your macros aren't working:

  1. Verify the macros/ folder is at the pipeline root
  2. Ensure macro files end with .sql
  3. Check for syntax errors in macro definitions
  4. Verify macro names match what you're calling in assets

Common Issues

Extra whitespace in output: Use {%- and -%} to control whitespace

jinja
{# Good: strips whitespace #}
{% macro my_macro() -%}
SELECT 1
{%- endmacro %}

{# Bad: includes whitespace #}
{% macro my_macro() %}
SELECT 1
{% endmacro %}

Macro not found: Ensure the macro is defined before it's used (macro files are loaded in alphabetical order)

When to Use Macros

Use macros when:

  • You repeat the same SQL pattern in multiple assets
  • You want to standardize certain operations across your pipeline
  • You need to abstract away complexity
  • You want to make your queries more readable

Don't use macros when:

  • The pattern is only used once
  • The macro would be more complex than the inline code
  • You need very dynamic behavior better suited to Python
  • Performance is critical and the macro adds overhead