Bruin Academy

Bruin + Python

Bruin is a first-class home for Python in your data stack: write Python assets, land the DataFrames they return in warehouse tables with materialization, and skip the boilerplate with the Python SDK.

How Bruin supports Python

Python is a first-class asset type in Bruin - not a sidecar, not an escape hatch. You can drop a .py file into your pipeline alongside SQL assets and Bruin treats it the same way: it runs in the right order, respects dependencies, surfaces checks, and integrates with lineage.

There are three layers that make this work.

1. Python assets

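Any Python script becomes a Bruin asset by adding a @bruin block at the top. The block is a docstring delimited by """@bruin and @bruin""" that holds the asset's metadata, so the file stays valid Python: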

"""@bruin
name: my_script
image: python:3.13
@bruin"""

print("Hello from Bruin!")

Each asset runs in an isolated environment with its own requirements.txt, so there are no cross-asset dependency conflicts. Bruin uses uv under the hood for fast, deterministic installs.
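To make the header format concrete, here is a minimal sketch that pulls the flat key: value pairs out of a @bruin block using only the standard library. It is not how Bruin itself reads the header (that happens inside the Bruin CLI and is not described here); the helper name extract_bruin_header and the regular expression are assumptions made for illustration, and nested entries such as columns are deliberately ignored.

```python
import re

# Hypothetical helper, not part of Bruin: finds the text between the
# """@bruin and @bruin""" markers in a Python source string.
HEADER_RE = re.compile(r'"""\s*@bruin\s*\n(.*?)\n\s*@bruin\s*"""', re.DOTALL)


def extract_bruin_header(source: str) -> dict[str, str]:
    """Return the top-level `key: value` pairs from the @bruin block."""
    match = HEADER_RE.search(source)
    if match is None:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        # Skip blank lines and indented or list entries; this sketch
        # only reads flat, top-level keys.
        if not line.strip() or line[0].isspace() or line.startswith("-"):
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


ASSET_SOURCE = '''"""@bruin
name: my_script
image: python:3.13
@bruin"""

print("Hello from Bruin!")
'''

header = extract_bruin_header(ASSET_SOURCE)
print(header)
```

Because the metadata lives in an ordinary docstring, the same file can be run directly with Python or picked up by Bruin as an asset.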

2. Materialization

By default, a Python script just runs. If you want the data it produces to land in a warehouse table, add a materialization block and define a materialize() function that returns a DataFrame:

"""@bruin
name: analytics.users
connection: my_bigquery
materialization:
  type: table
  strategy: merge
columns:
  - name: id
    type: integer
    primary_key: true
@bruin"""

import pandas as pd

def materialize():
    return pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]})

Bruin serializes the return value to Apache Arrow and uses ingestr to load it with your chosen strategy (create+replace, append, delete+insert, or merge). No manual to_sql, no credential wiring. See Python materialization for the full walkthrough.
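As a rough mental model of what the strategies do to a table, the sketch below applies three of them to in-memory rows. It is a simplification, not Bruin's or ingestr's implementation: apply_strategy is a hypothetical function, rows are plain dictionaries, and delete+insert is left out because its behavior depends on configuration not covered in this section.

```python
# Simplified model of load strategies on in-memory rows.
# `apply_strategy` is a hypothetical name used only for illustration.

def apply_strategy(existing, incoming, strategy, key="id"):
    """Return the table contents after loading `incoming` into `existing`."""
    if strategy == "create+replace":
        # The table is rebuilt from the new rows only.
        return list(incoming)
    if strategy == "append":
        # New rows are added; rows already in the table are untouched.
        return list(existing) + list(incoming)
    if strategy == "merge":
        # Upsert: rows whose key already exists are overwritten,
        # all other incoming rows are inserted.
        by_key = {row[key]: row for row in existing}
        for row in incoming:
            by_key[row[key]] = row
        return list(by_key.values())
    raise ValueError(f"strategy not covered by this sketch: {strategy}")


table = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
new_rows = [{"id": 2, "name": "Robert"}, {"id": 3, "name": "Charlie"}]

merged = apply_strategy(table, new_rows, "merge")
appended = apply_strategy(table, new_rows, "append")
replaced = apply_strategy(table, new_rows, "create+replace")
print(merged)
```

With merge, the row for id 2 is updated in place and id 3 is added, which is why the asset above marks id as its primary key.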

3. The Python SDK

The Bruin Python SDK (bruin-sdk on PyPI) eliminates the boilerplate most Python assets would otherwise need. Three imports cover the common cases:

  • query(sql) - run SQL against the asset's connection, get a pandas DataFrame back
  • context - typed access to pipeline metadata (start/end dates, full-refresh flag, variables)
  • get_connection(name) - the underlying database client when you need more control

from bruin import query, context

df = query(f"SELECT * FROM events WHERE dt >= '{context.start_date}'")

The SDK and materialization compose naturally: the SDK handles reading and transforming, materialization handles writing.
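The query above interpolates context.start_date straight into the SQL text. One way to keep that pattern tidy is to build the string in a small helper that accepts only well-formed dates. This is a sketch under assumptions: events_since_query is a hypothetical name, and it assumes the start date arrives either as a datetime.date (or datetime) or as an ISO-formatted string, since the exact type is not stated here.

```python
from datetime import date, datetime


def events_since_query(start_date) -> str:
    """Build the events query for a start date given as a date or ISO string."""
    if isinstance(start_date, datetime):
        # Drop the time part so the filter compares dates only.
        start_date = start_date.date()
    elif not isinstance(start_date, date):
        # Parsing rejects anything that is not a valid ISO date,
        # so arbitrary text never reaches the SQL string.
        start_date = date.fromisoformat(str(start_date))
    return f"SELECT * FROM events WHERE dt >= '{start_date.isoformat()}'"


sql = events_since_query("2024-01-01")
print(sql)
```

Inside an asset, the result would be passed to the SDK in the same way as the inline version, for example query(events_since_query(context.start_date)).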


More modules and walkthroughs coming soon. In the meantime, start with the tutorials below.

Before you start