Bruin + Python
Bruin is a first-class home for Python in your data stack - write Python assets, return DataFrames with materialization, and skip the boilerplate with the Python SDK.
How Bruin supports Python
Python is a first-class asset type in Bruin - not a sidecar, not an escape hatch. You can drop a .py file into your pipeline alongside SQL assets and Bruin treats it the same way: it runs in the right order, respects dependencies, surfaces checks, and integrates with lineage.
There are three layers that make this work.
1. Python assets
Any Python script becomes a Bruin asset by adding a @bruin comment block at the top:
"""@bruin
name: my_script
image: python:3.13
@bruin"""
print("Hello from Bruin!")
Each asset runs in an isolated environment with its own requirements.txt, so there are no cross-asset dependency conflicts. Bruin uses uv under the hood for fast, deterministic installs.
2. Materialization
By default, a Python script just runs. If you want the data it produces to land in a warehouse table, add a materialization block and define a materialize() function that returns a DataFrame:
"""@bruin
name: analytics.users
connection: my_bigquery
materialization:
  type: table
  strategy: merge
columns:
  - name: id
    type: integer
    primary_key: true
@bruin"""
import pandas as pd
def materialize():
    return pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]})
Bruin serializes the return value to Apache Arrow and uses ingestr to load it with your chosen strategy (create+replace, append, delete+insert, or merge). No manual to_sql, no credential wiring. See Python materialization for the full walkthrough.
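To make the difference between the four strategies concrete, here is a minimal conceptual sketch in plain Python. It is not how Bruin or ingestr implement loading; the function names are hypothetical, tables are modelled as lists of dicts, and the key handling (a single key column) is an assumption made for illustration.

```python
# Conceptual sketch only: plain-Python stand-ins for the four load strategies.
# These helpers are hypothetical and are not part of Bruin or ingestr.

def create_replace(target, incoming):
    """Discard the existing rows and keep only the new ones."""
    return list(incoming)

def append(target, incoming):
    """Keep every existing row and add the new rows after them."""
    return list(target) + list(incoming)

def delete_insert(target, incoming, key):
    """Delete existing rows whose key value appears in the new data, then insert the new rows."""
    incoming_keys = {row[key] for row in incoming}
    kept = [row for row in target if row[key] not in incoming_keys]
    return kept + list(incoming)

def merge(target, incoming, primary_key):
    """Upsert: update rows with a matching primary key, insert the rest."""
    by_key = {row[primary_key]: dict(row) for row in target}
    for row in incoming:
        existing_row = by_key.get(row[primary_key], {})
        by_key[row[primary_key]] = {**existing_row, **row}
    return list(by_key.values())

existing = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
new_rows = [{"id": 2, "name": "Robert"}, {"id": 3, "name": "Charlie"}]

merged = merge(existing, new_rows, primary_key="id")
```

With merge, the row with id 2 is updated in place and id 3 is added, while append would leave two rows with id 2 in the table.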
3. The Python SDK
The Bruin Python SDK (bruin-sdk on PyPI) eliminates the boilerplate most Python assets would otherwise need. Three imports cover the common cases:
- query(sql): run SQL against the asset's connection, get a pandas DataFrame back
- context: typed access to pipeline metadata (start/end dates, full-refresh flag, variables)
- get_connection(name): the underlying database client when you need more control
from bruin import query, context
df = query(f"SELECT * FROM events WHERE dt >= '{context.start_date}'")
The SDK and materialization compose naturally: the SDK handles reading and transforming, materialization handles writing.
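On the reading side, most incremental assets end up building a date-bounded query like the one above. The sketch below shows one way to do that with only the standard library. The helper name window_query, the default dt column, and the half-open interval (start inclusive, end exclusive) are assumptions for illustration, not part of the SDK; plain date values stand in for the pipeline's start and end dates.

```python
from datetime import date

def window_query(table: str, start: date, end: date, column: str = "dt") -> str:
    """Build a SELECT limited to rows with start <= column < end.

    Hypothetical helper: the name, the default column, and the half-open
    interval are illustrative assumptions, not Bruin SDK behavior.
    """
    if start >= end:
        raise ValueError("start must be before end")
    # Allow only simple identifiers (letters, digits, underscores, dots),
    # since table and column names are interpolated into the SQL text.
    for identifier in (table, column):
        if not identifier.replace("_", "").replace(".", "").isalnum():
            raise ValueError(f"unexpected identifier: {identifier!r}")
    return (
        f"SELECT * FROM {table} "
        f"WHERE {column} >= '{start.isoformat()}' AND {column} < '{end.isoformat()}'"
    )

sql = window_query("events", date(2024, 1, 1), date(2024, 1, 2))
```

The resulting string could then be passed to query(), and the DataFrame it returns handed back from materialize().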
More modules and walkthroughs coming soon. In the meantime, start with the tutorials below.
More Python tutorials
Hands-on guides for writing Python in Bruin.
Using the Bruin Python SDK
Beginner: Skip the boilerplate. Use the Bruin Python SDK to query databases, manage connections, and access pipeline context from your Python assets with a few imports.
Materializing Python Assets into Your Warehouse
Beginner: Return a DataFrame, let Bruin handle the rest. Learn how to use Python materialization to load data into BigQuery, Snowflake, Postgres, and more - with support for merge, append, and incremental strategies.
Before you start
- Bruin CLI installed
- Python 3.10 or higher