Assets
Understand Bruin assets - the individual files that define your data models, ingestion scripts, transformations, and seed data.
What is an asset?
An asset is a single file that represents a single data action - usually creating or updating a table or view in your database. Every asset has two parts:
- Definition (configuration) - the name, type, connection, materialization strategy, metadata, quality checks, and dependencies
- Content (code) - the actual SQL query, Python script, or YAML configuration
Asset types
Python assets
Used for ingestion or custom logic. The definition is at the top of the file, and the code below it. For materialized Python assets, you write a materialize() function that returns a DataFrame - Bruin handles creating the destination table and inserting the data.
SQL assets
Used for transformations. You write a SELECT query and Bruin wraps it with the appropriate materialization logic (e.g., INSERT INTO, CREATE TABLE) at runtime. You never write INSERT INTO yourself.
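The wrapping idea can be illustrated with a small sketch. This is not Bruin's implementation: the `materialize_select` helper, the `replace` and `append` strategy labels, and the table names are all hypothetical, and SQLite stands in for a real warehouse. It only shows how a bare SELECT can be turned into a table by SQL generated around it.

```python
import sqlite3


def materialize_select(conn, table, select_sql, strategy="replace"):
    """Wrap a bare SELECT in materialization SQL (illustrative helper, not Bruin's API)."""
    if strategy == "replace":
        # Rebuild the table from scratch with the query result.
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"CREATE TABLE {table} AS {select_sql}")
    elif strategy == "append":
        # Add the query result to an existing table.
        conn.execute(f"INSERT INTO {table} {select_sql}")
    else:
        raise ValueError(f"unknown strategy: {strategy}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips_raw (id INTEGER, fare REAL)")
conn.executemany(
    "INSERT INTO trips_raw VALUES (?, ?)",
    [(1, 12.5), (2, 0.0), (3, 30.0)],
)

# The asset content is only the SELECT; the wrapper is added around it.
select_sql = "SELECT id, fare FROM trips_raw WHERE fare > 0"
materialize_select(conn, "trips_clean", select_sql)

clean_rows = conn.execute("SELECT id, fare FROM trips_clean ORDER BY id").fetchall()
```

The query author never sees the `CREATE TABLE` or `INSERT INTO` statements; switching strategy changes the generated wrapper, not the SELECT.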
Seed assets (YAML)
Used for loading local files (like CSVs) into your database. You point the asset at a local file and Bruin creates the table from it. No code needed.
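As a rough illustration of what "creating the table from a file" involves, the sketch below loads CSV content into a SQLite table using only the standard library. The `load_seed` helper, the `zones` table, and the sample rows are hypothetical; Bruin's own seed loading (including how it infers column types) is not shown here.

```python
import csv
import io
import sqlite3


def load_seed(conn, table, csv_file):
    """Create a table from a CSV header and insert its rows (illustrative helper)."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Columns are left untyped, so every value is stored as text in this sketch.
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


# An in-memory stand-in for a local CSV file.
seed_csv = io.StringIO("zone_id,zone_name\n1,Airport\n2,Downtown\n")

seed_conn = sqlite3.connect(":memory:")
load_seed(seed_conn, "zones", seed_csv)

seed_rows = seed_conn.execute(
    "SELECT zone_id, zone_name FROM zones ORDER BY zone_id"
).fetchall()
```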
Naming
The asset name can be set explicitly or inferred from the file path. For example, a file at assets/raw/trips_raw.py would default to the name raw.trips_raw, which maps to the raw schema and trips_raw table in your database.
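The path-to-name convention in the example above can be sketched as follows. The `infer_asset_name` function and its `assets_root` parameter are hypothetical, and the sketch only covers the simple case shown (one schema folder, one file); how Bruin treats deeper nesting or multi-part extensions is not covered here.

```python
from pathlib import PurePosixPath


def infer_asset_name(file_path, assets_root="assets"):
    """Derive a schema.table style name from a path under the assets folder (sketch)."""
    relative = PurePosixPath(file_path).relative_to(assets_root)
    # Drop the file extension, then join the remaining path parts with dots.
    return ".".join(relative.with_suffix("").parts)


inferred_name = infer_asset_name("assets/raw/trips_raw.py")
schema, table = inferred_name.split(".")
```

Here `inferred_name` is `raw.trips_raw`, which splits into the `raw` schema and the `trips_raw` table.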
Dependencies and lineage
Assets declare which other assets they depend on. These declarations form the DAG (directed acyclic graph) that Bruin uses to determine execution order. When you run an asset together with its downstream assets, Bruin follows this graph automatically.
You can view the full lineage in the VS Code extension.
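To make the execution-order idea concrete, here is a minimal sketch using Python's standard `graphlib` module. The asset names and the `downstream_of` helper are hypothetical, and this is not how Bruin is implemented internally; it only shows how declared dependencies yield a valid run order and how "an asset plus its downstream" can be selected from the graph.

```python
from graphlib import TopologicalSorter

# Each asset maps to the set of assets it depends on (its upstreams).
depends_on = {
    "raw.trips_raw": set(),
    "raw.zones": set(),
    "staging.trips": {"raw.trips_raw", "raw.zones"},
    "reports.daily_revenue": {"staging.trips"},
}

# A valid execution order: every asset appears after all of its upstreams.
execution_order = list(TopologicalSorter(depends_on).static_order())


def downstream_of(asset, graph):
    """Return the asset plus everything that transitively depends on it."""
    selected = {asset}
    changed = True
    while changed:
        changed = False
        for name, upstreams in graph.items():
            if name not in selected and upstreams & selected:
                selected.add(name)
                changed = True
    return selected


# Running raw.zones with its downstream assets, in dependency order.
selected_assets = downstream_of("raw.zones", depends_on)
to_run = [name for name in execution_order if name in selected_assets]
```

In this sketch `raw.trips_raw` is left out of `to_run` because it does not depend on `raw.zones`.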
Key points
- One asset = one file = one table or view
- SQL assets only contain a SELECT - Bruin adds the materialization wrapper
- Python assets return a DataFrame via materialize() for built-in materialization
- Dependencies between assets create the pipeline's execution order