Step 2
Beginner
10 min

Install Bruin & Create Your First Pipeline

Install Bruin CLI, set up the VS Code extension, initialize a project with DuckDB, and run your first assets to understand the basics.

Bruin CLI
Learning paths: Data Engineer

Video

Steps

1) Install Bruin CLI

If you haven't already, install Bruin using the recommended curl command from the installation docs. Verify with:

bruin version
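For reference, the installation docs' one-line install typically looks like the following — treat the URL as an assumption and check the installation docs for the current command before running it:

```shell
# Install the Bruin CLI (URL is illustrative; confirm it against the installation docs)
curl -LsSf https://getbruin.com/install/cli | sh

# Confirm the binary is on your PATH
bruin version
```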

2) Install the VS Code / Cursor extension

Search for "Bruin" in the VS Code or Cursor extension marketplace and install it. This gives you the Bruin render panel, run controls, and lineage visualization.

3) Set up the Bruin MCP (optional)

If you want to use an AI agent alongside the tutorial, add the Bruin MCP to your IDE. For Cursor, go to Settings > Tools & MCP > Add New MCP and paste the Bruin MCP configuration from the MCP docs.

4) Initialize a project

Run bruin init and choose the default template. This creates a git-initialized project with:

  • .bruin.yml - with a DuckDB connection and a chess.com sample connection
  • .gitignore - with .bruin.yml listed, so your connection credentials stay out of version control
  • A sample pipeline with Python, YAML (ingestor), and SQL assets
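As a sketch, assuming the CLI accepts a template name and a folder argument (check `bruin init --help` to confirm; the folder name here is just an example):

```shell
# Initialize a new project from the default template
bruin init default my-pipeline
cd my-pipeline

# Roughly the layout you should see (exact files may vary by version):
# .bruin.yml      <- environments & connections (git-ignored)
# .gitignore
# pipeline.yml    <- pipeline-level settings
# assets/         <- Python, ingestor (YAML), and SQL assets
```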

5) Understand the .bruin.yml

The generated config defines your environments and connections. The DuckDB connection creates a local database file - no cloud setup needed.
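As a rough sketch of what the generated file contains — connection names and exact keys are illustrative and may differ from what your version of Bruin generates:

```yaml
# Sketch of a generated .bruin.yml; names and keys are illustrative
default_environment: default
environments:
  default:
    connections:
      duckdb:
        - name: duckdb-default
          path: duckdb.db        # local file, created on first run; no cloud setup
      chess:
        - name: chess-default
          players:
            - "MagnusCarlsen"    # example player list for the chess.com connector
```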

6) Run the sample assets

Open each asset and click Run in the Bruin panel:

  1. Python asset - a simple "Hello World" script
  2. Ingestor asset - pulls data from chess.com using Bruin's built-in connector, with configurable start/end date intervals for incremental ingestion
  3. SQL asset - transforms the ingested data with a basic aggregation, depending on the ingestor asset
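For orientation, a Bruin Python asset is an ordinary script with a special comment header at the top; a minimal sketch (the asset name and header contents here are illustrative, not the template's actual values) looks like:

```python
"""@bruin
name: hello_world
@bruin"""

# Everything below the header is plain Python that runs when the asset executes
greeting = "Hello from Bruin!"
print(greeting)
```

You can also run a single asset from the terminal with `bruin run` followed by the asset's file path, instead of using the panel.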

7) Observe dependencies

The SQL asset depends on the ingestor asset. When you run the full pipeline, Bruin runs them in the correct order based on the dependency graph.
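The dependency is declared in the SQL asset's own header. A hedged sketch — asset, table, and column names here are illustrative, not necessarily the template's actual names:

```sql
/* @bruin
name: chess_playground.game_summary
type: duckdb.sql
materialization:
  type: table
depends:
  - chess_playground.games
@bruin */

-- Aggregate the ingested games; Bruin materializes this query as a table
select
    white_username,
    count(*) as games_played
from chess_playground.games
group by 1
```

Because `depends` names the ingestor's output, Bruin knows to run the ingestor first whenever both assets are part of the same run.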

Key concepts covered

  • Environments and connections are defined in .bruin.yml and scoped per pipeline
  • Ingestor assets use built-in connectors for dozens of data sources
  • Start/end date intervals control incremental data ingestion
  • Dependencies between assets create the execution order
  • Materialization determines how results are stored (e.g., table materialization rebuilds the table from scratch on each run)