Bruin Academy

Chess Data to DuckDB

Build your first Bruin pipeline by ingesting chess API data and storing it in DuckDB - no credentials needed.

This tutorial builds a complete data pipeline that pulls chess game data from the public Chess.com API and loads it into DuckDB for analysis. No API keys or database credentials are needed: the API is open and DuckDB runs locally.

What

  • Initialize a Bruin project from a built-in template
  • Configure connections for DuckDB and the Chess.com API
  • Run a pipeline that ingests games and profiles, then builds a player summary

How

  • bruin init chess scaffolds the project with pre-configured assets
  • Ingestr assets pull data from the API; SQL assets transform it
  • bruin run executes the full pipeline end-to-end

Initialize the project

Run the following command to scaffold a new project using the built-in chess template:

bruin init chess

This creates a folder structure with pre-configured assets and pipeline files:

bruin/
├── .bruin.yml
├── .gitignore
└── chess/
    ├── pipeline.yml
    ├── README.md
    └── assets/
        ├── chess_games.asset.yml
        ├── chess_profiles.asset.yml
        └── player_summary.sql

Configure the environment

Open .bruin.yml and configure the default environment with a DuckDB connection and a Chess.com connection. The DuckDB path points to a local database file, and players lists the Chess.com usernames you want to track:

environments:
  default:
    connections:
      duckdb:
        - name: "duckdb-default"
          path: "chess.db"
      chess:
        - name: "chess-default"
          players:
            - "FabianoCaruana"
            - "Hikaru"
            - "MagnusCarlsen"
            - "GarryKasparov"
            - "Firouzja2003"
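
To illustrate why no credentials are needed, here is a minimal Python sketch of the unauthenticated URLs that the public Chess.com API exposes for each configured player. The helper names profile_url and monthly_games_url are hypothetical, and lowercasing the username is an assumption. This is not necessarily how the ingestr asset builds its requests; it only shows that the player list maps onto plain public endpoints.

```python
# Sketch only: maps the configured players onto public Chess.com API URLs.
# The helper names are hypothetical and not part of Bruin or ingestr.

BASE_URL = "https://api.chess.com/pub"

# Same usernames as in the .bruin.yml example above.
PLAYERS = [
    "FabianoCaruana",
    "Hikaru",
    "MagnusCarlsen",
    "GarryKasparov",
    "Firouzja2003",
]


def profile_url(username: str) -> str:
    """Return the public profile endpoint for one player."""
    # Assumption: the lowercase username is used in the URL.
    return f"{BASE_URL}/player/{username.lower()}"


def monthly_games_url(username: str, year: int, month: int) -> str:
    """Return the endpoint listing one player's games for a given month."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    return f"{BASE_URL}/player/{username.lower()}/games/{year:04d}/{month:02d}"


if __name__ == "__main__":
    for player in PLAYERS:
        print(profile_url(player))
```

None of these URLs carries a token or key, which is why the chess connection in .bruin.yml only needs a name and a list of players.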

Review the assets

The template includes three pre-configured assets:

  • chess_games.asset.yml - An ingestr asset that fetches game data for each player from the Chess.com API.
  • chess_profiles.asset.yml - An ingestr asset that fetches player profile information.
  • player_summary.sql - A SQL asset that joins games and profiles to create a summary table with statistics like total games and win rates.
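
The kind of aggregation a summary asset performs can be sketched in plain Python. The row shapes below (a username and an outcome per game, a username per profile) are hypothetical simplifications chosen for illustration. The real player_summary.sql operates on the ingested tables, and its columns may differ.

```python
# Sketch only: mimics a "join games to profiles and aggregate" summary.
# Field names (username, outcome) are hypothetical, not the template's schema.
from collections import defaultdict


def summarize(games, profiles):
    """Return one summary row per profile with game counts and win rate."""
    counts = defaultdict(lambda: {"total_games": 0, "wins": 0, "losses": 0})
    for game in games:
        row = counts[game["username"]]
        row["total_games"] += 1
        if game["outcome"] == "win":
            row["wins"] += 1
        elif game["outcome"] == "loss":
            row["losses"] += 1
        # Any other outcome (for example a draw) only counts toward the total.

    summary = []
    for profile in profiles:
        name = profile["username"]
        stats = counts.get(name, {"total_games": 0, "wins": 0, "losses": 0})
        total = stats["total_games"]
        summary.append(
            {
                "username": name,
                "total_games": total,
                "wins": stats["wins"],
                "losses": stats["losses"],
                # Avoid division by zero for players without any games.
                "win_rate": stats["wins"] / total if total else 0.0,
            }
        )
    return summary
```

Iterating over profiles rather than games keeps players with no recorded games in the output, similar to a left join from profiles to games.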

Examine the pipeline

The pipeline.yml file defines the pipeline name and default connections:

name: chess
default_connections:
  duckdb: "duckdb-default"
  chess: "chess-default"
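
The names under default_connections have to match connection names defined in .bruin.yml. As a rough sketch of that relationship, the following Python function checks the two files against each other. It assumes both files have already been parsed into dictionaries by a YAML parser of your choice. The function name missing_connections is hypothetical and not part of Bruin.

```python
# Sketch only: checks that pipeline.yml's default connections exist in .bruin.yml.
# Both arguments are plain dicts mirroring the YAML shown in this tutorial.


def missing_connections(bruin_config, pipeline_config, environment="default"):
    """Return (type, name) pairs that the pipeline uses but the environment lacks."""
    defined = bruin_config["environments"][environment]["connections"]
    missing = []
    for conn_type, conn_name in pipeline_config.get("default_connections", {}).items():
        # Each connection type maps to a list of entries, each with a "name".
        names = {entry["name"] for entry in defined.get(conn_type, [])}
        if conn_name not in names:
            missing.append((conn_type, conn_name))
    return missing


# Dict equivalents of the .bruin.yml and pipeline.yml examples above.
BRUIN_CONFIG = {
    "environments": {
        "default": {
            "connections": {
                "duckdb": [{"name": "duckdb-default", "path": "chess.db"}],
                "chess": [{"name": "chess-default", "players": ["Hikaru"]}],
            }
        }
    }
}

PIPELINE_CONFIG = {
    "name": "chess",
    "default_connections": {"duckdb": "duckdb-default", "chess": "chess-default"},
}
```

With the configuration from this tutorial the function returns an empty list. A typo in either file shows up as a missing pair.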

Run the pipeline

From the project root (the bruin/ folder shown above), execute the pipeline to ingest the data and build the summary table:

bruin run ./chess/pipeline.yml

Query the results

Once the pipeline completes, query the results to verify everything worked:

bruin query --c duckdb-default --q "SELECT * FROM chess_playground.player_summary LIMIT 10;"

You should see a table with player statistics including usernames, total games, wins, losses, and win rates.

Running the full pipeline from terminal
