Bruin Academy

Chess Data to DuckDB

Build your first Bruin pipeline by ingesting chess API data and storing it in DuckDB - no credentials needed.

What is this? A beginner-friendly tutorial where you build a complete data pipeline that pulls chess game data from a public API and loads it into DuckDB for analysis. No API keys or database credentials needed - the chess API is completely open and DuckDB runs locally.

What you'll learn: How to initialize a Bruin project from a template, configure environments and connections, understand asset types (ingestr and SQL), and run a pipeline end-to-end.

What you'll build: A pipeline that ingests chess games and player profiles for top grandmasters, then creates a summary table with player statistics including total games and win rates.

Full tutorial

The complete tutorial is below.

Initialize the project

Run the following command to scaffold a new project using the built-in chess template:

bruin init chess

This creates a folder structure with pre-configured assets and pipeline files:

chess/
├── assets/
│   ├── chess_games.asset.yml
│   ├── chess_profiles.asset.yml
│   └── player_summary.sql
├── .bruin.yml
├── pipeline.yml
└── .gitignore
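
To confirm the scaffold matches the tree above, you can check for the expected files. The following is a minimal Python sketch and is not part of Bruin: the `missing_files` helper is hypothetical, and the file list simply mirrors the tree shown above.

```python
from pathlib import Path

# Files listed in the tutorial's tree, relative to the project folder.
EXPECTED_FILES = [
    "assets/chess_games.asset.yml",
    "assets/chess_profiles.asset.yml",
    "assets/player_summary.sql",
    ".bruin.yml",
    "pipeline.yml",
    ".gitignore",
]


def missing_files(project_dir):
    """Return the expected files that are absent under project_dir."""
    root = Path(project_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files("chess")
    print("scaffold looks complete" if not missing else f"missing: {missing}")
```

An empty result means every file from the tree is present. A non-empty list names the files to look for before continuing.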

Configure the environment

Open .bruin.yml and configure your environment with DuckDB and Chess API connections. Specify the list of chess players you want to track:

environments:
  default:
    connections:
      duckdb:
        - name: "duckdb-default"
          path: "chess.db"
      chess:
        - name: "chess-default"
          players:
            - "FabianoCaruana"
            - "Hikaru"
            - "MagnusCarlsen"
            - "GarryKasparov"
            - "Firouzja2003"
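
For orientation, each username in the list corresponds to public Chess.com endpoints that need no API key. The sketch below only builds the URLs. It assumes the published Chess.com API paths for profiles and monthly archives, and lowercasing the usernames is an assumption. How the ingestr assets construct their requests is not shown in this tutorial, and the helper names are hypothetical.

```python
# Base URL of the public Chess.com API (no API key required).
BASE_URL = "https://api.chess.com/pub/player"

# The players configured in .bruin.yml above.
PLAYERS = [
    "FabianoCaruana",
    "Hikaru",
    "MagnusCarlsen",
    "GarryKasparov",
    "Firouzja2003",
]


def profile_url(username):
    """Public profile endpoint for one player (username lowercased by assumption)."""
    return f"{BASE_URL}/{username.lower()}"


def archives_url(username):
    """Endpoint listing the player's monthly game archives."""
    return f"{BASE_URL}/{username.lower()}/games/archives"


for player in PLAYERS:
    print(profile_url(player))
```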

Review the assets

The template includes three pre-configured assets:

  • chess_games.asset.yml — An ingestr asset that fetches game data for each player from the Chess.com API.
  • chess_profiles.asset.yml — An ingestr asset that fetches player profile information.
  • player_summary.sql — A SQL asset that joins games and profiles to create a summary table with statistics like total games and win rates.
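
To make the summary step concrete, here is a rough Python sketch of the kind of aggregation a player summary performs: count each player's games and wins, then derive a win rate. It is not the template's SQL. The record fields (`white`, `black`, `winner`) are hypothetical simplifications of the ingested data, and the join to profiles is omitted.

```python
from collections import defaultdict

# Hypothetical, simplified game records; the real ingested columns may differ.
games = [
    {"white": "hikaru", "black": "magnuscarlsen", "winner": "white"},
    {"white": "magnuscarlsen", "black": "hikaru", "winner": "white"},
    {"white": "hikaru", "black": "fabianocaruana", "winner": "draw"},
]


def summarize(games):
    """Count games and wins per player, then derive a win rate in percent."""
    stats = defaultdict(lambda: {"total_games": 0, "wins": 0})
    for game in games:
        for color in ("white", "black"):
            player = game[color]
            stats[player]["total_games"] += 1
            if game["winner"] == color:
                stats[player]["wins"] += 1
    return {
        player: {**s, "win_rate": round(100 * s["wins"] / s["total_games"], 2)}
        for player, s in stats.items()
    }


summary = summarize(games)
print(summary["hikaru"])
```

Each game counts once for both participants, so a player's total covers games played as white and as black.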

Examine the pipeline

The pipeline.yml file defines the pipeline name and default connections:

name: chess
default_connections:
  duckdb: "duckdb-default"
  chess: "chess-default"
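
The names under default_connections refer to the connections defined in .bruin.yml, so the two files have to agree. The sketch below illustrates that relationship with plain dictionaries that mirror the two YAML files (with the players list shortened). It is not how Bruin resolves connections internally, and `unresolved_connections` is a hypothetical helper.

```python
# Plain dicts mirroring the two YAML files shown in this tutorial.
bruin_yml = {
    "environments": {
        "default": {
            "connections": {
                "duckdb": [{"name": "duckdb-default", "path": "chess.db"}],
                "chess": [{"name": "chess-default", "players": ["Hikaru"]}],
            }
        }
    }
}
pipeline_yml = {
    "name": "chess",
    "default_connections": {"duckdb": "duckdb-default", "chess": "chess-default"},
}


def unresolved_connections(pipeline, config, environment="default"):
    """Return default connections that have no matching entry in the config."""
    defined = config["environments"][environment]["connections"]
    problems = []
    for conn_type, conn_name in pipeline["default_connections"].items():
        names = {entry["name"] for entry in defined.get(conn_type, [])}
        if conn_name not in names:
            problems.append((conn_type, conn_name))
    return problems


print(unresolved_connections(pipeline_yml, bruin_yml))
```

If a name in pipeline.yml were misspelled, the helper would return that connection type and name instead of an empty list.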

Run the pipeline

Run the pipeline to ingest the data and build the summary table:

bruin run ./chess/pipeline.yml

Query the results

Once the pipeline completes, query the results to verify everything worked:

bruin query --c duckdb-default --q "SELECT * FROM chess_playground.player_summary LIMIT 10;"

You should see a table with player statistics including usernames, total games, wins, losses, and win rates.

Before you start