Bruin Academy

Oracle to DuckDB

Move data from Oracle XE into DuckDB using Bruin's oracle-duckdb template - includes Docker setup, data seeding, and a downstream SQL transformation.

What is this? A tutorial that walks you through using the oracle-duckdb template to move data from Oracle XE into DuckDB. You'll spin up a local Oracle instance with Docker, seed it with sample data, run the Bruin pipeline to transfer and transform the data, and verify the results.

What you'll learn: How to use Bruin templates, configure Oracle and DuckDB connections, set up a local Oracle XE instance with Docker, and run a pipeline that includes both data ingestion and SQL transformations.

What you'll build: A pipeline that ingests three Oracle tables (customers, orders, order_items), transfers them to DuckDB, then runs a SQL transformation to create an aggregated sales_per_customer report.

Start tutorial →


Full tutorial

Below is the complete tutorial you can read through, or use the step-by-step version above.

Initialize the project

Scaffold the project from the oracle-duckdb template:

bruin init oracle-duckdb oracle-demo

Configure connections

Edit .bruin.yml to set Oracle and DuckDB connection details:

environments:
  default:
    connections:
      oracle:
        - name: "oracle-default"
          host: "127.0.0.1"
          port: 1521
          user: "BRUIN_TMPL"
          password: "BruinRocks123"
          service_name: "XEPDB1"
      duckdb:
        - name: "duckdb-default"
          path: "./oracle_duckdb.db"

Launch Oracle XE with Docker

Start an Oracle XE container:

docker run -d --name oracle-xe \
  -p 1521:1521 \
  -e ORACLE_PASSWORD=SystemPass123 \
  container-registry.oracle.com/database/express:latest

Wait for the database to be ready (this can take a few minutes on first run).

Create the database user

Connect to the container and create the BRUIN_TMPL user:

docker exec -it oracle-xe sqlplus sys/SystemPass123@XEPDB1 as sysdba
CREATE USER BRUIN_TMPL IDENTIFIED BY "BruinRocks123";
GRANT CONNECT, RESOURCE, UNLIMITED TABLESPACE TO BRUIN_TMPL;

Seed the data

Create and populate three tables - customers, orders, and order_items:

-- Customers
CREATE TABLE BRUIN_TMPL.customers (
  customer_id NUMBER PRIMARY KEY,
  full_name   VARCHAR2(100),
  segment     VARCHAR2(50),
  email       VARCHAR2(150),
  updated_at  TIMESTAMP DEFAULT SYSTIMESTAMP
);

INSERT INTO BRUIN_TMPL.customers VALUES (1, 'Alice Müller', 'Enterprise', '[email protected]', SYSTIMESTAMP);
INSERT INTO BRUIN_TMPL.customers VALUES (2, 'Bob Santos', 'SMB', '[email protected]', SYSTIMESTAMP);
INSERT INTO BRUIN_TMPL.customers VALUES (3, 'Charlie Kim', 'Enterprise', '[email protected]', SYSTIMESTAMP);

-- Orders
CREATE TABLE BRUIN_TMPL.orders (
  order_id     NUMBER PRIMARY KEY,
  customer_id  NUMBER,
  order_status VARCHAR2(30),
  order_total  NUMBER(10,2),
  ordered_at   TIMESTAMP DEFAULT SYSTIMESTAMP
);

INSERT INTO BRUIN_TMPL.orders VALUES (101, 1, 'shipped', 250.00, SYSTIMESTAMP);
INSERT INTO BRUIN_TMPL.orders VALUES (102, 1, 'delivered', 125.50, SYSTIMESTAMP);
INSERT INTO BRUIN_TMPL.orders VALUES (103, 2, 'pending', 89.99, SYSTIMESTAMP);
INSERT INTO BRUIN_TMPL.orders VALUES (104, 3, 'shipped', 430.00, SYSTIMESTAMP);

-- Order Items
CREATE TABLE BRUIN_TMPL.order_items (
  order_item_id NUMBER PRIMARY KEY,
  order_id      NUMBER,
  product_sku   VARCHAR2(50),
  quantity      NUMBER,
  unit_price    NUMBER(10,2)
);

INSERT INTO BRUIN_TMPL.order_items VALUES (1001, 101, 'SKU-A1', 2, 75.00);
INSERT INTO BRUIN_TMPL.order_items VALUES (1002, 101, 'SKU-B2', 1, 100.00);
INSERT INTO BRUIN_TMPL.order_items VALUES (1003, 102, 'SKU-A1', 1, 75.00);
INSERT INTO BRUIN_TMPL.order_items VALUES (1004, 102, 'SKU-C3', 1, 50.50);
INSERT INTO BRUIN_TMPL.order_items VALUES (1005, 103, 'SKU-B2', 1, 89.99);
INSERT INTO BRUIN_TMPL.order_items VALUES (1006, 104, 'SKU-D4', 2, 215.00);
COMMIT;

Run the pipeline

bruin run . --config-file ./.bruin.yml

Bruin runs three parallel ingestr jobs to transfer the raw tables, followed by a DuckDB SQL transformation that creates the sales_per_customer report by joining all three sources.

Verify the results

duckdb oracle_duckdb.db "SELECT * FROM duckdb.sales_per_customer;"

You should see customer revenue, order counts, and last order dates.

Cleanup

When you're done, remove the Docker container and DuckDB file:

docker rm -f oracle-xe
rm oracle_duckdb.db

Before you start