Oracle to DuckDB
Move data from Oracle XE into DuckDB using Bruin's oracle-duckdb template - includes Docker setup, data seeding, and a downstream SQL transformation.
What is this? A tutorial that walks you through using the oracle-duckdb template to move data from Oracle XE into DuckDB. You'll spin up a local Oracle instance with Docker, seed it with sample data, run the Bruin pipeline to transfer and transform the data, and verify the results.
What you'll learn: How to use Bruin templates, configure Oracle and DuckDB connections, set up a local Oracle XE instance with Docker, and run a pipeline that includes both data ingestion and SQL transformations.
What you'll build: A pipeline that ingests three Oracle tables (customers, orders, order_items), transfers them to DuckDB, then runs a SQL transformation to create an aggregated sales_per_customer report.
Full tutorial
Below is the complete tutorial you can read through, or use the step-by-step version above.
Initialize the project
Scaffold the project from the oracle-duckdb template:
bruin init oracle-duckdb oracle-demo
Configure connections
Edit .bruin.yml to set Oracle and DuckDB connection details:
environments:
default:
connections:
oracle:
- name: "oracle-default"
host: "127.0.0.1"
port: 1521
user: "BRUIN_TMPL"
password: "BruinRocks123"
service_name: "XEPDB1"
duckdb:
- name: "duckdb-default"
path: "./oracle_duckdb.db"
Launch Oracle XE with Docker
Start an Oracle XE container:
docker run -d --name oracle-xe \
-p 1521:1521 \
-e ORACLE_PASSWORD=SystemPass123 \
container-registry.oracle.com/database/express:latest
Wait for the database to be ready (this can take a few minutes on first run).
Create the database user
Connect to the container and create the BRUIN_TMPL user:
docker exec -it oracle-xe sqlplus sys/SystemPass123@XEPDB1 as sysdba
CREATE USER BRUIN_TMPL IDENTIFIED BY "BruinRocks123";
GRANT CONNECT, RESOURCE, UNLIMITED TABLESPACE TO BRUIN_TMPL;
Seed the data
Create and populate three tables - customers, orders, and order_items:
-- Customers
CREATE TABLE BRUIN_TMPL.customers (
customer_id NUMBER PRIMARY KEY,
full_name VARCHAR2(100),
segment VARCHAR2(50),
email VARCHAR2(150),
updated_at TIMESTAMP DEFAULT SYSTIMESTAMP
);
INSERT INTO BRUIN_TMPL.customers VALUES (1, 'Alice Müller', 'Enterprise', '[email protected]', SYSTIMESTAMP);
INSERT INTO BRUIN_TMPL.customers VALUES (2, 'Bob Santos', 'SMB', '[email protected]', SYSTIMESTAMP);
INSERT INTO BRUIN_TMPL.customers VALUES (3, 'Charlie Kim', 'Enterprise', '[email protected]', SYSTIMESTAMP);
-- Orders
CREATE TABLE BRUIN_TMPL.orders (
order_id NUMBER PRIMARY KEY,
customer_id NUMBER,
order_status VARCHAR2(30),
order_total NUMBER(10,2),
ordered_at TIMESTAMP DEFAULT SYSTIMESTAMP
);
INSERT INTO BRUIN_TMPL.orders VALUES (101, 1, 'shipped', 250.00, SYSTIMESTAMP);
INSERT INTO BRUIN_TMPL.orders VALUES (102, 1, 'delivered', 125.50, SYSTIMESTAMP);
INSERT INTO BRUIN_TMPL.orders VALUES (103, 2, 'pending', 89.99, SYSTIMESTAMP);
INSERT INTO BRUIN_TMPL.orders VALUES (104, 3, 'shipped', 430.00, SYSTIMESTAMP);
-- Order Items
CREATE TABLE BRUIN_TMPL.order_items (
order_item_id NUMBER PRIMARY KEY,
order_id NUMBER,
product_sku VARCHAR2(50),
quantity NUMBER,
unit_price NUMBER(10,2)
);
INSERT INTO BRUIN_TMPL.order_items VALUES (1001, 101, 'SKU-A1', 2, 75.00);
INSERT INTO BRUIN_TMPL.order_items VALUES (1002, 101, 'SKU-B2', 1, 100.00);
INSERT INTO BRUIN_TMPL.order_items VALUES (1003, 102, 'SKU-A1', 1, 75.00);
INSERT INTO BRUIN_TMPL.order_items VALUES (1004, 102, 'SKU-C3', 1, 50.50);
INSERT INTO BRUIN_TMPL.order_items VALUES (1005, 103, 'SKU-B2', 1, 89.99);
INSERT INTO BRUIN_TMPL.order_items VALUES (1006, 104, 'SKU-D4', 2, 215.00);
COMMIT;
Run the pipeline
bruin run . --config-file ./.bruin.yml
Bruin runs three parallel ingestr jobs to transfer the raw tables, followed by a DuckDB SQL transformation that creates the sales_per_customer report by joining all three sources.
Verify the results
duckdb oracle_duckdb.db "SELECT * FROM duckdb.sales_per_customer;"
You should see customer revenue, order counts, and last order dates.
Cleanup
When you're done, remove the Docker container and DuckDB file:
docker rm -f oracle-xe
rm oracle_duckdb.db
Before you start
- Bruin CLI installed
- Docker Desktop installed and running
- DuckDB CLI (optional, for querying results)