Migrating from Pentaho Data Integration to Bruin

A practical migration plan for moving Pentaho PDI and Kettle jobs to Bruin. The Bruin team can help with onboarding and migration planning for ingestr, SQL/Python assets, quality checks, DAC dashboards, MCP, and AI analytics.

Arsalan Noorafkan

Developer Advocate

Migrating from Pentaho is rarely a clean "old tool out, new tool in" project.

It is usually more awkward than that. You have PDI jobs that have been edited for years, Spoon transformations that nobody wants to touch, scheduled flows that write into reporting tables, a few custom scripts nearby, and at least one downstream dashboard that finance will notice if it breaks.

If you are evaluating alternatives to an older Pentaho estate, the Bruin team can help with onboarding and migration planning so the first pass focuses on inventory, mapping, parity checks, and a realistic cutover plan instead of a blank-page rewrite.

So the migration plan needs to be boring.

Not heroic. Not a six-month rewrite. Boring.

The goal is to move one flow at a time from Pentaho Data Integration or Kettle into Bruin, prove the output, add governance that probably did not exist before, and only then retire the old job.

Step 1: inventory the Pentaho estate

Start with a spreadsheet if you must. The format does not matter at first. The columns do.

Capture this for every job and transformation:

| Field | Why it matters |
| --- | --- |
| Job or transformation name | The thing you are migrating |
| Owner | Someone needs to approve parity |
| Source systems | Databases, SaaS tools, files, APIs, FTP drops |
| Destination tables or files | What downstream users actually consume |
| Schedule | When the job runs and what it depends on |
| Runtime | How long it takes today |
| Failure mode | What usually breaks |
| Downstream reports | What will complain if the output changes |
| Business rules | Logic hidden inside steps, filters, joins, lookups |
| Data quality assumptions | Row counts, uniqueness, null checks, freshness |

This sounds obvious, but it is where many migrations go wrong. Teams convert the easy transformations and miss the hidden business rule that was sitting in a filter step from 2018.
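If a spreadsheet feels too loose, the same inventory can live in a small script. The sketch below is one possible shape, not a Bruin feature: the `InventoryRow` class, its field names, and the example job are all hypothetical. The useful part is `missing_fields`, which makes gaps such as an unknown owner or undocumented business rules visible before any migration work starts.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryRow:
    """One row per Pentaho job or transformation (field names are illustrative)."""
    name: str
    owner: str
    source_systems: str
    destinations: str
    schedule: str
    runtime_minutes: float
    failure_mode: str
    downstream_reports: str
    business_rules: str
    quality_assumptions: str


def missing_fields(row):
    """Return the names of fields that were left blank."""
    return [f.name for f in fields(row) if getattr(row, f.name) in ("", None)]


def to_csv(rows):
    """Serialize the inventory so it can still be opened as a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(InventoryRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


# Hypothetical example: owner and business rules are not known yet.
example = InventoryRow(
    name="load_daily_revenue",
    owner="",
    source_systems="postgres",
    destinations="reporting.daily_revenue",
    schedule="daily 02:00",
    runtime_minutes=12.0,
    failure_mode="source timeout",
    downstream_reports="finance revenue dashboard",
    business_rules="",
    quality_assumptions="one row per day",
)

gaps = missing_fields(example)
```

Any job with a non-empty `gaps` list is not ready to migrate yet; it is ready for a conversation with whoever used to maintain it.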

Step 2: pick one flow

Do not start with the biggest job.

Pick a pipeline that is important enough to matter and small enough to finish. A good first candidate has:

  • Two to five source tables or files
  • Clear downstream consumers
  • Known business logic
  • A measurable output
  • A friendly owner who can validate the result

Bad first candidates: the huge job that touches 60 tables, a finance close process nobody understands, or a job that writes into a report nobody owns.

You can get to those later. The first migration is about learning the pattern.
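The criteria above are simple enough to encode as a filter over the inventory. This is a minimal sketch under assumptions: the `Candidate` fields and the two example jobs are hypothetical, and the two-to-five source range simply mirrors the list above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A Pentaho flow being considered for the first migration (illustrative fields)."""
    name: str
    source_count: int
    has_known_consumers: bool
    logic_documented: bool
    has_measurable_output: bool
    owner_available: bool


def is_good_first_candidate(c):
    """Apply the first-flow criteria: small, understood, measurable, owned."""
    return (
        2 <= c.source_count <= 5
        and c.has_known_consumers
        and c.logic_documented
        and c.has_measurable_output
        and c.owner_available
    )


# Hypothetical flows for illustration only.
small_flow = Candidate("load_daily_revenue", 3, True, True, True, True)
huge_flow = Candidate("finance_close", 60, True, False, True, False)

shortlist = [c.name for c in (small_flow, huge_flow) if is_good_first_candidate(c)]
```

The filter is deliberately strict. If nothing passes, that is a signal to fix the inventory, not to relax the criteria.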

Step 3: map Pentaho concepts to Bruin concepts

Here is the mental model:

| Pentaho concept | Bruin concept |
| --- | --- |
| Transformation step | SQL or Python asset logic |
| Job dependency | depends relationship |
| Database input | ingestr asset or SQL asset |
| File input | ingestr, Python materialization, or warehouse external table |
| Lookup step | SQL join or Python enrichment |
| Filter rows | SQL WHERE clause or Python transform |
| Output table | Asset materialization |
| Job schedule | Bruin Cloud schedule or CI/orchestrated run |
| Manual validation | Asset quality checks |
| Operational notes | Metadata, owners, tiers, documentation |

The migration is not about recreating every visual step one-to-one. That is how you carry old complexity into the new system.

The better move is to recreate the business intent.
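A quick way to apply the mapping is to list the steps in a transformation file and tag each one with its likely Bruin equivalent. The sketch below assumes a `.ktr` file is XML with `<step>` elements that carry `<name>` and `<type>` children; the step type identifiers in `STEP_TO_BRUIN` and the sample XML are assumptions, so check them against your own exported files. Anything unrecognized is flagged for manual review, which is usually where the hidden business rules live.

```python
import xml.etree.ElementTree as ET

# Assumed PDI step type identifiers mapped to the concepts in the table above.
STEP_TO_BRUIN = {
    "TableInput": "ingestr asset or SQL asset",
    "TextFileInput": "ingestr, Python materialization, or warehouse external table",
    "DBLookup": "SQL join or Python enrichment",
    "FilterRows": "SQL WHERE clause or Python transform",
    "TableOutput": "asset materialization",
}


def map_steps(ktr_xml):
    """Return (step name, step type, suggested Bruin concept) for each step."""
    root = ET.fromstring(ktr_xml)
    mapped = []
    for step in root.iter("step"):
        name = step.findtext("name", default="")
        step_type = step.findtext("type", default="")
        suggestion = STEP_TO_BRUIN.get(step_type, "review manually")
        mapped.append((name, step_type, suggestion))
    return mapped


# Hypothetical, heavily trimmed transformation file for illustration.
SAMPLE_KTR = """<transformation>
  <step><name>Read orders</name><type>TableInput</type></step>
  <step><name>Only completed</name><type>FilterRows</type></step>
  <step><name>Custom script</name><type>ScriptValueMod</type></step>
  <step><name>Write mart</name><type>TableOutput</type></step>
</transformation>"""

steps = map_steps(SAMPLE_KTR)
needs_review = [name for name, _, suggestion in steps if suggestion == "review manually"]
```

Treat the output as a reading list, not a conversion plan. Several adjacent steps often collapse into a single SQL statement once the intent is clear.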

Step 4: rebuild ingestion first

Move source extraction before transformation logic.

In Bruin, common ingestion jobs use ingestr:

name: raw.postgres_orders
type: ingestr
parameters:
  source_connection: postgres
  source_table: public.orders
  destination: snowflake
  incremental_strategy: merge
  incremental_key: updated_at

columns:
  - name: id
    type: integer
    primary_key: true
  - name: updated_at
    type: timestamp

For custom sources, use Python materialization instead. This is where Bruin is useful for old enterprise systems that do not fit a neat connector catalogue.

"""@bruin
name: raw.partner_export
type: python
connection: snowflake
materialization:
  type: table
  strategy: replace
@bruin"""

import pandas as pd

def materialize(**kwargs):
    export_path = kwargs["secrets"]["partner_export_path"]
    return pd.read_csv(export_path)

That might replace a Pentaho file input, FTP step, custom shell wrapper, or a weird export process around the edge of the PDI job.

Step 5: move transformations into SQL or Python

Most Pentaho transformations become SQL. Joins, filters, aggregations, date logic, standardization, deduplication, and reporting tables are usually clearer in SQL than in a visual canvas.

/* @bruin
name: marts.daily_revenue
type: sf.sql
depends:
  - raw.postgres_orders
owner: finance-analytics
materialization:
  type: table
meta:
  tier: gold
  migrated_from: pentaho
columns:
  - name: revenue_date
    type: date
    checks:
      - name: not_null
  - name: order_count
    type: integer
    checks:
      - name: non_negative
  - name: gross_revenue
    type: float
    checks:
      - name: non_negative
@bruin
*/

SELECT
  CAST(created_at AS DATE) AS revenue_date,
  SUM(amount) AS gross_revenue,
  COUNT(*) AS order_count
FROM raw.postgres_orders
WHERE status = 'completed'
GROUP BY 1

Use Python when the logic is actually Python-shaped: custom API calls, ML scoring, fuzzy matching, file parsing, complicated enrichment, or a proprietary library that already exists in your company.

The mistake is forcing everything into one language. Bruin lets SQL and Python depend on each other, so use the right tool for each part.
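To see why mixed SQL and Python assets are not a scheduling problem, it helps to think of `depends` as edges in a graph. The sketch below is an illustration of dependency ordering using Python's standard `graphlib`, not Bruin's actual scheduler. The asset `staging.partner_scores` is hypothetical, and so is the idea that it depends on `raw.partner_export`.

```python
from graphlib import TopologicalSorter

# Each asset maps to the assets it depends on. The language of the asset
# does not matter for ordering; only the edges do.
DEPENDS = {
    "raw.postgres_orders": [],                            # ingestr asset
    "raw.partner_export": [],                             # Python asset
    "staging.partner_scores": ["raw.partner_export"],     # hypothetical Python asset
    "marts.daily_revenue": ["raw.postgres_orders"],       # SQL asset
}


def run_order(depends):
    """Return one valid execution order in which every asset follows its dependencies."""
    return list(TopologicalSorter(depends).static_order())


order = run_order(DEPENDS)
```

A Pentaho job hop becomes one entry in a `depends` list. If the resulting graph has a cycle, `TopologicalSorter` raises `CycleError`, which is a useful early warning that two old jobs were quietly relying on each other.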

Step 6: add checks before comparing outputs

Do not wait until production to add quality checks.

Every migrated asset should have checks inside the asset definition:

columns:
  - name: id
    type: integer
    description: "Primary key"
    checks:
      - name: unique
      - name: not_null

This is the point of migrating to Bruin instead of just another ETL tool. The pipeline should say what healthy means.

Step 7: run Pentaho and Bruin in parallel

Parallel runs are non-negotiable for anything important.

Compare:

  • Row counts by table
  • Freshness timestamps
  • Null rates on key fields
  • Primary key uniqueness
  • Aggregated metrics by day, region, product, customer segment, or whatever matters
  • Downstream dashboard numbers
  • Runtime and failure frequency
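The first few comparisons in that list can be automated with very little code. The sketch below profiles two outputs held as lists of dictionaries and reports only the metrics that differ. The function names, column names, and sample rows are hypothetical; in practice you would run the equivalent queries in the warehouse rather than pull rows into Python.

```python
def profile(rows, key, required):
    """Compute row count, duplicate key count, and null rate per required column."""
    total = len(rows)
    distinct_keys = len({r.get(key) for r in rows})
    null_rates = {
        col: (sum(1 for r in rows if r.get(col) is None) / total if total else 0.0)
        for col in required
    }
    return {
        "row_count": total,
        "duplicate_keys": total - distinct_keys,
        "null_rates": null_rates,
    }


def compare_profiles(old, new):
    """Return only the metrics that differ, as (old value, new value) pairs."""
    diffs = {}
    for metric in ("row_count", "duplicate_keys"):
        if old[metric] != new[metric]:
            diffs[metric] = (old[metric], new[metric])
    for col, old_rate in old["null_rates"].items():
        new_rate = new["null_rates"].get(col)
        if old_rate != new_rate:
            diffs[f"null_rate:{col}"] = (old_rate, new_rate)
    return diffs


# Hypothetical outputs from the two systems for the same run.
pentaho_rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 5.0},
]
bruin_rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 2, "amount": 5.0},  # duplicated key introduced on purpose
]

diffs = compare_profiles(
    profile(pentaho_rows, key="id", required=["amount"]),
    profile(bruin_rows, key="id", required=["amount"]),
)
```

Here the row counts and null rates match, yet the key is no longer unique. That is exactly the kind of difference a row count alone would hide.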

You are looking for two kinds of differences.

First, migration bugs. Maybe a filter moved incorrectly. Maybe a lookup joined on the wrong key. Fix those.

Second, old bugs. This is awkward, but it happens. You may discover the Pentaho job was wrong and everyone got used to the wrong output. Do not hide that. Write it down, get the business owner to approve the corrected logic, and add a check so it does not come back.
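Either kind of difference shows up the same way: an aggregate that does not match for a specific day. A minimal sketch of that comparison follows, assuming each system's daily metric has been exported as a mapping from ISO date string to value. The function name, tolerance, and sample figures are hypothetical.

```python
import math


def diff_daily_metric(old, new, rel_tol=1e-9):
    """Return (day, old value, new value) for every day that is missing or differs."""
    mismatches = []
    for day in sorted(set(old) | set(new)):
        a, b = old.get(day), new.get(day)
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            mismatches.append((day, a, b))
    return mismatches


# Hypothetical daily gross revenue from each system.
pentaho_daily = {"2024-01-01": 100.0, "2024-01-02": 250.5, "2024-01-03": 80.0}
bruin_daily = {"2024-01-01": 100.0, "2024-01-02": 240.5}

mismatches = diff_daily_metric(pentaho_daily, bruin_daily)
```

Each entry in `mismatches` becomes a line in the parity log with a verdict next to it: migration bug, or old bug with owner sign-off. The tolerance exists because float sums can differ in the last digits between engines; widen it only with the business owner's agreement.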

Step 8: cut over one consumer at a time

Once Bruin matches or intentionally corrects the old output, cut over one downstream consumer.

Not all of them.

One report, one table, one team. Let it run. Watch it. Then expand.

The safest sequence is:

  1. Bruin writes to a parallel schema.
  2. Analysts validate the output.
  3. One dashboard or consumer switches to Bruin output.
  4. The old Pentaho output stays available for rollback.
  5. After an agreed window, Bruin becomes the source of truth.
  6. The Pentaho job is disabled, documented, and removed later.

Do the boring work. Future you will be grateful.

What Bruin Cloud adds after migration

Bruin CLI and ingestr handle the developer workflow. Bruin Cloud adds the enterprise layer:

  • Scheduling and orchestration
  • Run history and observability
  • Catalog and lineage
  • Asset tiers and meta-keys
  • SSO and RBAC
  • Audit logs
  • Cost visibility
  • DAC dashboards
  • MCP access for governed cloud operations from supported assistants
  • AI data analyst workflows in Slack, Microsoft Teams, browser, and other channels

This is the part many Pentaho migrations miss. They move ETL logic but do not improve governance. Then six months later they have the same operational mess in a newer tool.

The whole point is to leave with a better system.

A simple first-week plan

If you want to start this week:

  1. Pick one Pentaho job.
  2. Export its sources, destinations, schedule, and owner.
  3. Create a Bruin project.
  4. Rebuild ingestion with ingestr or Python.
  5. Recreate the transformation in SQL or Python.
  6. Add checks.
  7. Run side by side.
  8. Compare outputs with the business owner.

That is enough. Do not turn the first week into a platform strategy exercise.

Bottom line

Migrating from Pentaho to Bruin is not about making old jobs look modern. It is about making the data platform easier to understand, safer to change, and useful for AI-driven analysis.

Start with one flow. Preserve the business logic. Add checks. Prove parity. Then expand.

For the side-by-side comparison, read Pentaho vs Bruin. If you are still choosing the broader category, start with best data pipeline tools in 2026.