
A practical migration plan for moving Pentaho PDI and Kettle jobs to Bruin. The Bruin team can help with onboarding and migration planning for ingestr, SQL/Python assets, quality checks, DAC dashboards, MCP, and AI analytics.

Arsalan Noorafkan
Developer Advocate

Migrating from Pentaho is rarely a clean "old tool out, new tool in" project.
It is usually more awkward than that. You have PDI jobs that have been edited for years, Spoon transformations that nobody wants to touch, scheduled flows that write into reporting tables, a few custom scripts nearby, and at least one downstream dashboard that finance will notice if it breaks.
If you are evaluating alternatives to an older Pentaho estate, the Bruin team can help with onboarding and migration planning so the first pass focuses on inventory, mapping, parity checks, and a realistic cutover plan instead of a blank-page rewrite.
So the migration plan needs to be boring.
Not heroic. Not a six-month rewrite. Boring.
The goal is to move one flow at a time from Pentaho Data Integration or Kettle into Bruin, prove the output, add governance that probably did not exist before, and only then retire the old job.
Start with a spreadsheet if you must. The format does not matter at first. The columns do.
Capture this for every job and transformation:
| Field | Why it matters |
|---|---|
| Job or transformation name | The thing you are migrating |
| Owner | Someone needs to approve parity |
| Source systems | Databases, SaaS tools, files, APIs, FTP drops |
| Destination tables or files | What downstream users actually consume |
| Schedule | When the job runs and what it depends on |
| Runtime | How long it takes today |
| Failure mode | What usually breaks |
| Downstream reports | What will complain if the output changes |
| Business rules | Logic hidden inside steps, filters, joins, lookups |
| Data quality assumptions | Row counts, uniqueness, null checks, freshness |
This sounds obvious, but it is where most migrations fail. Teams convert the easy transformations and miss the hidden business rule that was sitting in a filter step from 2018.
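If the spreadsheet starts to sprawl, the same inventory can live in a small script. The sketch below is only one way to do it: the `InventoryRow` class, its field names, and the example job are illustrative assumptions that mirror the table above, not part of Bruin or Pentaho.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryRow:
    """One row per Pentaho job or transformation (field names are illustrative)."""
    name: str
    owner: str
    source_systems: str
    destinations: str
    schedule: str
    runtime_minutes: float
    failure_mode: str
    downstream_reports: str
    business_rules: str
    quality_assumptions: str


def missing_fields(row: InventoryRow) -> list[str]:
    """Return the names of fields left blank, so gaps show up before migration."""
    return [f.name for f in fields(row) if getattr(row, f.name) in ("", None)]


def to_csv(rows: list[InventoryRow]) -> str:
    """Serialize the inventory as CSV text that any spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(InventoryRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Hypothetical job used only to exercise the helpers.
example = InventoryRow(
    name="load_daily_revenue",
    owner="finance-analytics",
    source_systems="postgres: public.orders",
    destinations="marts.daily_revenue",
    schedule="daily 02:00",
    runtime_minutes=12.0,
    failure_mode="",
    downstream_reports="finance revenue dashboard",
    business_rules="",
    quality_assumptions="one row per revenue_date",
)
print(missing_fields(example))
```

Blank business rules or failure modes are the rows to chase first, because that is usually where the filter step from 2018 is hiding.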
Do not start with the biggest job.
Pick a pipeline that is important enough to matter and small enough to finish. A good first candidate has:
Bad first candidates: the huge job that touches 60 tables, a finance close process nobody understands, or a job that writes into a report nobody owns.
You can get to those later. The first migration is about learning the pattern.
Here is the mental model:
| Pentaho concept | Bruin concept |
|---|---|
| Transformation step | SQL or Python asset logic |
| Job dependency | depends relationship |
| Database input | ingestr asset or SQL asset |
| File input | ingestr, Python materialization, or warehouse external table |
| Lookup step | SQL join or Python enrichment |
| Filter rows | SQL WHERE clause or Python transform |
| Output table | Asset materialization |
| Job schedule | Bruin Cloud schedule or CI/orchestrated run |
| Manual validation | Asset quality checks |
| Operational notes | Metadata, owners, tiers, documentation |
The migration is not about recreating every visual step one-to-one. That is how you carry old complexity into the new system.
The better move is to recreate the business intent.
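To see how much of a transformation falls into each bucket, you can read the step types out of the `.ktr` file, which is XML. The sketch below assumes each `<step>` element has `<name>` and `<type>` children and uses a tiny hypothetical sample; the mapping dictionary is illustrative and follows the table above, so extend it for the step types in your own estate.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down .ktr content. Real exports carry far more detail.
SAMPLE_KTR = """
<transformation>
  <step><name>Read orders</name><type>TableInput</type></step>
  <step><name>Completed only</name><type>FilterRows</type></step>
  <step><name>Customer lookup</name><type>DBLookup</type></step>
  <step><name>Write revenue</name><type>TableOutput</type></step>
  <step><name>Legacy script</name><type>ScriptValueMod</type></step>
</transformation>
"""

# Illustrative mapping from Pentaho step type to the Bruin concept in the table.
STEP_TO_BRUIN = {
    "TableInput": "ingestr asset or SQL asset",
    "FilterRows": "SQL WHERE clause or Python transform",
    "DBLookup": "SQL join or Python enrichment",
    "TableOutput": "asset materialization",
}


def summarize_steps(ktr_xml: str) -> dict[str, list[str]]:
    """Group step names by the Bruin concept they most likely map to."""
    root = ET.fromstring(ktr_xml.strip())
    plan: dict[str, list[str]] = {}
    for step in root.iter("step"):
        step_type = step.findtext("type", default="unknown")
        step_name = step.findtext("name", default="unnamed")
        # Anything not in the mapping is flagged instead of guessed at.
        target = STEP_TO_BRUIN.get(step_type, "needs manual review")
        plan.setdefault(target, []).append(step_name)
    return plan


for target, names in summarize_steps(SAMPLE_KTR).items():
    print(f"{target}: {', '.join(names)}")
```

The "needs manual review" bucket is the useful one. Those steps are where you stop translating and ask what the business intent was.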
Move source extraction before transformation logic.
In Bruin, common ingestion jobs use ingestr:
```yaml
name: raw.postgres_orders
type: ingestr
parameters:
  source_connection: postgres
  source_table: public.orders
  destination: snowflake
  incremental_strategy: merge
  incremental_key: updated_at
columns:
  - name: id
    type: integer
    primary_key: true
  - name: updated_at
    type: timestamp
```
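If the `merge` strategy with an incremental key is unfamiliar, the following is a conceptual model only, not ingestr's implementation. It assumes that rows newer than the last seen `updated_at` are upserted by primary key; the function name and the strict "newer than" comparison are assumptions made for illustration.

```python
from datetime import datetime


def merge_incremental(destination, source_rows, primary_key, incremental_key, last_seen):
    """Upsert source rows newer than last_seen into destination, keyed by primary_key."""
    merged = {row[primary_key]: row for row in destination}
    for row in source_rows:
        # Assumption: only rows strictly newer than the watermark are picked up.
        if last_seen is None or row[incremental_key] > last_seen:
            merged[row[primary_key]] = row
    return sorted(merged.values(), key=lambda r: r[primary_key])


# Hypothetical rows: id 1 was updated after the last load, id 3 is new.
destination = [
    {"id": 1, "status": "pending", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "status": "completed", "updated_at": datetime(2024, 1, 1)},
]
source = [
    {"id": 1, "status": "completed", "updated_at": datetime(2024, 1, 2)},
    {"id": 2, "status": "completed", "updated_at": datetime(2024, 1, 1)},
    {"id": 3, "status": "pending", "updated_at": datetime(2024, 1, 2)},
]
result = merge_incremental(destination, source, "id", "updated_at", datetime(2024, 1, 1))
print([(row["id"], row["status"]) for row in result])
```

This is why the primary key and a trustworthy `updated_at` column matter: without them, a merge either duplicates rows or silently misses updates.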
For custom sources, use Python materialization instead. This is where Bruin is useful for old enterprise systems that do not fit a neat connector catalogue.

```python
"""@bruin
name: raw.partner_export
type: python
connection: snowflake
materialization:
  type: table
  strategy: replace
@bruin"""
import pandas as pd


def materialize(**kwargs):
    # Read the export location from the secrets passed to the asset.
    export_path = kwargs["secrets"]["partner_export_path"]
    # The returned DataFrame is what gets materialized as the table.
    return pd.read_csv(export_path)
```
That might replace a Pentaho file input, FTP step, custom shell wrapper, or a weird export process around the edge of the PDI job.
Most Pentaho transformations become SQL. Joins, filters, aggregations, date logic, standardization, deduplication, and reporting tables are usually clearer in SQL than in a visual canvas.
```sql
/* @bruin
name: marts.daily_revenue
type: sf.sql
depends:
  - raw.postgres_orders
owner: finance-analytics
materialization:
  type: table
meta:
  tier: gold
  migrated_from: pentaho
columns:
  - name: revenue_date
    type: date
    checks:
      - name: not_null
  - name: order_count
    type: integer
    checks:
      - name: non_negative
  - name: gross_revenue
    type: float
    checks:
      - name: non_negative
@bruin
*/
SELECT
    DATE_TRUNC('day', created_at) AS revenue_date,
    SUM(amount) AS gross_revenue,
    COUNT(*) AS order_count
FROM raw.postgres_orders
WHERE status = 'completed'
GROUP BY 1
```
Use Python when the logic is actually Python-shaped: custom API calls, ML scoring, fuzzy matching, file parsing, complicated enrichment, or a proprietary library that already exists in your company.
The mistake is forcing everything into one language. Bruin lets SQL and Python depend on each other, so use the right tool for each part.
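Fuzzy matching is a good example of Python-shaped logic. The sketch below uses only the standard library's `difflib`; the function name, the sample partner names, and the 0.8 cutoff are assumptions for illustration, and in a real pipeline this would sit inside a Python asset that depends on the upstream table.

```python
from difflib import get_close_matches


def match_partner_names(raw_names, canonical_names, cutoff=0.8):
    """Map each raw name to its closest canonical name, or None if nothing is close."""
    # Compare in lowercase, but return the canonical spelling.
    lookup = {name.lower(): name for name in canonical_names}
    result = {}
    for raw in raw_names:
        hits = get_close_matches(raw.strip().lower(), list(lookup), n=1, cutoff=cutoff)
        result[raw] = lookup[hits[0]] if hits else None
    return result


# Hypothetical names as they might appear in a messy partner export.
matches = match_partner_names(["Acme Corp.", "Globex"], ["Acme Corp", "Initech"])
for raw, canonical in matches.items():
    print(f"{raw!r} -> {canonical!r}")
```

Doing this in SQL is possible in some warehouses, but it is rarely pleasant. Unmatched names come back as `None`, which makes them easy to catch with a not-null check downstream.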
Do not wait until production to add quality checks.
Every migrated asset should have checks inside the asset definition:
```yaml
columns:
  - name: id
    type: integer
    description: "Primary key"
    checks:
      - name: unique
      - name: not_null
```
This is the point of migrating to Bruin instead of just another ETL tool. The pipeline should say what healthy means.
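To make "healthy" concrete, here is what checks like `not_null`, `unique`, and `non_negative` assert, written as plain Python over a list of rows. This is a sketch of the idea, not how Bruin runs its checks; the helper names and the sample rows are hypothetical.

```python
from collections import Counter


def failing_not_null(rows, column):
    """Return the indexes of rows where the column is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def failing_unique(rows, column):
    """Return the non-null values that appear more than once."""
    counts = Counter(row.get(column) for row in rows if row.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


def failing_non_negative(rows, column):
    """Return the indexes of rows where the column is below zero."""
    return [
        i for i, row in enumerate(rows)
        if row.get(column) is not None and row[column] < 0
    ]


# Hypothetical output with one duplicate id, one null id, and one negative amount.
rows = [
    {"id": 1, "gross_revenue": 10.0},
    {"id": 2, "gross_revenue": -5.0},
    {"id": 2, "gross_revenue": 7.5},
    {"id": None, "gross_revenue": 0.0},
]
print("null ids at rows:", failing_not_null(rows, "id"))
print("duplicate ids:", failing_unique(rows, "id"))
print("negative revenue at rows:", failing_non_negative(rows, "gross_revenue"))
```

A check passes when its list is empty. Writing the assumption down this explicitly is often the first time anyone has stated it at all.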
Parallel runs are non-negotiable for anything important.
Compare:
You are looking for two kinds of differences.
First, migration bugs. Maybe a filter moved incorrectly. Maybe a lookup joined on the wrong key. Fix those.
Second, old bugs. This is awkward, but it happens. You may discover the Pentaho job was wrong and everyone got used to the wrong output. Do not hide that. Write it down, get the business owner to approve the corrected logic, and add a check so it does not come back.
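A parity comparison does not need to be fancy. The sketch below assumes both outputs have been pulled into lists of dictionaries that share a key column; the function name, the report shape, and the sample rows are illustrative assumptions, and for large tables you would run the same comparison in the warehouse instead.

```python
import math


def compare_outputs(old_rows, new_rows, key, float_tolerance=1e-9):
    """Compare Pentaho output with Bruin output, row by row, on a shared key."""
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}
    report = {
        "old_row_count": len(old_rows),
        "new_row_count": len(new_rows),
        "missing_in_new": sorted(old_by_key.keys() - new_by_key.keys()),
        "extra_in_new": sorted(new_by_key.keys() - old_by_key.keys()),
        "value_mismatches": [],
    }
    for k in sorted(old_by_key.keys() & new_by_key.keys()):
        old, new = old_by_key[k], new_by_key[k]
        for column in sorted(old.keys() | new.keys()):
            a, b = old.get(column), new.get(column)
            if isinstance(a, float) and isinstance(b, float):
                # Floats get a tolerance so rounding noise is not reported as a bug.
                same = math.isclose(a, b, abs_tol=float_tolerance)
            else:
                same = a == b
            if not same:
                report["value_mismatches"].append((k, column, a, b))
    return report


# Hypothetical daily revenue rows from the old job and the migrated asset.
old = [
    {"revenue_date": "2024-01-01", "gross_revenue": 100.0, "order_count": 3},
    {"revenue_date": "2024-01-02", "gross_revenue": 50.0, "order_count": 1},
]
new = [
    {"revenue_date": "2024-01-01", "gross_revenue": 100.0, "order_count": 3},
    {"revenue_date": "2024-01-02", "gross_revenue": 55.0, "order_count": 1},
    {"revenue_date": "2024-01-03", "gross_revenue": 20.0, "order_count": 2},
]
report = compare_outputs(old, new, key="revenue_date")
for name, value in report.items():
    print(f"{name}: {value}")
```

The report does not say which side is right. Each mismatch still has to be classified as a migration bug or an old bug, and the second kind needs a business owner's sign-off.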
Once Bruin matches or intentionally corrects the old output, cut over one downstream consumer.
Not all of them.
One report, one table, one team. Let it run. Watch it. Then expand.
The safest sequence is:
Do the boring work. Future you will be grateful.
Bruin CLI and ingestr handle the developer workflow. Bruin Cloud adds the enterprise layer:
This is the part many Pentaho migrations miss. They move ETL logic but do not improve governance. Then six months later they have the same operational mess in a newer tool.
The whole point is to leave with a better system.
If you want to start this week:
That is enough. Do not turn the first week into a platform strategy exercise.
Migrating from Pentaho to Bruin is not about making old jobs look modern. It is about making the data platform easier to understand, safer to change, and useful for AI-driven analysis.
Start with one flow. Preserve the business logic. Add checks. Prove parity. Then expand.
For the side-by-side comparison, read Pentaho vs Bruin. If you are still choosing the broader category, start with best data pipeline tools in 2026.
