Clearbit + Bruin
Ingest Clearbit data into your warehouse with incremental loading, quality checks, and full lineage. Defined in YAML, version-controlled in Git.
For business teams
What you get
Sales analytics beyond the CRM
Join Clearbit pipeline data with marketing spend and product usage. Know which campaigns actually drive revenue.
Clean contact data
Merge-based loading deduplicates contacts, while quality checks catch missing emails and validate pipeline stages on every sync.
Revenue forecasting you trust
Feed clean Clearbit data into forecasting models. Bad CRM data makes bad forecasts — Bruin catches issues first.
Marketing attribution that works
Connect Clearbit closed-won deals back to ad spend and campaigns. Finance gets numbers they can actually trust.
For data & engineering teams
How it works
Deduplication built in
Bruin handles incremental loading with merge strategy. Contacts and deals are deduplicated automatically on every sync.
YAML-defined, Git-versioned
Your Clearbit pipeline is a YAML file. Review in PRs, deploy with CI/CD, roll back with git revert.
Custom SQL quality checks
Validate pipeline stage values, check for orphaned deals, and enforce referential integrity with custom SQL.
End-to-end lineage
Trace Clearbit data from ingestion through every transform to final dashboards. Know what breaks when schemas change.
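To make the "merge strategy" idea above concrete, here is a minimal Python sketch of an upsert keyed on a primary key. It is an illustration of the general technique, not Bruin's implementation: the function name `merge_rows`, the `id` key, and the `updated_at` column are all assumptions chosen for the example.

```python
from typing import Any

Row = dict[str, Any]


def merge_rows(existing: list[Row], incoming: list[Row],
               key: str = "id", updated_at: str = "updated_at") -> list[Row]:
    """Upsert incoming rows into existing rows, keeping one row per key.

    Hypothetical helper for illustration only; column names are assumed.
    """
    merged: dict[Any, Row] = {row[key]: row for row in existing}
    for row in incoming:
        current = merged.get(row[key])
        # Insert unseen keys; overwrite when the incoming row is at least as recent.
        if current is None or row[updated_at] >= current[updated_at]:
            merged[row[key]] = row
    return list(merged.values())


warehouse = [
    {"id": 1, "name": "Acme", "updated_at": "2024-01-01"},
    {"id": 2, "name": "Globex", "updated_at": "2024-01-01"},
]
batch = [
    {"id": 2, "name": "Globex Corp", "updated_at": "2024-02-01"},  # update
    {"id": 3, "name": "Initech", "updated_at": "2024-02-01"},      # insert
    {"id": 3, "name": "Initech", "updated_at": "2024-02-01"},      # repeated in batch
]

result = merge_rows(warehouse, batch)
print(len(result))  # 3
```

Because rows are matched on the key rather than appended, re-running the same batch leaves the table unchanged, which is what makes repeated syncs safe.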
Before you start
Step 1
Add your Clearbit connection
Connect using API key authentication. Add this to your Bruin environment file — credentials are stored securely and referenced by name in your pipeline YAML.
Parameters
api_key: API key from the Clearbit dashboard
connections:
  clearbit:
    type: clearbit
    uri: "clearbit://?api_key=<api-key>"

Step 2
Create your pipeline
Define a YAML asset that tells Bruin what to pull from Clearbit and where to land it. This file lives in your Git repo — reviewable, version-controlled, and deployable with CI/CD.
Available tables
name: raw.clearbit_companies
type: ingestr
parameters:
  source_connection: clearbit
  source_table: 'companies'
  destination: bigquery

Step 3
Add quality checks
Add column-level and custom SQL checks to your Clearbit data. If a check fails, the pipeline stops — bad data never reaches downstream models or dashboards.
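As a rough mental model of what the column checks in this step assert, and of how a failed check gates the load, consider the Python sketch below. It is a sketch of the semantics only, assuming simple in-memory rows; the helper names (`not_null`, `unique`, `accepted_values`, `run_checks`) are hypothetical and do not correspond to any Bruin API.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]

# Stage values taken from the accepted_values check in this step.
STAGES = {"lead", "qualified", "proposal", "negotiation", "closed_won", "closed_lost"}


def not_null(rows: list[Row], column: str) -> bool:
    """True when no row has a missing value in the column."""
    return all(row.get(column) is not None for row in rows)


def unique(rows: list[Row], column: str) -> bool:
    """True when every value in the column appears exactly once."""
    counts = Counter(row.get(column) for row in rows)
    return all(n == 1 for n in counts.values())


def accepted_values(rows: list[Row], column: str, allowed: set[str]) -> bool:
    """True when every value in the column is in the allowed set."""
    return all(row.get(column) in allowed for row in rows)


def run_checks(rows: list[Row]) -> list[str]:
    """Return the names of failed checks; an empty list means the batch may load."""
    results = {
        "id not_null": not_null(rows, "id"),
        "id unique": unique(rows, "id"),
        "stage accepted_values": accepted_values(rows, "stage", STAGES),
    }
    return [name for name, ok in results.items() if not ok]


good = [{"id": 1, "stage": "lead"}, {"id": 2, "stage": "closed_won"}]
bad = [{"id": 1, "stage": "lead"}, {"id": 1, "stage": "unknown"}]

for batch in (good, bad):
    failures = run_checks(batch)
    if failures:
        print("pipeline stopped:", failures)
    else:
        print("checks passed, loading", len(batch), "rows")
```

The second batch fails two checks (a duplicate `id` and an unrecognised `stage`), so nothing from it would be loaded.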
columns:
  - name: id
    checks:
      - name: not_null
      - name: unique
  - name: stage
    checks:
      - name: accepted_values
        value: ['lead', 'qualified', 'proposal', 'negotiation', 'closed_won', 'closed_lost']
custom_checks:
  - name: no orphaned deals
    query: |
      SELECT COUNT(*) = 0
      FROM raw.clearbit_companies
      WHERE contact_id IS NULL

Step 4
Run it
One command. Bruin connects to Clearbit, pulls data incrementally, runs your quality checks, and lands clean data in your warehouse. If a check fails, the pipeline stops — bad data never reaches downstream.
$ bruin run .
Running pipeline...
clearbit_companies
✓ Fetched 2,847 new records
✓ Quality: id not_null PASSED
✓ Quality: id unique PASSED
✓ Quality: stage accepted_values PASSED
✓ Quality: no orphaned deals PASSED
✓ Loaded into bigquery
Completed in 12s

Ready to connect Clearbit?
Start for free, or book a demo to see how Bruin handles ingestion, quality, lineage, and scheduling for your entire data stack.


