Indeed + Bruin

Source

Ingest Indeed data into your warehouse with incremental loading, quality checks, and full lineage. Defined in YAML, version-controlled in Git.

For business teams

What you get

People analytics beyond HR tools
Join Indeed data with finance and project data. See fully-loaded team cost, hiring ROI, and attrition trends.
Headcount planning with real data
Combine Indeed org data with budget and project data. Plan headcount based on actual numbers, not estimates.
Compliance-ready data
Quality checks validate that required fields are present, records are consistent, and org hierarchy is valid.
Faster reporting cycles
Indeed data syncs automatically. HR and finance get fresh data without waiting for someone to pull a report.

For data & engineering teams

How it works

Automatic schema handling
Bruin detects Indeed schema changes and handles them automatically. No manual migration scripts.
YAML-defined, Git-versioned
Your Indeed pipeline is a YAML file. Review in PRs, deploy with CI/CD, roll back with git revert.
Hierarchy validation
Custom SQL checks validate manager-employee relationships and catch orphaned records in Indeed org data.
Incremental sync
Only sync new and changed Indeed records. Full org structure stays in sync without re-processing everything.

Before you start

Indeed Developer account

OAuth credentials (client_id, client_secret)

Employer ID from Indeed

Step 1

Add your Indeed connection

Connect using your Indeed OAuth credentials and employer ID. Add the connection below to your Bruin environment file. Credentials are stored securely there and referenced by name in your pipeline YAML.

Parameters

client_id: OAuth client ID for Indeed API authentication
client_secret: OAuth client secret for Indeed API authentication
employer_id: The employer ID associated with your Indeed account

connections:
  indeed:
    type: indeed
    uri: "indeed://?client_id=<client_id>&client_secret=<client_secret>&employer_id=<employer_id>"
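
Secrets often contain characters such as `&`, `=`, or `/`, which would break the query string if pasted into the URI unescaped. The sketch below assumes the connection URI is parsed as a standard URL query string (an assumption, not something stated on this page). `build_indeed_uri` is a hypothetical helper name, not part of Bruin.

```python
from urllib.parse import urlencode


def build_indeed_uri(client_id: str, client_secret: str, employer_id: str) -> str:
    """Build the indeed:// connection URI with percent-encoded values.

    Hypothetical helper: assumes the URI is read as a normal query string.
    """
    query = urlencode(
        {
            "client_id": client_id,
            "client_secret": client_secret,
            "employer_id": employer_id,
        }
    )
    return f"indeed://?{query}"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_indeed_uri("my-client-id", "s3cr&t", "12345"))
    # prints: indeed://?client_id=my-client-id&client_secret=s3cr%26t&employer_id=12345
```

Paste the resulting string into the `uri` field of the connection shown above.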

Step 2

Create your pipeline

Define a YAML asset that tells Bruin what to pull from Indeed and where to land it. This file lives in your Git repo, so it is reviewable, version-controlled, and deployable with CI/CD.

Available tables

campaigns
campaign_details
campaign_budget
campaign_jobs
campaign_properties
campaign_stats
account
traffic_stats

name: raw.indeed_campaigns
type: ingestr

parameters:
  source_connection: indeed
  source_table: 'campaigns'
  destination: bigquery
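
To ingest more than one of the available tables, the same asset can be repeated per table. Below is a rough Python sketch that renders one asset definition per table name. The `raw.indeed_<table>` naming follows the example above. The `assets/` directory and `.asset.yml` file suffix are assumptions about your project layout, and `render_asset` and `write_assets` are hypothetical helper names.

```python
from pathlib import Path

# Table names copied from the "Available tables" list on this page.
TABLES = [
    "campaigns",
    "campaign_details",
    "campaign_budget",
    "campaign_jobs",
    "campaign_properties",
    "campaign_stats",
    "account",
    "traffic_stats",
]


def render_asset(table: str, destination: str = "bigquery") -> str:
    """Return the YAML text of an ingestr asset for one Indeed table."""
    return (
        f"name: raw.indeed_{table}\n"
        "type: ingestr\n"
        "\n"
        "parameters:\n"
        "  source_connection: indeed\n"
        f"  source_table: '{table}'\n"
        f"  destination: {destination}\n"
    )


def write_assets(out_dir: Path) -> list[Path]:
    """Write one asset file per table; the file naming is an assumption."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for table in TABLES:
        path = out_dir / f"indeed_{table}.asset.yml"
        path.write_text(render_asset(table))
        paths.append(path)
    return paths
```

Calling `write_assets(Path("assets"))` would create eight files. Review them in a pull request like any other change before deploying.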

Step 3

Add quality checks

Add column-level and custom SQL checks to your Indeed data. If a check fails, the pipeline stops, so bad data never reaches downstream models or dashboards.

Validate the manager-employee hierarchy

Catch employees with null departments

Ensure employee IDs are unique across syncs

columns:
  - name: employee_id
    checks:
      - name: not_null
      - name: unique
  - name: status
    checks:
      - name: accepted_values
        value: ['active', 'inactive', 'terminated', 'on_leave']

custom_checks:
  - name: valid manager hierarchy
    query: |
      SELECT COUNT(*) = 0
      FROM raw.indeed_campaigns
      WHERE manager_id IS NOT NULL
        AND manager_id NOT IN (SELECT employee_id FROM raw.indeed_campaigns)
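
To see what the custom check's query evaluates to, it can be tried locally against a few rows. The sketch below runs the same query shape in an in-memory SQLite database. The table name `org` and the sample rows are invented for illustration, and SQLite returns 1 or 0 where a warehouse such as BigQuery would return a boolean.

```python
import sqlite3

# Same logic as the custom check above, pointed at a hypothetical table "org".
ORPHAN_CHECK = """
SELECT COUNT(*) = 0
FROM org
WHERE manager_id IS NOT NULL
  AND manager_id NOT IN (SELECT employee_id FROM org)
"""


def hierarchy_is_valid(rows):
    """Return True when every non-null manager_id matches an employee_id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE org (employee_id INTEGER, manager_id INTEGER)")
        conn.executemany("INSERT INTO org VALUES (?, ?)", rows)
        (ok,) = conn.execute(ORPHAN_CHECK).fetchone()
        return bool(ok)
    finally:
        conn.close()


if __name__ == "__main__":
    # Every manager exists as an employee.
    print(hierarchy_is_valid([(1, None), (2, 1), (3, 2)]))  # prints: True
    # Manager 99 is not an employee, so the record is orphaned.
    print(hierarchy_is_valid([(1, None), (2, 99)]))  # prints: False
```

Note that `NOT IN` matches nothing if the subquery returns a NULL, so this check relies on the `not_null` column check on `employee_id`.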

Step 4

Run it

One command. Bruin connects to Indeed, pulls data incrementally, runs your quality checks, and lands clean data in your warehouse. If a check fails, the pipeline stops, so bad data never reaches downstream models.

Backfill historical data with --start-date

Schedule with cron or trigger from CI/CD

Full lineage from Indeed to your dashboards

$ bruin run .

Running pipeline...

  indeed_campaigns
    ✓ Fetched 2,847 new records
    ✓ Quality: campaign_id not_null     PASSED
    ✓ Quality: spend not_null           PASSED
    ✓ Quality: no negative ad spend     PASSED
    ✓ Loaded into bigquery

  Completed in 12s
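
For scheduled or CI/CD runs, the command can be wrapped in a small script. This is a minimal Python sketch. It assumes `bruin` is on the `PATH`, that `--start-date` accepts a `YYYY-MM-DD` value, and that a failed run exits with a non-zero status. None of these are confirmed on this page. `build_run_command` and `run_pipeline` are hypothetical helper names.

```python
import subprocess


def build_run_command(path=".", start_date=None):
    """Assemble the bruin CLI call, optionally backfilling from start_date."""
    cmd = ["bruin", "run"]
    if start_date:
        # --start-date is the backfill flag named on this page; format assumed.
        cmd += ["--start-date", start_date]
    cmd.append(path)
    return cmd


def run_pipeline(path=".", start_date=None) -> int:
    """Run the pipeline and return the CLI's exit code to the caller."""
    completed = subprocess.run(build_run_command(path, start_date))
    return completed.returncode
```

A CI job could call `run_pipeline(".", "2024-01-01")` and fail the build when the returned code is not zero.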