Snowplow + Bruin

Ingest Snowplow data into your warehouse with incremental loading, quality checks, and full lineage. Defined in YAML, version-controlled in Git.

For business teams

What you get

  • Analysis beyond built-in reports

    Join Snowplow behavioral data with revenue, support, and CRM data. Answer questions Snowplow alone can't.

  • Trusted behavioral data

    Quality checks catch tracking gaps, duplicate events, and missing timestamps before they corrupt your models.

  • Self-serve for analysts

    Snowplow data lands in your warehouse where analysts already work. No more exporting, no more waiting.

  • Real user journeys

    Combine Snowplow events with purchase and support data to see the full customer journey, not just the product funnel.

For data & engineering teams

How it works

  • Event schema validation

    Check for null event IDs, missing timestamps, and duplicate events on every sync. Catch tracking issues at ingestion.

  • YAML-defined, Git-versioned

    Your Snowplow pipeline is a YAML file. Review in PRs, deploy with CI/CD, roll back with git revert.

  • SQL + Python transforms

    Transform raw Snowplow events into funnels, cohorts, and user journeys with SQL or Python — in the same pipeline.

  • Dependency-aware scheduling

    Bruin resolves pipeline dependencies automatically. Transforms only run after Snowplow data has landed; see the example asset after this list.
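
To make the last two points concrete, here is a minimal sketch of a downstream transform. It assumes Bruin's comment-header format for SQL assets, the BigQuery asset type bq.sql, and the raw.snowplow_events asset defined in the steps below; the model name, session column, and query itself are illustrative.

/* @bruin
name: analytics.snowplow_sessions
type: bq.sql
depends:
  - raw.snowplow_events
@bruin */

-- Sessionize raw Snowplow events; Bruin runs this only after raw.snowplow_events has landed.
SELECT
  domain_sessionid AS session_id,
  MIN(event_timestamp) AS session_start,
  MAX(event_timestamp) AS session_end,
  COUNT(*)            AS event_count
FROM raw.snowplow_events
GROUP BY domain_sessionid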

Before you start

Snowplow BDP account with API access

Step 1

Add your Snowplow connection

Connect using Snowplow BDP Console API credentials. Add this to your Bruin environment file — credentials are stored securely and referenced by name in your pipeline YAML.

Parameters

  • api_key: Snowplow BDP Console API key
  • organization_id: Your Snowplow organization identifier
  • pipeline_id: The pipeline identifier to extract data from

connections:
  snowplow:
    type: snowplow
    uri: "snowplow://?api_key=<your-api-key>&organization_id=<your-org-id>&pipeline_id=<your-pipeline-id>"

Step 2

Create your pipeline

Define a YAML asset that tells Bruin what to pull from Snowplow and where to land it. This file lives in your Git repo — reviewable, version-controlled, and deployable with CI/CD.

Available tables

events, sessions, users, page_views, contexts

name: raw.snowplow_events
type: ingestr

parameters:
  source_connection: snowplow
  source_table: 'events'
  destination: bigquery
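
Each table in the list above is ingested as its own asset following the same pattern. For example, a second asset for page_views might look like this (the asset name raw.snowplow_page_views is illustrative):

name: raw.snowplow_page_views
type: ingestr

parameters:
  source_connection: snowplow
  source_table: 'page_views'
  destination: bigquery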

Step 3

Add quality checks

Add column-level and custom SQL checks to your Snowplow data. If a check fails, the pipeline stops — bad data never reaches downstream models or dashboards.

  • Catch duplicate events and missing timestamps
  • Validate event freshness — stale data gets flagged
  • Ensure event IDs are unique across syncs

columns:
  - name: event_id
    checks:
      - name: not_null
      - name: unique
  - name: event_timestamp
    checks:
      - name: not_null

custom_checks:
  - name: data is fresh
    query: |
      SELECT MAX(event_timestamp) >
        TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
      FROM raw.snowplow_events

Step 4

Run it

One command. Bruin connects to Snowplow, pulls data incrementally, runs your quality checks, and lands clean data in your warehouse.

  • Backfill historical data with --start-date (example after the run output below)
  • Schedule with cron or trigger from CI/CD
  • Full lineage from Snowplow to your dashboards

$ bruin run .
Running pipeline...

  raw.snowplow_events
    ✓ Fetched 2,847 new records
    ✓ Quality: event_id not_null        PASSED
    ✓ Quality: event_id unique          PASSED
    ✓ Quality: event_timestamp not_null PASSED
    ✓ Quality: data is fresh            PASSED
    ✓ Loaded into bigquery

  Completed in 12s
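
To backfill or schedule the same pipeline, point the run command at a start date or drop it into your scheduler. The date and path below are illustrative; check the flags against your Bruin CLI version.

# Backfill historical Snowplow data from a fixed start date (illustrative date)
$ bruin run --start-date 2024-01-01 .

# Run hourly via cron (illustrative path)
0 * * * * cd /path/to/snowplow-pipeline && bruin run .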

Ready to connect Snowplow?

Start for free, or book a demo to see how Bruin handles ingestion, quality, lineage, and scheduling for your entire data stack.