HTTP + Bruin

Source

Ingest HTTP data into your warehouse with incremental loading, quality checks, and full lineage. Defined in YAML, version-controlled in Git.

For business teams

What you get

API data, on schedule
HTTP data lands in your warehouse automatically. No scripts to maintain, no pagination to handle.
Only fetch what changed
Incremental sync means no re-processing. Bruin tracks watermarks so you only get new and updated records.
Catch API changes early
Quality checks validate response data on every sync. Schema changes or missing fields get caught before they break models.
Transform in the same pipeline
Reshape HTTP API data with SQL or Python. Compute metrics, normalize schemas, and build models, all version-controlled.

For data & engineering teams

How it works

Managed pagination & retries
Bruin handles HTTP API pagination, rate limiting, and retries. You define the source; Bruin does the rest.
YAML-defined, Git-versioned
Your HTTP pipeline is a YAML file. Review in PRs, deploy with CI/CD, roll back with git revert.
Incremental with watermarks
Bruin tracks cursor positions and watermarks. Only new and updated HTTP records get fetched on each run.
Schema validation on responses
Quality checks validate HTTP API response structure on every sync. Catch breaking API changes early.
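
To make the ideas above concrete, here is a minimal Python sketch of cursor pagination, retries with backoff, and a watermark working together. It illustrates the general technique and is not Bruin's internal implementation. The `fetch_page` callable, the `updated_at` field, and the in-memory `PAGES` data are all hypothetical.

```python
import time
from typing import Callable, Optional

# A page is (records, next_cursor); next_cursor is None on the last page.
Page = tuple[list[dict], Optional[str]]


def fetch_incremental(
    fetch_page: Callable[[Optional[str]], Page],
    watermark: str,
    max_retries: int = 3,
    backoff_seconds: float = 0.0,
) -> tuple[list[dict], str]:
    """Walk every page, keep records newer than `watermark`, and return them with the new watermark."""
    new_records: list[dict] = []
    cursor: Optional[str] = None
    while True:
        # Retry transient failures with exponential backoff before giving up.
        for attempt in range(max_retries + 1):
            try:
                records, next_cursor = fetch_page(cursor)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise
                time.sleep(backoff_seconds * 2 ** attempt)
        # ISO 8601 UTC timestamps in one fixed format compare correctly as strings.
        new_records.extend(r for r in records if r["updated_at"] > watermark)
        if next_cursor is None:
            break
        cursor = next_cursor
    # If nothing new arrived, the watermark stays where it was.
    new_watermark = max((r["updated_at"] for r in new_records), default=watermark)
    return new_records, new_watermark


# Hypothetical two-page API held in memory; the very first call fails once.
PAGES = {
    None: ([{"id": 1, "updated_at": "2024-01-01T00:00:00Z"},
            {"id": 2, "updated_at": "2024-01-03T00:00:00Z"}], "p2"),
    "p2": ([{"id": 3, "updated_at": "2024-01-05T00:00:00Z"}], None),
}
calls = {"count": 0}


def fake_fetch_page(cursor: Optional[str]) -> Page:
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("transient failure")
    return PAGES[cursor]


records, watermark = fetch_incremental(fake_fetch_page, "2024-01-02T00:00:00Z")
print([r["id"] for r in records], watermark)
```

Storing the returned watermark and passing it into the next run is what keeps each sync limited to new and updated records.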

Before you start

API endpoint URL

Authentication credentials (if required)

Step 1

Add your HTTP connection

Connect to any HTTP/REST API endpoint. Add this to your Bruin environment file. Credentials are stored securely and referenced by name in your pipeline YAML.

Parameters

url: The HTTP(S) endpoint URL
headers: Custom headers as JSON (optional)
auth_type: Authentication type (bearer, basic, etc.)

connections:
  http:
    type: http
    uri: "http://?url=<api_url>&headers=<headers>&auth_type=<auth_type>"
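
Because the endpoint URL and the JSON headers are passed as query parameters inside another URI, they likely need percent-encoding. The Python sketch below shows one way to assemble the `uri` value. It assumes standard query-string encoding is what the connection expects, which the page does not confirm. The `build_http_uri` helper and the `api.example.com` endpoint are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def build_http_uri(
    url: str,
    headers: Optional[dict] = None,
    auth_type: Optional[str] = None,
) -> str:
    """Assemble the connection URI from the three parameters listed above."""
    params = {"url": url}
    if headers:
        # Headers are passed as JSON, serialized compactly before encoding.
        params["headers"] = json.dumps(headers, separators=(",", ":"))
    if auth_type:
        params["auth_type"] = auth_type
    # urlencode percent-encodes reserved characters such as ':', '/', '{' and '"'.
    return "http://?" + urlencode(params)


uri = build_http_uri(
    "https://api.example.com/v1/items",  # hypothetical endpoint
    headers={"Accept": "application/json"},
    auth_type="bearer",
)
print(uri)
```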

Step 2

Create your pipeline

Define a YAML asset that tells Bruin what to pull from HTTP and where to land it. This file lives in your Git repo, so it is reviewable, version-controlled, and deployable with CI/CD.

name: raw.http_data
type: ingestr

parameters:
  source_connection: http
  source_table: 'data'
  destination: bigquery

Step 3

Add quality checks

Add column-level and custom SQL checks to your HTTP data. If a check fails, the pipeline stops, so bad data never reaches downstream models or dashboards.

Validate API data freshness on every sync

Ensure record IDs are unique across fetches

Catch missing fields from API response changes

columns:
  - name: id
    checks:
      - name: not_null
      - name: unique
  - name: fetched_at
    checks:
      - name: not_null

custom_checks:
  - name: API data is fresh
    query: |
      SELECT MAX(fetched_at) >
        TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
      FROM raw.http_data
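
For intuition, the same four checks can be expressed in a few lines of Python over in-memory rows. This is only a sketch of the logic, not how Bruin evaluates checks (those run against the warehouse). The `run_checks` helper and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def run_checks(rows, now):
    """Return a pass/fail flag per check, mirroring the YAML checks above."""
    ids = [r.get("id") for r in rows]
    fetched = [r.get("fetched_at") for r in rows]
    present_ids = [v for v in ids if v is not None]
    # Equivalent of MAX(fetched_at), ignoring missing values.
    latest = max((v for v in fetched if v is not None), default=None)
    return {
        "id not_null": len(present_ids) == len(ids),
        "id unique": len(present_ids) == len(set(present_ids)),
        "fetched_at not_null": all(v is not None for v in fetched),
        "API data is fresh": latest is not None and latest > now - timedelta(hours=24),
    }


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
good = [
    {"id": 1, "fetched_at": now - timedelta(hours=1)},
    {"id": 2, "fetched_at": now - timedelta(hours=2)},
]
bad = [
    {"id": 1, "fetched_at": now - timedelta(days=3)},  # stale
    {"id": 1, "fetched_at": None},                     # duplicate id, missing timestamp
]

good_results = run_checks(good, now)
bad_results = run_checks(bad, now)
failed = [name for name, ok in bad_results.items() if not ok]
print(all(good_results.values()), failed)
```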

Step 4

Run it

One command. Bruin connects to HTTP, pulls data incrementally, runs your quality checks, and lands clean data in your warehouse. If a check fails, the pipeline stops, so bad data never reaches downstream models.

Backfill historical data with --start-date

Schedule with cron or trigger from CI/CD

Full lineage from HTTP to your dashboards

$ bruin run .

Running pipeline...

  http_data
    ✓ Fetched 2,847 new records
    ✓ Quality: id not_null              PASSED
    ✓ Quality: id unique                PASSED
    ✓ Quality: fetched_at not_null      PASSED
    ✓ Quality: API data is fresh        PASSED
    ✓ Loaded into bigquery

  Completed in 12s
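
When triggering a run from a scheduler or CI job, the command can be assembled programmatically. The Python sketch below builds the `bruin run` invocation with an optional `--start-date` for backfills. The `bruin_run_command` helper is hypothetical, and the `YYYY-MM-DD` date format is an assumption to verify against the CLI documentation.

```python
import shlex
from typing import Optional


def bruin_run_command(path: str = ".", start_date: Optional[str] = None) -> list[str]:
    """Build the argument list for `bruin run`, optionally backfilling from a start date."""
    cmd = ["bruin", "run"]
    if start_date is not None:
        # Flags go before the pipeline path.
        cmd += ["--start-date", start_date]
    cmd.append(path)
    return cmd


cmd = bruin_run_command(start_date="2024-01-01")  # assumed date format
print(shlex.join(cmd))
# To execute it from a scheduler or CI job: subprocess.run(cmd, check=True)
```

Building the command as a list and passing it to `subprocess.run` with `check=True` makes a failed quality check fail the calling job as well, since a non-zero exit code raises an exception.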

Other API integrations

Allium

Anthropic

Frankfurter

ISOC Pulse

Socrata

Ready to connect HTTP?

Start for free, or book a demo to see how Bruin handles ingestion, quality, lineage, and scheduling for your entire data stack.
