HTTP + Bruin
Ingest HTTP data into your warehouse with incremental loading, quality checks, and full lineage. Defined in YAML, version-controlled in Git.
For business teams
What you get
API data, on schedule
HTTP data lands in your warehouse automatically. No scripts to maintain, no pagination to handle.
Only fetch what changed
Incremental sync means no re-processing. Bruin tracks watermarks so you only get new and updated records.
Catch API changes early
Quality checks validate response data on every sync. Schema changes or missing fields get caught before they break models.
Transform in the same pipeline
Reshape HTTP API data with SQL or Python. Compute metrics, normalize schemas, and build models — all version-controlled.
For data & engineering teams
How it works
Managed pagination & retries
Bruin handles HTTP API pagination, rate limiting, and retries. You define the source — Bruin does the rest.
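For a sense of what this takes off your plate, here is a minimal Python sketch of cursor pagination with retries and exponential backoff. It is an illustration under stated assumptions, not Bruin's implementation: `fetch_all`, `with_retries`, `TransientError`, the `next_cursor` field, and the in-memory fake API are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as HTTP 429 or 503."""


def with_retries(call: Callable[[], dict], max_attempts: int = 3,
                 base_delay: float = 0.0) -> dict:
    """Run `call`, retrying on TransientError with exponential backoff."""
    attempt = 1
    while True:
        try:
            return call()
        except TransientError:
            if attempt >= max_attempts:
                raise  # give up after the last allowed attempt
            time.sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1


def fetch_all(fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Follow `next_cursor` links until the API reports no further page."""
    records, cursor = [], None
    while True:
        page = with_retries(lambda: fetch_page(cursor))
        records.extend(page["items"])
        cursor = page.get("next_cursor")
        if cursor is None:
            return records


# In-memory fake API (assumption): three pages keyed by cursor.
PAGES = {
    None: {"items": [1, 2], "next_cursor": "a"},
    "a": {"items": [3, 4], "next_cursor": "b"},
    "b": {"items": [5], "next_cursor": None},
}
calls = {"count": 0}


def fake_fetch_page(cursor: Optional[str]) -> dict:
    calls["count"] += 1
    if calls["count"] == 2:  # simulate one rate-limit failure
        raise TransientError("rate limited")
    return PAGES[cursor]


all_records = fetch_all(fake_fetch_page)
print(all_records, "in", calls["count"], "calls")
```

The second request fails once and is retried, so the five records arrive in four calls. A real client would also need to respect `Retry-After` headers and per-API page formats, which is the kind of detail the managed source is meant to absorb.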
YAML-defined, Git-versioned
Your HTTP pipeline is a YAML file. Review in PRs, deploy with CI/CD, roll back with git revert.
Incremental with watermarks
Bruin tracks cursor positions and watermarks. Only new and updated HTTP records get fetched on each run.
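The watermark idea can be sketched in a few lines of Python. This is a conceptual example only, assuming each record carries an `updated_at` timestamp and that sync state lives in a plain dictionary; `incremental_sync`, `state`, and the sample rows are hypothetical and do not reflect how Bruin stores its state.

```python
from datetime import datetime, timezone


def incremental_sync(source_rows: list, state: dict) -> list:
    """Return rows updated after the stored watermark, then advance it."""
    watermark = state.get("watermark")
    new_rows = [
        row for row in source_rows
        if watermark is None or row["updated_at"] > watermark
    ]
    if new_rows:
        # The newest timestamp seen becomes the next run's starting point.
        state["watermark"] = max(row["updated_at"] for row in new_rows)
    return new_rows


def ts(day: int) -> datetime:
    """Helper for readable sample timestamps in January 2024 (UTC)."""
    return datetime(2024, 1, day, tzinfo=timezone.utc)


source = [{"id": 1, "updated_at": ts(1)}, {"id": 2, "updated_at": ts(2)}]
state = {}

first = incremental_sync(source, state)       # first run: everything is new

source.append({"id": 3, "updated_at": ts(3)})  # a new record appears
source[0] = {"id": 1, "updated_at": ts(4)}     # record 1 is updated

second = incremental_sync(source, state)      # only records 1 and 3
third = incremental_sync(source, state)       # nothing changed: empty list
```

The second run picks up the updated record 1 and the new record 3 while skipping the unchanged record 2, and the third run fetches nothing.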
Schema validation on responses
Quality checks validate HTTP API response structure on every sync. Catch breaking API changes early.
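As a rough illustration of what validating response structure means, the Python sketch below checks each record against an expected set of fields and types. The schema, the `validate_record` helper, and the sample records are assumptions made for this example; in Bruin the checks themselves are declared in the asset YAML rather than written by hand like this.

```python
# Expected shape of one API record (assumption for this example).
EXPECTED_SCHEMA = {"id": int, "fetched_at": str}


def validate_record(record: dict, schema: dict = EXPECTED_SCHEMA) -> list:
    """Return a list of human-readable problems; empty means the record is valid."""
    problems = []
    for field, expected_type in schema.items():
        value = record.get(field)
        if value is None:
            # Covers both an absent key and an explicit null.
            problems.append(f"missing field: {field}")
        elif not isinstance(value, expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


good = {"id": 1, "fetched_at": "2024-01-01T00:00:00Z"}
renamed = {"identifier": 1, "fetched_at": "2024-01-01T00:00:00Z"}  # API renamed `id`
retyped = {"id": "1", "fetched_at": None}                          # `id` became a string

good_problems = validate_record(good)
renamed_problems = validate_record(renamed)
retyped_problems = validate_record(retyped)
```

A renamed or retyped field shows up as an explicit problem at ingestion time instead of surfacing later as a broken downstream model.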
Before you start
Step 1
Add your HTTP connection
Connect to any HTTP/REST API endpoint. Add this to your Bruin environment file — credentials are stored securely and referenced by name in your pipeline YAML.
Parameters
url - The HTTP(S) endpoint URL
headers - Custom headers as JSON (optional)
auth_type - Authentication type (bearer, basic, etc.)

connections:
  http:
    type: http
    uri: "http://?url=<api_url>&headers=<headers>&auth_type=<auth_type>"

Step 2
Create your pipeline
Define a YAML asset that tells Bruin what to pull from HTTP and where to land it. This file lives in your Git repo — reviewable, version-controlled, and deployable with CI/CD.
name: raw.http_data
type: ingestr
parameters:
  source_connection: http
  source_table: 'data'
  destination: bigquery

Step 3
Add quality checks
Add column-level and custom SQL checks to your HTTP data. If a check fails, the pipeline stops — bad data never reaches downstream models or dashboards.
columns:
  - name: id
    checks:
      - name: not_null
      - name: unique
  - name: fetched_at
    checks:
      - name: not_null
custom_checks:
  - name: API data is fresh
    query: |
      SELECT MAX(fetched_at) >
        TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
      FROM raw.http_data

Step 4
Run it
One command. Bruin connects to your HTTP endpoint, pulls data incrementally, runs your quality checks, and lands clean data in your warehouse. If a check fails, the run stops before anything reaches downstream models or dashboards.
$ bruin run .
Running pipeline...
  http_data
    ✓ Fetched 2,847 new records
    ✓ Quality: id not_null PASSED
    ✓ Quality: id unique PASSED
    ✓ Quality: fetched_at not_null PASSED
    ✓ Quality: API data is fresh PASSED
    ✓ Loaded into bigquery
Completed in 12s
Ready to connect HTTP?
Start for free, or book a demo to see how Bruin handles ingestion, quality, lineage, and scheduling for your entire data stack.