
Apache Kafka + Bruin

Source

Ingest Apache Kafka data into your warehouse with incremental loading, quality checks, and full lineage. Defined in YAML, version-controlled in Git.

For business teams

What you get

  • Events in your warehouse

    Apache Kafka messages land in your warehouse with automatic schema detection. No manual parsing, no format guessing.

  • Schema drift protection

    Quality checks catch unexpected format changes, null values, and schema drift from Apache Kafka before it breaks models.

  • Data lake orchestration

    Use Apache Kafka as a staging layer. Bruin handles landing, transforming, and materializing — all in one pipeline.

  • Multi-cloud flexibility

    Move data between Apache Kafka and other storage or warehouses. Bruin manages scheduling, retries, and lineage.

For data & engineering teams

How it works

  • Automatic schema detection

    Bruin detects Apache Kafka data schemas automatically. No manual configuration when formats change.

  • YAML-defined, Git-versioned

    Your Apache Kafka pipeline is a YAML file. Review in PRs, deploy with CI/CD, roll back with git revert.

  • Format validation

    Quality checks catch schema drift, unexpected nulls, and format changes from Apache Kafka at the ingestion layer.

  • Land, transform, materialize

    Use Apache Kafka as staging. Bruin handles the full flow: land raw data, transform, and materialize into your warehouse.

Before you start

Kafka cluster access
Consumer group permissions

Step 1

Add your Apache Kafka connection

Connect using Kafka broker configuration with authentication. Add this to your Bruin environment file — credentials are stored securely and referenced by name in your pipeline YAML.

Parameters

  • bootstrap_servers: Kafka server or servers to connect to (host:port format)
  • group_id: Consumer group ID for identifying the client
  • security_protocol: Protocol for broker communication (e.g., SASL_SSL)
  • sasl_mechanisms: SASL mechanism for authentication (e.g., PLAIN)
  • sasl_username: Username for SASL authentication
  • sasl_password: Password for SASL authentication
  • batch_size: Number of messages to fetch per batch (default: 3000)
  • batch_timeout: Maximum wait time for messages in seconds (default: 3)
connections:
  kafka:
    type: kafka
    uri: "kafka://?bootstrap_servers=localhost:9092&group_id=test_group&security_protocol=SASL_SSL&sasl_mechanisms=PLAIN&sasl_username=example_username&sasl_password=example_secret&batch_size=1000&batch_timeout=3"
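For local development against an unauthenticated broker, a slimmer connection works too — only bootstrap_servers and group_id are needed. This trimmed-down URI is a sketch derived from the parameter list above, not an official template:

```yaml
connections:
  kafka:
    type: kafka
    # Local broker without SASL; batch_size and batch_timeout fall back to
    # their defaults (3000 messages, 3 seconds) when omitted from the URI.
    uri: "kafka://?bootstrap_servers=localhost:9092&group_id=dev_group"
```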

Step 2

Create your pipeline

Define a YAML asset that tells Bruin what to pull from Apache Kafka and where to land it. This file lives in your Git repo — reviewable, version-controlled, and deployable with CI/CD.

name: raw.kafka_data
type: ingestr

parameters:
  source_connection: kafka
  source_table: 'data'   # the Kafka topic to consume from
  destination: bigquery
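The asset above performs a straight load. For the incremental loading mentioned earlier, ingestr-type assets accept additional parameters; the names below (incremental_strategy, incremental_key) follow ingestr's conventions and are shown as a sketch — check your Bruin version's documentation for the exact supported set:

```yaml
name: raw.kafka_data
type: ingestr

parameters:
  source_connection: kafka
  source_table: 'data'
  destination: bigquery
  # Append-only incremental load keyed on the message timestamp
  # (parameter names assumed from ingestr's conventions).
  incremental_strategy: append
  incremental_key: event_timestamp
```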

Step 3

Add quality checks

Add column-level and custom SQL checks to your Apache Kafka data. If a check fails, the pipeline stops — bad data never reaches downstream models or dashboards.

Catch events with future timestamps
Validate file paths and timestamps are present
Flag schema drift at the ingestion layer
columns:
  - name: file_path
    checks:
      - name: not_null
  - name: event_timestamp
    checks:
      - name: not_null

custom_checks:
  - name: no events from the future
    query: |
      SELECT COUNT(*) = 0
      FROM raw.kafka_data
      WHERE event_timestamp > CURRENT_TIMESTAMP()
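Custom checks are plain SQL queries that must evaluate to true, and you can stack as many as you need. For example, a freshness check in the same style — the 24-hour window is an arbitrary illustration, and TIMESTAMP_SUB assumes a BigQuery destination as configured in Step 2:

```yaml
custom_checks:
  - name: data is fresh
    query: |
      SELECT COUNT(*) > 0
      FROM raw.kafka_data
      WHERE event_timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
```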

Step 4

Run it

One command. Bruin connects to Apache Kafka, pulls data incrementally, runs your quality checks, and lands clean data in your warehouse. If a check fails, the pipeline stops — bad data never reaches downstream.

Backfill historical data with --start-date
Schedule with cron or trigger from CI/CD
Full lineage from Apache Kafka to your dashboards
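A backfill with the --start-date flag mentioned above might look like this (the date is a placeholder):

```shell
# Replay historical Kafka data from a fixed date, then run as usual
bruin run --start-date 2024-01-01 .
```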
$ bruin run .
Running pipeline...

  kafka_data
    ✓ Fetched 2,847 new records
    ✓ Quality: file_path not_null            PASSED
    ✓ Quality: event_timestamp not_null      PASSED
    ✓ Quality: no events from the future     PASSED
    ✓ Loaded into bigquery

  Completed in 12s

Ready to connect Apache Kafka?

Start for free, or book a demo to see how Bruin handles ingestion, quality, lineage, and scheduling for your entire data stack.