# Data Ingestion
Bruin has built-in data ingestion capabilities thanks to ingestr. The basic idea is simple:
- you have data sources
- each source may have one or more tables/streams
  - e.g. for Shopify, you have customers, orders, and products, each being a separate table
- you want to load these into a destination data platform
Ingestr abstracts all of this away with the concepts of sources, destinations, and tables.
Using Bruin, you can load data from any source into your data platforms as a regular asset.
## Definition Schema
Ingestr assets are defined in a simple YAML file:
```yaml
name: raw.customers
type: ingestr
parameters:
  source_connection: <source-connection-name>
  source_table: customers
  destination: bigquery
```

The interesting part is the `parameters` list:
- `source_connection`: the connection that defines the source platform
- `source_table`: the table name for that source on ingestr
- `destination`: the destination platform you'd like to load the data into
Effectively, this asset will run ingestr in the background and load the data to your data warehouse.
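To make the mechanics concrete, the asset above corresponds roughly to an `ingestr ingest` invocation like the following. This is an illustrative sketch only: the connection URIs shown here are made up, and Bruin builds the real command from your connection definitions, so you never have to run it yourself.

```shell
# Hypothetical equivalent of the asset definition above.
# Bruin derives the actual URIs from the named connections.
ingestr ingest \
  --source-uri 'postgresql://user:pass@localhost:5432/mydb' \
  --source-table 'raw.customers' \
  --dest-uri 'bigquery://my-project' \
  --dest-table 'raw.customers'
```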
## Examples
There are various combinations of sources and destinations, but below are a few examples for common scenarios.
### Load data from Postgres -> BigQuery
```yaml
name: raw.customers
type: ingestr
parameters:
  source_connection: my-postgres
  source_table: raw.customers
  destination: bigquery
```

### Shopify Orders -> Snowflake
```yaml
name: raw.orders
type: ingestr
parameters:
  source_connection: my-shopify
  source_table: orders
  destination: snowflake
```

### Kafka -> BigQuery
```yaml
name: raw.topic1
type: ingestr
parameters:
  source_connection: my-kafka
  source_table: topic1
  destination: bigquery
```
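The connection names used in these examples (`my-postgres`, `my-shopify`, `my-kafka`) refer to connections defined in your project's `.bruin.yml`. As a sketch, a Postgres source connection might look like the following; the exact fields vary per connection type, and all values here are placeholders:

```yaml
# Sketch of a .bruin.yml excerpt; field names depend on the connection type.
environments:
  default:
    connections:
      postgres:
        - name: my-postgres
          username: user
          password: pass
          host: localhost
          port: 5432
          database: mydb
```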