Cassandra
Apache Cassandra is a distributed wide-column database designed for high availability and large-scale workloads.
Bruin supports Cassandra as a source for Ingestr assets, and you can use it to ingest data from Cassandra into your data warehouse.
In order to set up Cassandra as a source, you need to add a connection in the .bruin.yml file and reference it in an asset file.
Follow the steps below to correctly set up Cassandra as a data source and run ingestion.
Configuration
Step 1: Add a connection to .bruin.yml file
To connect to Cassandra, add a configuration item to the connections section of the .bruin.yml file:
connections:
cassandra:
- name: "my-cassandra"
username: "cassandra"
password: "cassandra"
host: "localhost"
port: 9042
keyspace: "analytics"
consistency: "local_quorum"
page_size: 1000
timeout: "10s"
connect_timeout: "5s"
ssl: false
disable_initial_host_lookup: truename: The name to identify this Cassandra connection.username: (Optional) The Cassandra username.password: (Optional) The password for the specified username.host: The Cassandra seed host.hosts: (Optional) Additional seed hosts. Bruin passes these to ingestr as the Cassandrahostsquery parameter.port: (Optional) The Cassandra native transport port. Defaults to9042.keyspace: (Optional) The default Cassandra keyspace.consistency: (Optional) Cassandra consistency level, such asone,quorum,local_quorum, orall.page_size: (Optional) Source read page size.timeout: (Optional) Request timeout as a Go duration string, such as10s.connect_timeout: (Optional) Connection timeout as a Go duration string, such as5s.ssl: (Optional) Set totrueto enable TLS.disable_initial_host_lookup: (Optional) Set totruefor single-node Docker or NAT setups.
For multiple Cassandra seed hosts:
connections:
cassandra:
- name: "cassandra-cluster"
username: "cassandra"
password: "cassandra"
host: "cass-1.example.com"
hosts:
- "cass-1.example.com"
- "cass-2.example.com"
- "cass-3.example.com"
keyspace: "analytics"Step 2: Create an asset file for data ingestion
To ingest data from Cassandra, create an asset configuration file. This file defines the data flow from the source to the destination. Create a YAML file (e.g., cassandra_ingestion.yml) inside the assets folder and add the following content:
name: public.cassandra_events
type: ingestr
connection: postgres
parameters:
source_connection: my-cassandra
source_table: "events"
destination: postgresIf the connection does not include a keyspace, use a fully qualified table name:
name: public.cassandra_events
type: ingestr
connection: postgres
parameters:
source_connection: my-cassandra
source_table: "analytics.events"
destination: postgresCustom CQL queries are also supported:
name: public.cassandra_events
type: ingestr
connection: postgres
parameters:
source_connection: my-cassandra
source_table: "query:SELECT id, event_type, created_at FROM analytics.events WHERE created_at >= :interval_start ALLOW FILTERING"
destination: postgresname: The name of the asset.type: Specifies the type of the asset. Set this toingestrfor Cassandra.connection: This is the destination connection, which defines where the data should be stored. For example,postgresindicates that the ingested data will be stored in a Postgres database.source_connection: The name of the Cassandra connection defined in.bruin.yml.source_table: The Cassandra source table. Use a plain table name when the connection includes a keyspace, a fully qualifiedkeyspace.tablename, or aquery:CQL query.destination: The destination platform for the ingestr asset.
Step 3: Run asset to ingest data
bruin run assets/cassandra_ingestion.ymlAs a result of this command, Bruin will ingest data from the given Cassandra table into your destination.