Vitess
Vitess is a MySQL-compatible database clustering and sharding system originally built at YouTube. It speaks the MySQL wire protocol through vtgate, and Bruin connects to it through a dedicated vitess connection. Vitess can be used as both a source and a destination for Ingestr assets, and Vitess change data capture is also supported through VStream.
TIP
If you use the managed PlanetScale platform, follow the PlanetScale guide instead — it uses the hosted psdbconnect API rather than a directly reachable vtgate gRPC port.
Follow the steps below to set up Vitess and run ingestion.
Configuration
Step 1: Add a connection to .bruin.yml file
Add a vitess connection to the connections section of your .bruin.yml file, pointing at your vtgate endpoint:
vitess:
- name: "vitess"
username: "user"
password: "password"
host: "vtgate.internal"
port: 15306
database: "commerce"
# Optional, required only for change data capture:
grpc_port: 15991name: The name to identify this connectionusername: The username to connect through vtgatepassword: The password for the userhost: The vtgate hostport: The vtgate MySQL protocol port (e.g.15306)database: The Vitess keyspace to connect togrpc_port: (Optional) vtgate's gRPC port, used by CDC (e.g.15991)grpc_host: (Optional) Host override for the gRPC connection when it differs from the MySQL hostgrpc_tls: (Optional) Set totrueto use TLS for the gRPC connectionssl_ca_path: (Optional) The path to the CA certificate filessl_cert_path: (Optional) The path to the client certificate filessl_key_path: (Optional) The path to the client key file
Step 2: Create an asset file for data ingestion
To ingest data from Vitess, create an asset configuration file (e.g. vitess_ingestion.yml) inside the assets folder:
name: public.orders
type: ingestr
connection: bigquery
parameters:
source_connection: vitess
source_table: 'orders'
destination: bigqueryname: The name of the asset.type: Set this toingestrto use the ingestr data pipeline.connection: The destination connection where the data should be stored.source_connection: The name of the Vitess connection defined in.bruin.yml.source_table: The name of the table in the keyspace that you want to ingest.
Step 3: Run asset to ingest data
bruin run assets/vitess_ingestion.ymlAs a result of this command, Bruin will ingest data from the given Vitess table into your destination.
Vitess as a destination
You can also use Vitess as a destination by pointing an ingestr asset's connection and destination at a Vitess connection:
name: orders
type: ingestr
connection: vitess
parameters:
source_connection: my_postgres
source_table: 'public.orders'
destination: vitessWhen loading into Vitess, keep two things in mind:
- The target keyspace must already exist. ingestr cannot create keyspaces through vtgate.
- Only unsharded (single-shard) keyspaces are supported as destinations, due to vindex and atomic-operation constraints.
Change data capture (CDC)
Vitess CDC streams inserts, updates, and deletes through vtgate's VStream gRPC API. It is enabled by setting cdc: "true" on an ingestr asset with a Vitess source connection, and requires the vtgate gRPC port — configure it once as grpc_port on the connection (see Step 1).
| Parameter | Required | Description |
|---|---|---|
cdc | Yes | Set to "true" to enable CDC mode |
cdc_grpc_port | No | Overrides the connection's grpc_port for this asset |
cdc_grpc_host | No | Overrides the connection's grpc_host for this asset |
cdc_grpc_tls | No | Overrides the connection's grpc_tls for this asset |
cdc_dest_schema | No | Destination schema to use for multi-table CDC runs |
incremental_strategy | No | Defaults to "merge" when CDC is enabled; can be overridden to "append" |
Requirements:
- Read access to the keyspace over both the MySQL protocol and vtgate's VStream gRPC API.
- Source tables must have primary keys, or
primary_keymust be set on the asset columns. - Source tables must not contain
ENUM,SET, orBITcolumns.
NOTE
When CDC is enabled, primary key columns do not need to be specified in the asset definition — they are determined automatically from the source table.
Example: Vitess VStream CDC
name: orders
type: ingestr
connection: bigquery
parameters:
source_connection: vitess
source_table: 'orders'
destination: bigquery
cdc: "true"The example above relies on grpc_port being set on the vitess connection. For the managed PlanetScale platform, see PlanetScale.