Vitess
Vitess is a database clustering system for horizontal scaling of MySQL. ingestr supports Vitess as both a source and destination, plus change data capture through vtgate's VStream API.
Vitess speaks the MySQL wire protocol, but it is selected with its own vitess:// scheme (not mysql://) so ingestr uses the Vitess-aware read, write, and CDC paths. Pointing a mysql:// URI at a Vitess server fails fast with a message telling you to use vitess://.
NOTE
PlanetScale is managed Vitess, but it has its own hosted CDC path and is documented separately under the planetscale:// scheme.
URI format
vitess://user:password@host:port/keyspace
URI parameters:
- user: the vtgate user name
- password: the password for the user
- host: the vtgate MySQL-protocol host
- port: the vtgate MySQL-protocol port, usually 3306
- keyspace: the Vitess keyspace (used as the database)
If your vtgate requires encrypted MySQL-protocol connections, add ?tls=true.
By default Vitess caps queries at 100,000 rows in its OLTP workload, which would otherwise break bulk reads of larger tables. The Vitess source runs in the OLAP workload, so large tables ingest fully.
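As a rough illustration of the URI format above, the following sketch assembles a vitess:// source URI with TLS enabled and prints an ingest command for review. The host name, credentials, keyspace, table names, and the DuckDB destination path are placeholders, not values taken from a real deployment.

```shell
# Hypothetical connection details; replace with your own.
VT_USER="app"
VT_PASSWORD="secret"
VT_HOST="vtgate.example.internal"
VT_PORT=3306
VT_KEYSPACE="commerce"

# The keyspace takes the place of the database; ?tls=true requests an
# encrypted MySQL-protocol connection to vtgate.
SOURCE_URI="vitess://${VT_USER}:${VT_PASSWORD}@${VT_HOST}:${VT_PORT}/${VT_KEYSPACE}?tls=true"

# Print the command for review; remove the leading `echo` to run it.
echo ingestr ingest \
  --source-uri "$SOURCE_URI" \
  --dest-uri "duckdb:///tmp/vitess.duckdb" \
  --source-table "${VT_KEYSPACE}.orders" \
  --dest-table "orders"
```

Keeping the URI in a variable makes it easy to confirm that the scheme is vitess:// rather than mysql:// before anything connects.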
Vitess as a destination
Vitess is also supported as a destination over the vitess:// URI. Two things differ from plain MySQL:
- The target keyspace must already exist. CREATE DATABASE is not supported through vtgate, so ingestr never creates keyspaces; create the keyspace via your Vitess control plane first. Staging tables for the replace, merge, delete+insert, and scd2 strategies are created inside the target keyspace rather than in a separate _bruin_staging database.
- Only unsharded (single-shard) keyspaces are supported. If the target keyspace is sharded, ingestr fails fast at connect with a clear error instead of producing a broken load. Sharded keyspaces are unsupported because auto-created tables need a Primary Vindex to be routable, the merge/delete+insert/scd2 strategies use UPDATE … JOIN / INSERT … SELECT … WHERE NOT EXISTS statements that Vitess rejects across shards, and the atomic RENAME swap used by replace is not atomic across shards. To load a sharded keyspace, pre-create the tables (with vindexes) and manage the load outside ingestr.
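Since sharded target keyspaces are rejected, it can help to check the shard count before pointing a load at a keyspace. The sketch below counts shards per keyspace from the output of Vitess's SHOW VITESS_SHARDS statement. The count_shards helper, the sample keyspace names, and the assumed one "keyspace/shard" entry per line are illustrative, and this is not how ingestr itself detects sharding.

```shell
# Count the shards of one keyspace from `SHOW VITESS_SHARDS` output on stdin.
# Each input line is assumed to look like "keyspace/shard", e.g. "commerce/0".
count_shards() {
  keyspace="$1"
  # awk prints 0 when nothing matches, which avoids grep's non-zero exit status.
  awk -F/ -v ks="$keyspace" '$1 == ks { n++ } END { print n + 0 }'
}

# Canned sample output (hypothetical keyspaces), standing in for a live vtgate.
SAMPLE_OUTPUT="commerce/0
customer/-80
customer/80-"

COMMERCE_SHARDS=$(printf '%s\n' "$SAMPLE_OUTPUT" | count_shards commerce)
CUSTOMER_SHARDS=$(printf '%s\n' "$SAMPLE_OUTPUT" | count_shards customer)

echo "commerce: ${COMMERCE_SHARDS} shard(s)"
echo "customer: ${CUSTOMER_SHARDS} shard(s)"

# Against a live vtgate the input might come from the mysql client, e.g.:
#   mysql -h HOST -P 3306 -u USER -p -N -B -e "SHOW VITESS_SHARDS" | count_shards commerce
```

A count of 1 suggests the keyspace is unsharded and usable as a destination; anything higher means the tables would need to be pre-created with vindexes and loaded outside ingestr.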
Change data capture
Vitess CDC uses the vitess+cdc:// scheme. ingestr streams changes through vtgate's VStream API over gRPC, because Vitess is a sharded layer with no standard binary log to tail. It produces the same _cdc_lsn, _cdc_deleted, and _cdc_synced_at metadata columns as the other CDC sources and resumes from the destination table's maximum _cdc_lsn on subsequent runs.
VStream performs a consistent copy-phase snapshot first, then streams changes. Position is tracked with a Vitess GTID (VGTID) serialized into _cdc_lsn. This works for both unsharded and sharded keyspaces, since the VGTID covers every shard. If the stored _cdc_lsn is invalid, the run fails instead of taking a partial snapshot — run with --full-refresh to rebuild. Incremental runs use the merge strategy so updates and deletes are applied by primary key.
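When a run fails because the stored _cdc_lsn is invalid, the rebuild might look like the sketch below. The host, credentials, keyspace, and table names are placeholders, and grpc_port=15991 stands for the vtgate gRPC port of a hypothetical deployment.

```shell
# Hypothetical connection details; replace with your own.
CDC_URI="vitess+cdc://user:password@vtgate.example.internal:3306/commerce?grpc_port=15991"

# Print the command for review; remove the leading `echo` to run it.
# --full-refresh rebuilds the destination instead of resuming from the
# stored _cdc_lsn position.
echo ingestr ingest \
  --source-uri "$CDC_URI" \
  --dest-uri "duckdb:///tmp/vitess_cdc.duckdb" \
  --source-table "commerce.orders" \
  --dest-table "orders" \
  --full-refresh
```

Later runs without --full-refresh would then resume from the rebuilt table's maximum _cdc_lsn.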
VStream uses vtgate's gRPC port, which is different from the MySQL protocol port and cannot be derived from it, so you must supply it with grpc_port. The database in the URI is the Vitess keyspace.
CDC over Vitess opens two connections: the MySQL protocol connection for schema discovery and the vtgate gRPC port for the change stream. A single tls=true secures both connections because the gRPC connection inherits the tls setting. To control the gRPC side independently, use grpc_tls (see below).
ingestr ingest \
--source-uri "vitess+cdc://user:password@host:3306/keyspace?grpc_port=15991&mode=batch" \
--dest-uri "duckdb:///tmp/vitess_cdc.duckdb" \
--source-table "keyspace.orders" \
--dest-table "orders"
Vitess CDC URI parameters:
- grpc_port: required; the vtgate gRPC port (for example 15991). The run fails with a clear error if it is missing.
- grpc_host: optional vtgate gRPC host; defaults to the host in the URI.
- grpc_tls: optional override for the gRPC connection's TLS, independent of tls. true verifies the server certificate, skip-verify skips verification, false forces plaintext. When omitted, the gRPC connection inherits tls (true/skip-verify enable it; preferred and custom CA names do not).
- mode: batch; defaults to batch.
- dest_schema: optional destination schema for multi-table CDC runs.
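The interaction between tls and grpc_tls can be easier to follow as a small decision function. The sketch below restates the documented rules in shell; resolve_grpc_tls and its verify/skip-verify/plaintext labels are hypothetical names for illustration, not part of ingestr, and values that the documentation does not list for grpc_tls are simply treated as plaintext here.

```shell
# Resolve the effective gRPC TLS mode from the tls and grpc_tls URI parameters.
# Usage: resolve_grpc_tls "<tls value or empty>" "<grpc_tls value or empty>"
# Prints one of: verify, skip-verify, plaintext.
resolve_grpc_tls() {
  tls="$1"
  grpc_tls="$2"
  # An explicit grpc_tls wins; when it is omitted, fall back to tls.
  effective="${grpc_tls:-$tls}"
  case "$effective" in
    true)        echo "verify" ;;       # TLS with certificate verification
    skip-verify) echo "skip-verify" ;;  # TLS without verification
    *)           echo "plaintext" ;;    # false, preferred, CA names, or unset
  esac
}

resolve_grpc_tls "true" ""            # inherits tls=true
resolve_grpc_tls "true" "false"       # grpc_tls=false forces plaintext
resolve_grpc_tls "preferred" ""       # preferred does not enable gRPC TLS
resolve_grpc_tls "" "skip-verify"     # grpc_tls alone enables unverified TLS
```

Under these assumptions the four calls print verify, plaintext, plaintext, and skip-verify, in that order.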
Requirements:
- The vtgate gRPC endpoint must be reachable (grpc_port, plus grpc_host if it differs from the MySQL host).
- Source tables must have primary keys, or --primary-key must be provided.
- Source tables must not contain ENUM, SET, or BIT columns.
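The column-type requirement above can be checked ahead of a CDC run. The sketch below flags ENUM, SET, and BIT columns from a tab-separated list of column names and types. The find_unsupported_columns helper and the sample columns are hypothetical, and the commented live query assumes that vtgate answers information_schema queries for the keyspace.

```shell
# Print the names of columns whose MySQL type is ENUM, SET, or BIT.
# Input: one "column_name<TAB>column_type" pair per line on stdin.
find_unsupported_columns() {
  awk -F'\t' 'tolower($2) ~ /^(enum|set|bit)(\(|$)/ { print $1 }'
}

# Canned sample (hypothetical table), standing in for a live query result.
UNSUPPORTED=$(
  printf "id\tbigint\nstatus\tenum('new','paid')\nflags\tbit(8)\nname\tvarchar(255)\n" |
    find_unsupported_columns |
    paste -sd, -
)

echo "unsupported columns: ${UNSUPPORTED:-none}"

# Against a live vtgate the input might come from the mysql client, e.g.:
#   mysql -h HOST -P 3306 -u USER -p -N -B -e \
#     "SELECT column_name, column_type FROM information_schema.columns
#      WHERE table_schema = 'commerce' AND table_name = 'orders'" |
#     find_unsupported_columns
```

Any column the helper reports would need to be converted to a supported type before the table can be used as a CDC source.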
Related docs
- MySQL for the generic MySQL/MariaDB connector and binary-log CDC.
- PlanetScale for managed Vitess with hosted psdbconnect CDC.