Data integration tool

Best way to move Shopify data to GCP Dataproc Serverless

Use ingestr when you need an open-source CLI for Shopify to GCP Dataproc Serverless ingestion, then add Bruin Cloud when the same pipeline needs schedules, checks, lineage, alerts, and audit trails.

Open-source CLI · Incremental loads · Production ready

Short answer

Choose the tool you can run locally and govern later.

For Shopify to GCP Dataproc Serverless, ingestr is the practical starting point when you want a scriptable, reviewable ingestion job instead of a hosted-only connector. Use Bruin Cloud when that job becomes a shared production pipeline.

Start with a local CLI command and commit the workflow to your repo.

Use incremental or time-based loading when the source supports it.

Verify row counts and schema expectations before scheduling.

Add Bruin Cloud for orchestration, lineage, checks, alerts, and audit logs.

What you'll learn

How to install and set up ingestr in seconds
How to connect to Shopify and GCP Dataproc Serverless with proper authentication
How to copy entire tables or specific data with a single command
How to set up incremental loading for continuous data synchronization

Prerequisites

  • Python 3.8 or higher installed
  • Shopify store with data
  • Private app created with API access
  • Appropriate API permissions granted
  • API rate limits understood
  • GCP project
  • Service account with Dataproc permissions

Step 1: Install ingestr

Install ingestr in seconds using uv or pip. Choose the method that works best for you:

Recommended: Using uv (fastest)

# Install uv first if you haven't already
pip install uv

# Run ingestr using uvx
uvx ingestr

Alternative: Global installation

# Install globally using uv
uv pip install --system ingestr

# Or using standard pip
pip install ingestr

Verify installation: Run ingestr --version to confirm it's installed correctly.
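If you script the setup, a small guard can stop early when the CLI is missing. This is a minimal sketch: the function name check_ingestr is made up for illustration, and it relies only on the ingestr --version command mentioned above.

```shell
# Hypothetical helper: prints the installed version, or fails with a message
# when the ingestr CLI cannot be found on PATH.
check_ingestr() {
  if command -v ingestr >/dev/null 2>&1; then
    ingestr --version
  else
    echo "ingestr not found on PATH; install it with uv or pip first" >&2
    return 1
  fi
}
```

Call check_ingestr at the top of a setup or CI script so later steps do not run against a missing tool.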

Step 2: Your First Migration

Let's copy a table from Shopify to GCP Dataproc Serverless. This example shows a complete, working command you can adapt to your needs.

Set up your connections

Shopify connection format:

shopify://api_key:password@store.myshopify.com

Parameters:

  • api_key: Private app API key
  • password: Private app password
  • store: Your store's subdomain

GCP Dataproc Serverless connection format:

dataproc-serverless://project_id/region?credentials=/path/to/key.json

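Parameters:

  • project_id: Your GCP project ID
  • region: Region where Dataproc Serverless runs
  • credentials: Path to the service account key file (JSON)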
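To keep credentials out of your shell history and your repo, you can assemble both URIs from environment variables. The sketch below assumes the connection formats listed in this step and a store host of the form store.myshopify.com; the variable and function names are hypothetical, and values containing characters such as @ or / would need URL-encoding first.

```shell
# Hypothetical helpers: build connection URIs from environment variables.
# Variable names are illustrative, not something ingestr reads itself.
build_shopify_uri() {
  if [ -z "${SHOPIFY_API_KEY:-}" ] || [ -z "${SHOPIFY_PASSWORD:-}" ] || [ -z "${SHOPIFY_STORE:-}" ]; then
    echo "SHOPIFY_API_KEY, SHOPIFY_PASSWORD and SHOPIFY_STORE must be set" >&2
    return 1
  fi
  printf 'shopify://%s:%s@%s.myshopify.com\n' \
    "$SHOPIFY_API_KEY" "$SHOPIFY_PASSWORD" "$SHOPIFY_STORE"
}

build_dataproc_uri() {
  if [ -z "${GCP_PROJECT_ID:-}" ] || [ -z "${GCP_REGION:-}" ] || [ -z "${GCP_CREDENTIALS_PATH:-}" ]; then
    echo "GCP_PROJECT_ID, GCP_REGION and GCP_CREDENTIALS_PATH must be set" >&2
    return 1
  fi
  printf 'dataproc-serverless://%s/%s?credentials=%s\n' \
    "$GCP_PROJECT_ID" "$GCP_REGION" "$GCP_CREDENTIALS_PATH"
}
```

You could then set SOURCE_URI="$(build_shopify_uri)" and DEST_URI="$(build_dataproc_uri)" before calling ingestr.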

Run your first copy

Copy the entire orders table from Shopify to GCP Dataproc Serverless:

ingestr ingest \
    --source-uri 'shopify://key123:<password>@<store>.myshopify.com' \
    --source-table 'orders' \
    --dest-uri 'dataproc-serverless://project_id/region?credentials=/path/to/key.json' \
    --dest-table 'raw.orders'

What this does:

  • Connects to your Shopify store
  • Reads all data from the specified table
  • Creates the table in GCP Dataproc Serverless if needed
  • Copies all rows to the destination

Command breakdown:

  • --source-uri: Connection URI for your Shopify store
  • --source-table: Table to copy from
  • --dest-uri: Connection URI for your destination
  • --dest-table: Where to write the data
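If you plan to commit the workflow to your repo, the four flags above can be wrapped in a small function so each table copy is a one-line call. This is a sketch under assumptions: copy_table, SOURCE_URI, and DEST_URI are hypothetical names, and the function passes only the flags shown in the command breakdown.

```shell
# Hypothetical wrapper around the four flags from the command breakdown.
# Expects SOURCE_URI and DEST_URI to be set in the environment.
copy_table() {
  # $1: source table, $2: destination table
  if [ "$#" -ne 2 ]; then
    echo "usage: copy_table <source-table> <dest-table>" >&2
    return 2
  fi
  if [ -z "${SOURCE_URI:-}" ] || [ -z "${DEST_URI:-}" ]; then
    echo "SOURCE_URI and DEST_URI must be set" >&2
    return 1
  fi
  ingestr ingest \
    --source-uri "$SOURCE_URI" \
    --source-table "$1" \
    --dest-uri "$DEST_URI" \
    --dest-table "$2"
}
```

For example, copy_table orders raw.orders runs the same copy as the command in this step.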

Step 3: Verify your data

After the migration completes, verify your data was copied correctly:

Check row count in GCP Dataproc Serverless:

-- Run this in GCP Dataproc Serverless
SELECT COUNT(*) as row_count 
FROM raw.orders;

-- Check a sample of the data
SELECT * 
FROM raw.orders 
LIMIT 10;
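Before you schedule the job, it can help to turn the row count into a pass or fail check. The sketch below assumes you have already obtained the COUNT(*) result by whatever query interface you use for the destination; check_row_count is a hypothetical helper that only compares numbers.

```shell
# Hypothetical helper: fail when the loaded row count is below a threshold.
check_row_count() {
  # $1: row count returned by the COUNT(*) query, $2: minimum acceptable count
  actual="$1"
  minimum="$2"
  case "$actual" in
    ''|*[!0-9]*)
      echo "row count is not a number: '$actual'" >&2
      return 2
      ;;
  esac
  if [ "$actual" -lt "$minimum" ]; then
    echo "expected at least $minimum rows, found $actual" >&2
    return 1
  fi
  echo "ok: $actual rows"
}
```

A non-zero exit code lets a CI job or scheduler stop before downstream steps read an incomplete table.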

Advanced Patterns

Once you've mastered the basics, use these patterns for production workloads.

Incremental loading: copy only new or updated records since the last sync. This suits daily updates.

ingestr ingest \
    --source-uri 'shopify://key123:<password>@<store>.myshopify.com' \
    --source-table 'orders' \
    --dest-uri 'dataproc-serverless://project_id/region?credentials=/path/to/key.json' \
    --dest-table 'raw.orders' \
    --incremental-strategy merge \
    --incremental-key updated_at \
    --primary-key order_id

How it works: The merge strategy updates existing rows and inserts new ones, matching them on the primary key. On each run, only rows whose updated_at value is newer than the last synced value are processed.
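To make the merge behavior concrete, here is a toy model of an upsert by key on two small CSV files. It illustrates the semantics only and is not how ingestr implements the merge; merge_by_key and the file layout (key in the first column) are assumptions for the example.

```shell
# Toy illustration of merge semantics: rows from the incoming file replace
# rows in the existing file that share the same key (column 1); rows with
# new keys are added. Output is sorted so the result is deterministic.
merge_by_key() {
  # $1: existing rows (CSV), $2: incoming rows (CSV)
  awk -F, '{ rows[$1] = $0 } END { for (k in rows) print rows[k] }' "$1" "$2" | sort
}
```

With an existing file containing 1,pending and 2,paid, and an incoming file containing 2,refunded and 3,paid, the result is 1,pending, 2,refunded, 3,paid: row 2 is updated, row 3 is inserted, and row 1 is left alone.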

Common Use Cases

Ready-to-use commands for typical Shopify to GCP Dataproc Serverless scenarios.

Daily Customer Data Sync

Keep your analytics warehouse updated with the latest customer information every night.

# Add this to your cron job or scheduler
ingestr ingest \
    --source-uri 'shopify://key123:<password>@<store>.myshopify.com' \
    --source-table 'customers' \
    --dest-uri 'dataproc-serverless://project_id/region?credentials=/path/to/key.json' \
    --dest-table 'analytics.customers' \
    --incremental-strategy merge \
    --incremental-key updated_at \
    --primary-key customer_id
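When a nightly job runs from cron, a slow run can overlap with the next one. One way to guard against that is a simple lock directory, sketched below. The function name run_with_lock, the lock path, and the crontab line in the comment are all hypothetical.

```shell
# Hypothetical guard: run a command only if no other run holds the lock.
# mkdir is atomic, so two overlapping runs cannot both create the directory.
run_with_lock() {
  # $1: lock directory; remaining arguments: the command to run
  lock_dir="$1"
  shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "another sync is still running, skipping this run" >&2
    return 75
  fi
  "$@" && rc=0 || rc=$?
  rmdir "$lock_dir"
  return "$rc"
}

# Example crontab entry (illustrative paths), running every night at 02:00:
# 0 2 * * * /path/to/sync_customers.sh >> /path/to/sync_customers.log 2>&1
```

Inside the script you would call run_with_lock /tmp/shopify_customers.lock followed by the ingestr command above. If a run is killed before it removes the lock, delete the directory by hand.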

Historical Data Migration

One-time migration of all historical records to your data warehouse.

# One-time full table copy
ingestr ingest \
    --source-uri 'shopify://key123:<password>@<store>.myshopify.com' \
    --source-table 'transactions' \
    --dest-uri 'dataproc-serverless://project_id/region?credentials=/path/to/key.json' \
    --dest-table 'warehouse.transactions_historical'

Development Environment Sync

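Copy a limited sample of production data to your development GCP Dataproc Serverless instance. The command below only caps the number of rows; it does not remove sensitive fields, so mask or drop those separately.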

# Copy sample data to development
ingestr ingest \
    --source-uri 'shopify://key123:<password>@<store>.myshopify.com' \
    --source-table 'products' \
    --dest-uri 'dataproc-serverless://project_id/region?credentials=/path/to/key.json' \
    --dest-table 'dev.products' \
    --limit 1000  # Only copy 1000 rows for testing

Choosing a Shopify to GCP Dataproc Serverless data integration tool

If you're comparing ways to move Shopify data into GCP Dataproc Serverless, start with the path you can run locally, review in code, and schedule later.

What is the best data integration tool to move data from Shopify to GCP Dataproc Serverless?

ingestr is a good fit when you want an open-source CLI for Shopify to GCP Dataproc Serverless ingestion. You can run it from your terminal, CI, or a scheduled job, then move the same pipeline into Bruin Cloud when you need orchestration, lineage, and monitoring.

Can this run as an incremental pipeline?

Yes. Use snapshot-plus-incremental or time-based extraction when the source supports it. That keeps the first load simple while making later runs smaller and easier to monitor.

When should I use Bruin Cloud with ingestr?

Use Bruin Cloud when the Shopify to GCP Dataproc Serverless pipeline needs schedules, alerts, data quality checks, audit trails, or catalog and lineage visibility for the rest of the team.

Troubleshooting Guide

Solutions to common issues when migrating from Shopify to GCP Dataproc Serverless.

Connection, timeout, or authentication errors

Check your connection details and credentials:

  • Verify private app is installed
  • Check API permissions are sufficient
  • Ensure store URL is correct
  • Monitor API rate limits

Schema or data type mismatches

Handling data type differences:

  • ingestr automatically handles most type conversions
  • Shopify: Timestamps in ISO 8601 format
  • Shopify: Money amounts are decimal strings in the Admin API (for example "10.00"), not integer cents
  • Shopify: Variant data nested under products
  • Shopify: Metafields are JSON

Performance issues with large tables

Optimize large data transfers:

  • Use incremental loading to process data in chunks
  • Run migrations during off-peak hours
  • Split very large tables by date ranges using the --interval-start and --interval-end flags
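Splitting by date range can be scripted as a loop that loads one window at a time. The sketch below uses yearly windows because they need no date arithmetic; backfill_by_year, SOURCE_URI, and DEST_URI are hypothetical names. It makes no assumption about whether the end bound is inclusive, so check the window boundaries for duplicate or missing rows.

```shell
# Hypothetical backfill loop: one ingestr run per calendar year.
# Expects SOURCE_URI and DEST_URI to be set in the environment.
backfill_by_year() {
  # $1: first year, $2: last year (inclusive), $3: source table, $4: dest table
  year="$1"
  while [ "$year" -le "$2" ]; do
    ingestr ingest \
      --source-uri "$SOURCE_URI" \
      --source-table "$3" \
      --dest-uri "$DEST_URI" \
      --dest-table "$4" \
      --interval-start "${year}-01-01" \
      --interval-end "$((year + 1))-01-01" || return 1
    year=$((year + 1))
  done
}
```

For example, backfill_by_year 2022 2023 orders raw.orders issues two runs, one per year, and stops at the first failure so you can resume from that window.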

Ready to scale your data pipeline?

You've learned how to migrate data from Shopify to GCP Dataproc Serverless with ingestr. For production workloads with monitoring, scheduling, and data quality checks, explore Bruin Cloud.
