Deploying Bruin on Ubuntu VMs

Managed Option Available

Looking for a fully managed solution? Bruin Cloud provides managed orchestration, monitoring, and scheduling without the operational overhead. Try it free!

This guide walks you through deploying Bruin on Ubuntu-based virtual machines (AWS EC2, Google Cloud Compute Engine, DigitalOcean Droplets, or any Ubuntu server) and scheduling pipeline runs using cron jobs.

Prerequisites

Before you begin, ensure you have:

  • An Ubuntu server (18.04 or later recommended)
  • SSH access to the server with sudo privileges
  • Git installed on the server
  • A Bruin project ready to deploy

Step 1: Connect to Your Server

Connect to your Ubuntu VM via SSH:

bash
ssh username@your-server-ip

Replace username with your actual username and your-server-ip with your server's IP address or hostname.
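
If you connect often, an entry in ~/.ssh/config saves retyping the details. A minimal example (the host alias and key path below are placeholders to adapt):

text
# ~/.ssh/config
Host bruin-server
    HostName your-server-ip
    User username
    IdentityFile ~/.ssh/id_ed25519

With this in place, ssh bruin-server is all you need.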

Step 2: Update System Packages

Always start by updating your system packages:

bash
sudo apt update && sudo apt upgrade -y

Step 3: Install Git (if not already installed)

Git is required to clone your Bruin projects:

bash
sudo apt install git -y

Verify the installation:

bash
git --version

Step 4: Install Bruin CLI

Install Bruin using the official installation script:

bash
curl -LsSf https://getbruin.com/install/cli | sh

Alternatively, you can use wget:

bash
wget -qO- https://getbruin.com/install/cli | sh

The installer will automatically add Bruin to your PATH. You may need to restart your shell or run:

bash
source ~/.bashrc  # or ~/.zshrc if using zsh

Verify the installation:

bash
bruin --version

Step 5: Clone Your Bruin Project

Clone your Bruin project repository to your server:

bash
cd ~
git clone https://github.com/your-username/your-bruin-project.git
cd your-bruin-project

Replace the URL with your actual repository URL.

Step 6: Configure Credentials

Bruin needs access to your data platforms. Set up your credentials in the .bruin.yml file in your project root.

Best Practice: Use Environment Variables

Instead of storing sensitive credentials as plain text in your configuration files, use environment variables. This approach is more secure and makes it easier to manage secrets across different environments.

Create or edit the .bruin.yml file:

bash
nano .bruin.yml

Use the ${VAR_NAME} syntax to reference environment variables in your configuration:

yaml
environments:
  production:
    connections:
      google_cloud_platform:
        - name: "my_gcp"
          service_account_json: ${GCP_SERVICE_ACCOUNT_JSON}
          project_id: ${GCP_PROJECT_ID}

      postgres:
        - name: "my_postgres"
          username: ${POSTGRES_USERNAME}
          password: ${POSTGRES_PASSWORD}
          host: ${POSTGRES_HOST}
          port: ${POSTGRES_PORT}
          database: ${POSTGRES_DATABASE}

Environment variables are expanded at runtime, keeping your .bruin.yml file free of sensitive data.

Setting Up Environment Variables

Create a secure directory and environment file to store your credentials:

bash
sudo mkdir -p /etc/bruin
sudo nano /etc/bruin/credentials.env

Add your credentials:

bash
# Google Cloud Platform
GCP_PROJECT_ID=my-project-id
GCP_SERVICE_ACCOUNT_JSON='{"type":"service_account","project_id":"my-project-id","private_key_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n","client_email":"...","client_id":"...","auth_uri":"...","token_uri":"...","auth_provider_x509_cert_url":"...","client_x509_cert_url":"..."}'

# PostgreSQL
POSTGRES_USERNAME=postgres_user
POSTGRES_PASSWORD=your_secure_password
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DATABASE=mydb

Secure the file:

bash
sudo chmod 600 /etc/bruin/credentials.env
sudo chown $(whoami):$(whoami) /etc/bruin/credentials.env

Loading Environment Variables

For interactive sessions, add to your shell profile:

bash
echo 'set -a; source /etc/bruin/credentials.env; set +a' >> ~/.bashrc
source ~/.bashrc
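
To confirm the variables are loaded without echoing secrets to the terminal, a quick presence check is enough (using one variable name from the example file above):

bash
[ -n "$POSTGRES_PASSWORD" ] && echo "credentials loaded" || echo "credentials missing"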

For cron jobs, source the file in your wrapper script (see Step 12).

Alternative: Using Service Account Files

If you prefer to use service account files instead of inline JSON:

yaml
environments:
  production:
    connections:
      google_cloud_platform:
        - name: "my_gcp"
          service_account_file: "/home/username/.config/gcloud/service-account.json"
          project_id: "my-project-id"

      postgres:
        - name: "my_postgres"
          username: "postgres_user"
          password: "your_password"
          host: "localhost"
          port: 5432
          database: "mydb"

Storing Service Account Files

If you're using service account files (e.g., for Google Cloud):

bash
mkdir -p ~/.config/gcloud
nano ~/.config/gcloud/service-account.json

Paste your service account JSON content, save, and secure the file:

bash
chmod 600 ~/.config/gcloud/service-account.json

WARNING

When using service account files, ensure the files are properly secured and never committed to version control.

Step 7: Test Your Pipeline

Before setting up automation, test that your pipeline runs successfully:

bash
cd ~/your-bruin-project
bruin run .

If you want to run a specific pipeline:

bash
bruin run pipelines/my_pipeline

Check for any errors and resolve them before proceeding.
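
Bruin also provides a validate command that checks pipeline and asset definitions without executing anything, which makes a cheap first pass before a full run:

bash
bruin validate .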

Step 8: Set Up Cron Jobs

Cron is a time-based job scheduler in Unix-like operating systems. You'll use it to run your Bruin pipelines automatically.

Understanding Cron Syntax

Cron uses the following format:

text
* * * * * command-to-execute
│ │ │ │ │
│ │ │ │ └─── Day of week (0-7, Sunday = 0 or 7)
│ │ │ └───── Month (1-12)
│ │ └─────── Day of month (1-31)
│ └───────── Hour (0-23)
└─────────── Minute (0-59)

Examples:

  • 0 * * * * - Every hour at minute 0
  • 0 9 * * * - Every day at 9:00 AM
  • */15 * * * * - Every 15 minutes
  • 0 2 * * 1 - Every Monday at 2:00 AM
  • 0 0 1 * * - First day of every month at midnight

Create a Cron Job

Open your crontab file:

bash
crontab -e

If this is your first time, you'll be asked to choose an editor. Select nano (option 1) for simplicity.

Add a cron job to run your pipeline. Here's an example that runs daily at 3:00 AM:

bash
0 3 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project >> /home/username/logs/bruin.log 2>&1

Important notes:

  • Use absolute paths for both the Bruin executable and your project directory
  • Replace username with your actual username
  • The >> /home/username/logs/bruin.log 2>&1 appends both standard output and standard error to a log file

Multiple Pipelines with Different Schedules

You can schedule different pipelines at different times:

bash
# Run data ingestion pipeline every hour
0 * * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project/pipelines/ingestion >> /home/username/logs/ingestion.log 2>&1

# Run analytics pipeline daily at 6 AM
0 6 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project/pipelines/analytics >> /home/username/logs/analytics.log 2>&1

# Run weekly report every Monday at 8 AM
0 8 * * 1 /home/username/.local/bin/bruin run /home/username/your-bruin-project/pipelines/weekly_report >> /home/username/logs/weekly.log 2>&1
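
If a pipeline can run longer than its schedule interval (the hourly ingestion job, for example), runs may overlap. A common safeguard is to wrap the command in flock, which skips a run while the previous one still holds the lock; a sketch:

bash
# Skip this run if the previous one is still in progress
0 * * * * /usr/bin/flock -n /tmp/bruin-ingestion.lock /home/username/.local/bin/bruin run /home/username/your-bruin-project/pipelines/ingestion >> /home/username/logs/ingestion.log 2>&1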

Step 9: Set Up Logging

Create a directory for logs:

bash
mkdir -p ~/logs

Your cron jobs will now write their output to log files in this directory.

View Logs

Check recent logs:

bash
tail -f ~/logs/bruin.log

View last 100 lines:

bash
tail -n 100 ~/logs/bruin.log

Search for errors:

bash
grep -i error ~/logs/bruin.log

Log Rotation

To prevent log files from growing too large, set up log rotation:

bash
sudo nano /etc/logrotate.d/bruin

Add the following configuration:

text
/home/username/logs/*.log {
    daily
    missingok
    rotate 14
    compress
    notifempty
    create 0644 username username
}

This configuration:

  • Rotates logs daily
  • Keeps 14 days of logs
  • Compresses old logs
  • Creates new log files with proper permissions
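
Before relying on it, you can dry-run the configuration; logrotate's debug flag prints what it would do without actually rotating anything:

bash
sudo logrotate -d /etc/logrotate.d/bruin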

Step 10: Set Up Environment-Specific Configurations

Use Bruin's environment feature to manage different configurations:

bash
bruin run . --environment production

Update your cron job to use the production environment:

bash
0 3 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project --environment production >> /home/username/logs/bruin.log 2>&1

Step 11: Monitoring and Alerting

Email Notifications on Failure

Cron can send emails when jobs fail. First, install a mail utility:

bash
sudo apt install mailutils -y

Configure postfix when prompted (select "Internet Site").

Create a wrapper script to handle errors:

bash
mkdir -p ~/scripts
nano ~/scripts/run-bruin.sh

Add the following:

bash
#!/bin/bash

LOG_FILE="/home/username/logs/bruin.log"
PROJECT_PATH="/home/username/your-bruin-project"
BRUIN_BIN="/home/username/.local/bin/bruin"

echo "=== Starting Bruin run at $(date) ===" >> "$LOG_FILE"

if ! $BRUIN_BIN run "$PROJECT_PATH" --environment production >> "$LOG_FILE" 2>&1; then
    echo "Bruin pipeline failed at $(date)" | mail -s "Bruin Pipeline Failed" your-email@example.com
    exit 1
fi

echo "=== Completed successfully at $(date) ===" >> "$LOG_FILE"

Make it executable:

bash
chmod +x ~/scripts/run-bruin.sh

Update your crontab:

bash
0 3 * * * /home/username/scripts/run-bruin.sh

Step 12: Automatic Updates

Keep your Bruin project up to date by pulling changes before each run:

Update your wrapper script:

bash
#!/bin/bash

set -e

export PATH="/home/username/.local/bin:$PATH"
export HOME="/home/username"

# Load credentials from secure environment file
if [ -f /etc/bruin/credentials.env ]; then
    set -a
    source /etc/bruin/credentials.env
    set +a
fi

LOG_FILE="/home/username/logs/bruin.log"
PROJECT_PATH="/home/username/your-bruin-project"
BRUIN_BIN="/home/username/.local/bin/bruin"

echo "=== Starting Bruin run at $(date) ===" >> "$LOG_FILE"

# Pull latest changes
cd "$PROJECT_PATH"
git pull origin main >> "$LOG_FILE" 2>&1

# Run the pipeline
if ! $BRUIN_BIN run "$PROJECT_PATH" --environment production >> "$LOG_FILE" 2>&1; then
    echo "Bruin pipeline failed at $(date)" | mail -s "Bruin Pipeline Failed" your-email@example.com
    exit 1
fi

echo "=== Completed successfully at $(date) ===" >> "$LOG_FILE"

The set -a and set +a commands enable and disable automatic export of all variables, ensuring that all credentials from the environment file are available to Bruin.
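
A quick way to see the effect in a shell (FOO is just a throwaway demo variable):

bash
FOO=1; bash -c 'echo "${FOO:-unset}"'   # prints "unset": a plain assignment is not visible to child processes
set -a; FOO=1; set +a
bash -c 'echo "${FOO:-unset}"'          # prints "1": set -a caused the assignment to be exported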

Security Best Practices

1. Use Environment Variables for Credentials

Instead of hardcoding credentials in .bruin.yml, use environment variables:

yaml
# Good: Using environment variables
postgres:
  - name: "my_postgres"
    username: ${POSTGRES_USERNAME}
    password: ${POSTGRES_PASSWORD}

# Avoid: Hardcoded credentials
postgres:
  - name: "my_postgres"
    username: "admin"
    password: "plaintext_password"

2. Secure Your Credentials

Never commit credentials to Git:

bash
echo ".bruin.yml" >> .gitignore
echo "*.json" >> .gitignore
echo "credentials.env" >> .gitignore

Store credentials in a secure location with restricted permissions:

bash
sudo mkdir -p /etc/bruin
sudo chmod 700 /etc/bruin

3. Use SSH Keys for Git

Set up SSH keys for passwordless Git operations:

bash
ssh-keygen -t ed25519 -C "your-email@example.com"
cat ~/.ssh/id_ed25519.pub

Add the public key to your Git provider (GitHub, GitLab, etc.).
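
Most providers let you verify the key before switching your remotes; with GitHub, for example:

bash
# Verify the key is accepted (GitHub example)
ssh -T git@github.com

# Switch an existing clone from HTTPS to SSH
git remote set-url origin git@github.com:your-username/your-bruin-project.git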

4. Restrict File Permissions

bash
chmod 600 ~/.bruin.yml
chmod 600 ~/.config/gcloud/*.json
sudo chmod 600 /etc/bruin/credentials.env

5. Use a Dedicated User

Create a dedicated user for running Bruin:

bash
sudo useradd -m -s /bin/bash bruin
sudo su - bruin

Then follow all the installation steps as the bruin user.
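
Cron jobs are per-user, so either add them while logged in as bruin, or edit that user's crontab from your admin account:

bash
sudo crontab -u bruin -e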

Troubleshooting

Cron Job Not Running

  1. Check if the cron service is running:

bash
sudo systemctl status cron

  2. Check cron logs (see below if /var/log/syslog is missing):

bash
grep CRON /var/log/syslog

  3. Verify your crontab:

bash
crontab -l
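
On systems that log to the systemd journal instead of /var/log/syslog, you can query cron activity with journalctl:

bash
journalctl -u cron --since "1 hour ago"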

Bruin Command Not Found in Cron

Cron has a limited environment. Always use absolute paths:

bash
# Find the full path to bruin
which bruin

# Use the full path in crontab
0 3 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project

Permission Denied Errors

Ensure your user has permission to access all files:

bash
chmod +x ~/.local/bin/bruin
chmod -R 755 ~/your-bruin-project

Connection Issues

Test your connections:

bash
bruin connections --environment production

Pipeline Fails in Cron but Works Manually

This often happens due to environment differences. Export all necessary environment variables in your wrapper script:

bash
#!/bin/bash
export PATH="/home/username/.local/bin:$PATH"
export HOME="/home/username"
# Add other environment variables here

# Run your pipeline
/home/username/.local/bin/bruin run /home/username/your-bruin-project
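
You can also reproduce cron's sparse environment interactively; env -i starts a command with an empty environment plus only what you pass explicitly:

bash
env -i HOME=/home/username /bin/sh -c '/home/username/scripts/run-bruin.sh'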

Example: Complete Production Setup

Here's a complete example for a production deployment using environment variable injection:

Directory Structure

text
/home/bruin/
├── projects/
│   └── analytics-pipeline/
│       └── .bruin.yml
├── scripts/
│   ├── run-ingestion.sh
│   └── run-analytics.sh
└── logs/
    ├── ingestion.log
    └── analytics.log

/etc/bruin/
└── credentials.env

.bruin.yml (in project root)

yaml
default_environment: production

environments:
  production:
    connections:
      google_cloud_platform:
        - name: "gcp-prod"
          service_account_json: ${GCP_SERVICE_ACCOUNT_JSON}
          project_id: ${GCP_PROJECT_ID}

      postgres:
        - name: "postgres-analytics"
          username: ${POSTGRES_USERNAME}
          password: ${POSTGRES_PASSWORD}
          host: ${POSTGRES_HOST}
          port: ${POSTGRES_PORT}
          database: ${POSTGRES_DATABASE}

      snowflake:
        - name: "snowflake-prod"
          account: ${SNOWFLAKE_ACCOUNT}
          username: ${SNOWFLAKE_USERNAME}
          password: ${SNOWFLAKE_PASSWORD}
          database: ${SNOWFLAKE_DATABASE}
          warehouse: ${SNOWFLAKE_WAREHOUSE}

/etc/bruin/credentials.env

bash
# Google Cloud Platform
GCP_PROJECT_ID=my-analytics-project
GCP_SERVICE_ACCOUNT_JSON='{"type":"service_account","project_id":"my-analytics-project",...}'

# PostgreSQL
POSTGRES_USERNAME=analytics_user
POSTGRES_PASSWORD=super_secure_password
POSTGRES_HOST=db.company.com
POSTGRES_PORT=5432
POSTGRES_DATABASE=analytics

# Snowflake
SNOWFLAKE_ACCOUNT=ABC12345
SNOWFLAKE_USERNAME=bruin_user
SNOWFLAKE_PASSWORD=snowflake_secure_password
SNOWFLAKE_DATABASE=ANALYTICS
SNOWFLAKE_WAREHOUSE=COMPUTE_WH

Crontab

bash
# Pull and run ingestion pipeline every 6 hours
0 */6 * * * /home/bruin/scripts/run-ingestion.sh

# Run analytics pipeline daily at 2 AM
0 2 * * * /home/bruin/scripts/run-analytics.sh

run-analytics.sh

bash
#!/bin/bash

set -e

export PATH="/home/bruin/.local/bin:$PATH"
export HOME="/home/bruin"

# Load credentials from secure environment file
if [ -f /etc/bruin/credentials.env ]; then
    set -a
    source /etc/bruin/credentials.env
    set +a
fi

LOG_FILE="/home/bruin/logs/analytics.log"
PROJECT_PATH="/home/bruin/projects/analytics-pipeline"

echo "=== Starting analytics run at $(date) ===" >> "$LOG_FILE"

cd "$PROJECT_PATH"
git pull origin main >> "$LOG_FILE" 2>&1

if ! bruin run . --environment production >> "$LOG_FILE" 2>&1; then
    echo "Analytics pipeline failed at $(date)" | mail -s "Alert: Analytics Pipeline Failed" admin@company.com
    exit 1
fi

echo "=== Completed successfully at $(date) ===" >> "$LOG_FILE"
