Deploying Bruin on Ubuntu VMs
Managed Option Available
Looking for a fully managed solution? Bruin Cloud provides managed orchestration, monitoring, and scheduling without the operational overhead. Try it free!
This guide walks you through deploying Bruin on Ubuntu-based virtual machines (AWS EC2, Google Cloud Compute Engine, DigitalOcean Droplets, or any Ubuntu server) and scheduling pipeline runs using cron jobs.
Prerequisites
Before you begin, ensure you have:
- An Ubuntu server (18.04 or later recommended)
- SSH access to the server with sudo privileges
- Git installed on the server
- A Bruin project ready to deploy
Step 1: Connect to Your Server
Connect to your Ubuntu VM via SSH:
ssh username@your-server-ip

Replace username with your actual username and your-server-ip with your server's IP address or hostname.
Step 2: Update System Packages
Always start by updating your system packages:
sudo apt update && sudo apt upgrade -y

Step 3: Install Git (if not already installed)
Git is required to clone your Bruin projects:
sudo apt install git -y

Verify the installation:
git --version

Step 4: Install Bruin CLI
Install Bruin using the official installation script:
curl -LsSf https://getbruin.com/install/cli | sh

Alternatively, you can use wget:
wget -qO- https://getbruin.com/install/cli | sh

The installer will automatically add Bruin to your PATH. You may need to restart your shell or run:
source ~/.bashrc  # or ~/.zshrc if using zsh

Verify the installation:
bruin --version

Step 5: Clone Your Bruin Project
Clone your Bruin project repository to your server:
cd ~
git clone https://github.com/your-username/your-bruin-project.git
cd your-bruin-project

Replace the URL with your actual repository URL.
Step 6: Configure Credentials
Bruin needs access to your data platforms. Set up your credentials in the .bruin.yml file in your project root.
Best Practice: Use Environment Variables
Instead of storing sensitive credentials as plain text in your configuration files, use environment variables. This approach is more secure and makes it easier to manage secrets across different environments.
Create or edit the .bruin.yml file:
nano .bruin.yml

Option A: Using Environment Variables (Recommended)
Use the ${VAR_NAME} syntax to reference environment variables in your configuration:
environments:
  production:
    connections:
      google_cloud_platform:
        - name: "my_gcp"
          service_account_json: ${GCP_SERVICE_ACCOUNT_JSON}
          project_id: ${GCP_PROJECT_ID}
      postgres:
        - name: "my_postgres"
          username: ${POSTGRES_USERNAME}
          password: ${POSTGRES_PASSWORD}
          host: ${POSTGRES_HOST}
          port: ${POSTGRES_PORT}
          database: ${POSTGRES_DATABASE}

Environment variables are expanded at runtime, keeping your .bruin.yml file free of sensitive data.
Setting Up Environment Variables
Create a secure directory and environment file to store your credentials:
sudo mkdir -p /etc/bruin
sudo nano /etc/bruin/credentials.env

Add your credentials:
# Google Cloud Platform
GCP_PROJECT_ID=my-project-id
GCP_SERVICE_ACCOUNT_JSON='{"type":"service_account","project_id":"my-project-id","private_key_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n","client_email":"...","client_id":"...","auth_uri":"...","token_uri":"...","auth_provider_x509_cert_url":"...","client_x509_cert_url":"..."}'
# PostgreSQL
POSTGRES_USERNAME=postgres_user
POSTGRES_PASSWORD=your_secure_password
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DATABASE=mydb

Secure the file:
sudo chmod 600 /etc/bruin/credentials.env
sudo chown $(whoami):$(whoami) /etc/bruin/credentials.env

Loading Environment Variables
For interactive sessions, add to your shell profile:
echo 'set -a; source /etc/bruin/credentials.env; set +a' >> ~/.bashrc
source ~/.bashrc

For cron jobs, source the file in your wrapper script (see Step 12).
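If you're curious what the set -a / set +a pair actually does, here is a self-contained sketch you can paste into any shell. DEMO_VAR and the temp file are throwaway names; the point is that variables assigned while allexport is on become visible to child processes like Bruin:

```shell
# Demonstrate `set -a` (allexport): variables assigned while it is active
# are exported automatically, so child processes inherit them.
tmpfile=$(mktemp)
echo 'DEMO_VAR=hello' > "$tmpfile"

set -a          # every assignment from here on is exported
. "$tmpfile"    # source the throwaway env file
set +a          # back to normal assignment behavior

# A child process (like a bruin run) sees the exported variable:
child_out=$(sh -c 'echo "child sees: $DEMO_VAR"')
echo "$child_out"
rm -f "$tmpfile"
```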
Option B: Using Service Account Files
If you prefer to use service account files instead of inline JSON:
environments:
  production:
    connections:
      google_cloud_platform:
        - name: "my_gcp"
          service_account_file: "/home/username/.config/gcloud/service-account.json"
          project_id: "my-project-id"
      postgres:
        - name: "my_postgres"
          username: "postgres_user"
          password: "your_password"
          host: "localhost"
          port: 5432
          database: "mydb"

Storing Service Account Files
If you're using service account files (e.g., for Google Cloud):
mkdir -p ~/.config/gcloud
nano ~/.config/gcloud/service-account.json

Paste your service account JSON content, save, and secure the file:
chmod 600 ~/.config/gcloud/service-account.json

WARNING
When using service account files, ensure the files are properly secured and never committed to version control.
Step 7: Test Your Pipeline
Before setting up automation, test that your pipeline runs successfully:
cd ~/your-bruin-project
bruin run .

If you want to run a specific pipeline:
bruin run pipelines/my_pipeline

Check for any errors and resolve them before proceeding.
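Besides reading the output, you can check the exit status: like most CLI tools, a run that fails exits non-zero, which is what scripts and cron wrappers key off. This throwaway sketch uses true and false as stand-ins for a successful and a failing bruin run:

```shell
# The shell variable $? holds the exit status of the last command.
true                       # stand-in for a successful `bruin run .`
ok_status=$?
false || fail_status=$?    # stand-in for a failing run (|| keeps set -e happy)

echo "successful run exited with: $ok_status"
echo "failing run exited with: $fail_status"
```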
Step 8: Set Up Cron Jobs
Cron is a time-based job scheduler in Unix-like operating systems. You'll use it to run your Bruin pipelines automatically.
Understanding Cron Syntax
Cron uses the following format:
* * * * * command-to-execute
│ │ │ │ │
│ │ │ │ └─── Day of week (0-7, Sunday = 0 or 7)
│ │ │ └───── Month (1-12)
│ │ └─────── Day of month (1-31)
│ └───────── Hour (0-23)
└─────────── Minute (0-59)

Examples:
- 0 * * * * - Every hour at minute 0
- 0 9 * * * - Every day at 9:00 AM
- */15 * * * * - Every 15 minutes
- 0 2 * * 1 - Every Monday at 2:00 AM
- 0 0 1 * * - First day of every month at midnight
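As a toy sanity check before saving a crontab, you can confirm a schedule has exactly five time fields in front of the command; this is just shell word-splitting, and set -f stops the asterisks from being expanded as globs:

```shell
# Toy check: a crontab schedule must have exactly five time fields.
schedule="0 3 * * *"
set -f               # disable globbing so `*` stays literal
set -- $schedule     # split the schedule into positional parameters
nfields=$#
set +f
echo "fields: $nfields"
```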
Create a Cron Job
Open your crontab file:
crontab -e

If this is your first time, you'll be asked to choose an editor. Select nano (option 1) for simplicity.
Add a cron job to run your pipeline. Here's an example that runs daily at 3:00 AM:
0 3 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project >> /home/username/logs/bruin.log 2>&1

Important notes:
- Use absolute paths for both the Bruin executable and your project directory
- Replace username with your actual username
- The >> /home/username/logs/bruin.log 2>&1 redirects output to a log file
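If a job ever behaves differently under cron than in your shell, a common trick is a temporary entry that dumps the environment cron actually provides (the output path here is arbitrary; remove the entry once you've compared it with your interactive env):

```
* * * * * env > /tmp/cron-env.txt 2>&1
```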
Multiple Pipelines with Different Schedules
You can schedule different pipelines at different times:
# Run data ingestion pipeline every hour
0 * * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project/pipelines/ingestion >> /home/username/logs/ingestion.log 2>&1
# Run analytics pipeline daily at 6 AM
0 6 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project/pipelines/analytics >> /home/username/logs/analytics.log 2>&1
# Run weekly report every Monday at 8 AM
0 8 * * 1 /home/username/.local/bin/bruin run /home/username/your-bruin-project/pipelines/weekly_report >> /home/username/logs/weekly.log 2>&1

Step 9: Set Up Logging
Create a directory for logs:
mkdir -p ~/logs

Your cron jobs will now write outputs to log files in this directory.
View Logs
Check recent logs:
tail -f ~/logs/bruin.log

View last 100 lines:
tail -n 100 ~/logs/bruin.log

Search for errors:
grep -i error ~/logs/bruin.log

Log Rotation
To prevent log files from growing too large, set up log rotation:
sudo nano /etc/logrotate.d/bruin

Add the following configuration:
/home/username/logs/*.log {
    daily
    missingok
    rotate 14
    compress
    notifempty
    create 0644 username username
}

This configuration:
- Rotates logs daily
- Keeps 14 days of logs
- Compresses old logs
- Creates new log files with proper permissions
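If you'd rather not depend on logrotate, a simpler (if less featureful) alternative is a cron-driven cleanup with find. The demo below uses a throwaway temp directory so it's safe to run anywhere; touch -d is GNU touch as shipped on Ubuntu:

```shell
# Alternative sketch to logrotate: delete *.log files older than 14 days.
LOG_DIR=$(mktemp -d)                           # stand-in for ~/logs
touch "$LOG_DIR/recent.log"                    # fresh log, should survive
touch -d '30 days ago' "$LOG_DIR/old.log"      # stale log, should be deleted

# -mtime +14 matches files last modified more than 14 days ago
find "$LOG_DIR" -name '*.log' -mtime +14 -delete

remaining=$(ls "$LOG_DIR")
echo "remaining: $remaining"
rm -rf "$LOG_DIR"
```

You would schedule the find line itself from cron, pointed at your real log directory. Note this deletes rather than compresses, so it loses the 14-day compressed history that the logrotate config keeps.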
Step 10: Set Up Environment-Specific Configurations
Use Bruin's environment feature to manage different configurations:
bruin run . --environment production

Update your cron job to use the production environment:
0 3 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project --environment production >> /home/username/logs/bruin.log 2>&1

Step 11: Monitoring and Alerting
Email Notifications on Failure
Cron can send emails when jobs fail. First, install a mail utility:
sudo apt install mailutils -y

Configure postfix when prompted (select "Internet Site").
Create a wrapper script to handle errors:
mkdir -p ~/scripts
nano ~/scripts/run-bruin.sh

Add the following:
#!/bin/bash
LOG_FILE="/home/username/logs/bruin.log"
PROJECT_PATH="/home/username/your-bruin-project"
BRUIN_BIN="/home/username/.local/bin/bruin"
echo "=== Starting Bruin run at $(date) ===" >> "$LOG_FILE"
if ! $BRUIN_BIN run "$PROJECT_PATH" --environment production >> "$LOG_FILE" 2>&1; then
    echo "Bruin pipeline failed at $(date)" | mail -s "Bruin Pipeline Failed" your-email@example.com
    exit 1
fi
echo "=== Completed successfully at $(date) ===" >> "$LOG_FILE"

Make it executable:
chmod +x ~/scripts/run-bruin.sh

Update your crontab:
0 3 * * * /home/username/scripts/run-bruin.sh

Step 12: Automatic Updates
Keep your Bruin project up to date by pulling changes before each run:
Update your wrapper script:
#!/bin/bash
set -e
export PATH="/home/username/.local/bin:$PATH"
export HOME="/home/username"
# Load credentials from secure environment file
if [ -f /etc/bruin/credentials.env ]; then
    set -a
    source /etc/bruin/credentials.env
    set +a
fi
LOG_FILE="/home/username/logs/bruin.log"
PROJECT_PATH="/home/username/your-bruin-project"
BRUIN_BIN="/home/username/.local/bin/bruin"
echo "=== Starting Bruin run at $(date) ===" >> "$LOG_FILE"
# Pull latest changes
cd "$PROJECT_PATH"
git pull origin main >> "$LOG_FILE" 2>&1
# Run the pipeline
if ! $BRUIN_BIN run "$PROJECT_PATH" --environment production >> "$LOG_FILE" 2>&1; then
    echo "Bruin pipeline failed at $(date)" | mail -s "Bruin Pipeline Failed" your-email@example.com
    exit 1
fi
echo "=== Completed successfully at $(date) ===" >> "$LOG_FILE"

The set -a and set +a commands enable and disable automatic export of all variables, ensuring that all credentials from the environment file are available to Bruin.
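One refinement worth considering for the wrapper script: guard it with flock(1) from util-linux so a long-running pipeline and the next cron tick don't overlap. A minimal sketch (the lock path and messages are arbitrary; the echo stands in for the bruin invocation):

```shell
# Sketch: skip a run if the previous one is still holding the lock.
lockfile=$(mktemp)   # in the real script, use a fixed path like /tmp/bruin.lock
result=$(
  (
    # -n: fail immediately instead of waiting for the lock
    flock -n 9 || { echo "previous run still active, skipping"; exit 0; }
    echo "running pipeline"          # the bruin run would go here
  ) 9>"$lockfile"
)
echo "$result"
rm -f "$lockfile"
```

Since nothing else holds the lock in this demo, it takes the "running pipeline" branch; a second copy started while the first was inside the subshell would print the skip message instead.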
Security Best Practices
1. Use Environment Variables for Credentials
Instead of hardcoding credentials in .bruin.yml, use environment variables:
# Good: Using environment variables
postgres:
  - name: "my_postgres"
    username: ${POSTGRES_USERNAME}
    password: ${POSTGRES_PASSWORD}

# Avoid: Hardcoded credentials
postgres:
  - name: "my_postgres"
    username: "admin"
    password: "plaintext_password"

2. Secure Your Credentials
Never commit credentials to Git:
echo ".bruin.yml" >> .gitignore
echo "*.json" >> .gitignore
echo "credentials.env" >> .gitignore

Store credentials in a secure location with restricted permissions:
sudo mkdir -p /etc/bruin
sudo chmod 700 /etc/bruin

3. Use SSH Keys for Git
Set up SSH keys for passwordless Git operations:
ssh-keygen -t ed25519 -C "your-email@example.com"
cat ~/.ssh/id_ed25519.pub

Add the public key to your Git provider (GitHub, GitLab, etc.).
4. Restrict File Permissions
chmod 600 ~/.bruin.yml
chmod 600 ~/.config/gcloud/*.json
sudo chmod 600 /etc/bruin/credentials.env

5. Use a Dedicated User
Create a dedicated user for running Bruin:
sudo useradd -m -s /bin/bash bruin
sudo su - bruin

Then follow all the installation steps as the bruin user.
Troubleshooting
Cron Job Not Running
- Check if cron service is running: sudo systemctl status cron
- Check cron logs: grep CRON /var/log/syslog
- Verify your crontab: crontab -l

Bruin Command Not Found in Cron
Cron has a limited environment. Always use absolute paths:
# Find the full path to bruin
which bruin
# Use the full path in crontab
0 3 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project

Permission Denied Errors
Ensure your user has permission to access all files:
chmod +x ~/.local/bin/bruin
chmod -R 755 ~/your-bruin-project

Connection Issues
Test your connections:
bruin connections --environment production

Pipeline Fails in Cron but Works Manually
This often happens due to environment differences. Export all necessary environment variables in your wrapper script:
#!/bin/bash
export PATH="/home/username/.local/bin:$PATH"
export HOME="/home/username"
# Add other environment variables here
# Run your pipeline
/home/username/.local/bin/bruin run /home/username/your-bruin-project

Example: Complete Production Setup
Here's a complete example for a production deployment using environment variable injection:
Directory Structure
/home/bruin/
├── projects/
│   └── analytics-pipeline/
│       └── .bruin.yml
├── scripts/
│   ├── run-ingestion.sh
│   └── run-analytics.sh
└── logs/
    ├── ingestion.log
    └── analytics.log

/etc/bruin/
└── credentials.env

.bruin.yml (in project root)
default_environment: production
environments:
  production:
    connections:
      google_cloud_platform:
        - name: "gcp-prod"
          service_account_json: ${GCP_SERVICE_ACCOUNT_JSON}
          project_id: ${GCP_PROJECT_ID}
      postgres:
        - name: "postgres-analytics"
          username: ${POSTGRES_USERNAME}
          password: ${POSTGRES_PASSWORD}
          host: ${POSTGRES_HOST}
          port: ${POSTGRES_PORT}
          database: ${POSTGRES_DATABASE}
      snowflake:
        - name: "snowflake-prod"
          account: ${SNOWFLAKE_ACCOUNT}
          username: ${SNOWFLAKE_USERNAME}
          password: ${SNOWFLAKE_PASSWORD}
          database: ${SNOWFLAKE_DATABASE}
          warehouse: ${SNOWFLAKE_WAREHOUSE}

/etc/bruin/credentials.env
# Google Cloud Platform
GCP_PROJECT_ID=my-analytics-project
GCP_SERVICE_ACCOUNT_JSON='{"type":"service_account","project_id":"my-analytics-project",...}'
# PostgreSQL
POSTGRES_USERNAME=analytics_user
POSTGRES_PASSWORD=super_secure_password
POSTGRES_HOST=db.company.com
POSTGRES_PORT=5432
POSTGRES_DATABASE=analytics
# Snowflake
SNOWFLAKE_ACCOUNT=ABC12345
SNOWFLAKE_USERNAME=bruin_user
SNOWFLAKE_PASSWORD=snowflake_secure_password
SNOWFLAKE_DATABASE=ANALYTICS
SNOWFLAKE_WAREHOUSE=COMPUTE_WH

Crontab
# Pull and run ingestion pipeline every 6 hours
0 */6 * * * /home/bruin/scripts/run-ingestion.sh
# Run analytics pipeline daily at 2 AM
0 2 * * * /home/bruin/scripts/run-analytics.sh

run-analytics.sh
#!/bin/bash
set -e
export PATH="/home/bruin/.local/bin:$PATH"
export HOME="/home/bruin"
# Load credentials from secure environment file
if [ -f /etc/bruin/credentials.env ]; then
    set -a
    source /etc/bruin/credentials.env
    set +a
fi
LOG_FILE="/home/bruin/logs/analytics.log"
PROJECT_PATH="/home/bruin/projects/analytics-pipeline"
echo "=== Starting analytics run at $(date) ===" >> "$LOG_FILE"
cd "$PROJECT_PATH"
git pull origin main >> "$LOG_FILE" 2>&1
if ! bruin run . --environment production >> "$LOG_FILE" 2>&1; then
    echo "Analytics pipeline failed at $(date)" | mail -s "Alert: Analytics Pipeline Failed" admin@company.com
    exit 1
fi
echo "=== Completed successfully at $(date) ===" >> "$LOG_FILE"

Next Steps
- Explore Bruin Cloud for managed orchestration and monitoring
- Set up CI/CD integration for automated testing
- Learn about quality checks to ensure data quality
- Review best practices for pipeline design