Deploying Bruin on Ubuntu VMs
Managed Option Available
Looking for a fully managed solution? Bruin Cloud provides managed orchestration, monitoring, and scheduling without the operational overhead. Try it free!
This guide walks you through deploying Bruin on Ubuntu-based virtual machines (AWS EC2, Google Cloud Compute Engine, DigitalOcean Droplets, or any Ubuntu server) and scheduling pipeline runs using cron jobs.
Prerequisites
Before you begin, ensure you have:
- An Ubuntu server (18.04 or later recommended)
- SSH access to the server with sudo privileges
- Git installed on the server
- A Bruin project ready to deploy
Step 1: Connect to Your Server
Connect to your Ubuntu VM via SSH:
ssh username@your-server-ip

Replace username with your actual username and your-server-ip with your server's IP address or hostname.
Step 2: Update System Packages
Always start by updating your system packages:
sudo apt update && sudo apt upgrade -y

Step 3: Install Git (if not already installed)
Git is required to clone your Bruin projects:
sudo apt install git -y

Verify the installation:
git --version

Step 4: Install Bruin CLI
Install Bruin using the official installation script:
curl -LsSf https://getbruin.com/install/cli | sh

Alternatively, you can use wget:
wget -qO- https://getbruin.com/install/cli | sh

The installer will automatically add Bruin to your PATH. You may need to restart your shell or run:
source ~/.bashrc  # or ~/.zshrc if using zsh

Verify the installation:
bruin --version

Step 5: Clone Your Bruin Project
Clone your Bruin project repository to your server:
cd ~
git clone https://github.com/your-username/your-bruin-project.git
cd your-bruin-project

Replace the URL with your actual repository URL.
Step 6: Configure Credentials
Bruin needs access to your data platforms. Set up your credentials in the .bruin.yml file in your project root.
Best Practice: Use Environment Variables
Instead of storing sensitive credentials as plain text in your configuration files, use environment variables. This approach is more secure and makes it easier to manage secrets across different environments.
Create or edit the .bruin.yml file:
nano .bruin.yml

Option A: Using Environment Variables (Recommended)
Use the ${VAR_NAME} syntax to reference environment variables in your configuration:
environments:
  production:
    connections:
      google_cloud_platform:
        - name: "my_gcp"
          service_account_json: ${GCP_SERVICE_ACCOUNT_JSON}
          project_id: ${GCP_PROJECT_ID}
      postgres:
        - name: "my_postgres"
          username: ${POSTGRES_USERNAME}
          password: ${POSTGRES_PASSWORD}
          host: ${POSTGRES_HOST}
          port: ${POSTGRES_PORT}
          database: ${POSTGRES_DATABASE}

Environment variables are expanded at runtime, keeping your .bruin.yml file free of sensitive data.
Setting Up Environment Variables
Create a secure directory and environment file to store your credentials:
sudo mkdir -p /etc/bruin
sudo nano /etc/bruin/credentials.env

Add your credentials:
# Google Cloud Platform
GCP_PROJECT_ID=my-project-id
GCP_SERVICE_ACCOUNT_JSON='{"type":"service_account","project_id":"my-project-id","private_key_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n","client_email":"...","client_id":"...","auth_uri":"...","token_uri":"...","auth_provider_x509_cert_url":"...","client_x509_cert_url":"..."}'
# PostgreSQL
POSTGRES_USERNAME=postgres_user
POSTGRES_PASSWORD=your_secure_password
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DATABASE=mydb

Secure the file:
sudo chmod 600 /etc/bruin/credentials.env
sudo chown $(whoami):$(whoami) /etc/bruin/credentials.env

Loading Environment Variables
For interactive sessions, add to your shell profile:
echo 'set -a; source /etc/bruin/credentials.env; set +a' >> ~/.bashrc
source ~/.bashrc

For cron jobs, source the file in your wrapper script (see Step 12).
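If you're curious what the set -a / set +a pair actually does, here is a self-contained sketch you can paste into any shell. DEMO_VAR and the temp file are throwaway names; the point is that variables assigned while allexport is on become visible to child processes like Bruin:

```shell
# Demonstrate `set -a` (allexport): variables assigned while it is active
# are exported automatically, so child processes inherit them.
tmpfile=$(mktemp)
echo 'DEMO_VAR=hello' > "$tmpfile"

set -a          # every assignment from here on is exported
. "$tmpfile"    # source the throwaway env file
set +a          # back to normal assignment behavior

# A child process (like a bruin run) sees the exported variable:
child_out=$(sh -c 'echo "child sees: $DEMO_VAR"')
echo "$child_out"
rm -f "$tmpfile"
```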
Option B: Using Service Account Files
If you prefer to use service account files instead of inline JSON:
environments:
  production:
    connections:
      google_cloud_platform:
        - name: "my_gcp"
          service_account_file: "/home/username/.config/gcloud/service-account.json"
          project_id: "my-project-id"
      postgres:
        - name: "my_postgres"
          username: "postgres_user"
          password: "your_password"
          host: "localhost"
          port: 5432
          database: "mydb"

Storing Service Account Files
If you're using service account files (e.g., for Google Cloud):
mkdir -p ~/.config/gcloud
nano ~/.config/gcloud/service-account.json

Paste your service account JSON content, save, and secure the file:
chmod 600 ~/.config/gcloud/service-account.json

WARNING
When using service account files, ensure the files are properly secured and never committed to version control.
Step 7: Test Your Pipeline
Before setting up automation, test that your pipeline runs successfully:
cd ~/your-bruin-project
bruin run .

If you want to run a specific pipeline:
bruin run pipelines/my_pipeline

Check for any errors and resolve them before proceeding.
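Besides reading the output, you can check the exit status: like most CLI tools, a run that fails exits non-zero, which is what scripts and cron wrappers key off. This throwaway sketch uses true and false as stand-ins for a successful and a failing bruin run:

```shell
# The shell variable $? holds the exit status of the last command.
true                       # stand-in for a successful `bruin run .`
ok_status=$?
false || fail_status=$?    # stand-in for a failing run (|| keeps set -e happy)

echo "successful run exited with: $ok_status"
echo "failing run exited with: $fail_status"
```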
Step 8: Set Up Cron Jobs
Cron is a time-based job scheduler in Unix-like operating systems. You'll use it to run your Bruin pipelines automatically.
Understanding Cron Syntax
Cron uses the following format:
* * * * * command-to-execute
│ │ │ │ │
│ │ │ │ └─── Day of week (0-7, Sunday = 0 or 7)
│ │ │ └───── Month (1-12)
│ │ └─────── Day of month (1-31)
│ └───────── Hour (0-23)
└─────────── Minute (0-59)

Examples:
- 0 * * * * - Every hour at minute 0
- 0 9 * * * - Every day at 9:00 AM
- */15 * * * * - Every 15 minutes
- 0 2 * * 1 - Every Monday at 2:00 AM
- 0 0 1 * * - First day of every month at midnight
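As a toy sanity check before saving a crontab, you can confirm a schedule has exactly five time fields in front of the command; this is just shell word-splitting, and set -f stops the asterisks from being expanded as globs:

```shell
# Toy check: a crontab schedule must have exactly five time fields.
schedule="0 3 * * *"
set -f               # disable globbing so `*` stays literal
set -- $schedule     # split the schedule into positional parameters
nfields=$#
set +f
echo "fields: $nfields"
```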
Create a Cron Job
Open your crontab file:
crontab -e

If this is your first time, you'll be asked to choose an editor. Select nano (option 1) for simplicity.
Add a cron job to run your pipeline. Here's an example that runs daily at 3:00 AM:
0 3 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project >> /home/username/logs/bruin.log 2>&1

Important notes:
- Use absolute paths for both the Bruin executable and your project directory
- Replace username with your actual username
- The >> /home/username/logs/bruin.log 2>&1 redirects output to a log file
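If a job ever behaves differently under cron than in your shell, a common trick is a temporary entry that dumps the environment cron actually provides (the output path here is arbitrary; remove the entry once you've compared it with your interactive env):

```
* * * * * env > /tmp/cron-env.txt 2>&1
```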
Multiple Pipelines with Different Schedules
You can schedule different pipelines at different times:
# Run data ingestion pipeline every hour
0 * * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project/pipelines/ingestion >> /home/username/logs/ingestion.log 2>&1
# Run analytics pipeline daily at 6 AM
0 6 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project/pipelines/analytics >> /home/username/logs/analytics.log 2>&1
# Run weekly report every Monday at 8 AM
0 8 * * 1 /home/username/.local/bin/bruin run /home/username/your-bruin-project/pipelines/weekly_report >> /home/username/logs/weekly.log 2>&1

Step 9: Set Up Logging
Create a directory for logs:
mkdir -p ~/logs

Your cron jobs will now write outputs to log files in this directory.
View Logs
Check recent logs:
tail -f ~/logs/bruin.log

View last 100 lines:
tail -n 100 ~/logs/bruin.log

Search for errors:
grep -i error ~/logs/bruin.log

Log Rotation
To prevent log files from growing too large, set up log rotation:
sudo nano /etc/logrotate.d/bruin

Add the following configuration:
/home/username/logs/*.log {
    daily
    missingok
    rotate 14
    compress
    notifempty
    create 0644 username username
}

This configuration:
- Rotates logs daily
- Keeps 14 days of logs
- Compresses old logs
- Creates new log files with proper permissions
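If you'd rather not depend on logrotate, a simpler (if less featureful) alternative is a cron-driven cleanup with find. The demo below uses a throwaway temp directory so it's safe to run anywhere; touch -d is GNU touch as shipped on Ubuntu:

```shell
# Alternative sketch to logrotate: delete *.log files older than 14 days.
LOG_DIR=$(mktemp -d)                           # stand-in for ~/logs
touch "$LOG_DIR/recent.log"                    # fresh log, should survive
touch -d '30 days ago' "$LOG_DIR/old.log"      # stale log, should be deleted

# -mtime +14 matches files last modified more than 14 days ago
find "$LOG_DIR" -name '*.log' -mtime +14 -delete

remaining=$(ls "$LOG_DIR")
echo "remaining: $remaining"
rm -rf "$LOG_DIR"
```

You would schedule the find line itself from cron, pointed at your real log directory. Note this deletes rather than compresses, so it loses the 14-day compressed history that the logrotate config keeps.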
Step 10: Set Up Environment-Specific Configurations
Use Bruin's environment feature to manage different configurations:
bruin run . --environment production

Update your cron job to use the production environment:
0 3 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project --environment production >> /home/username/logs/bruin.log 2>&1

Step 11: Monitoring and Alerting
Email Notifications on Failure
Cron can send emails when jobs fail. First, install a mail utility:
sudo apt install mailutils -y

Configure postfix when prompted (select "Internet Site").
Create a wrapper script to handle errors:
mkdir -p ~/scripts
nano ~/scripts/run-bruin.sh

Add the following:
#!/bin/bash
LOG_FILE="/home/username/logs/bruin.log"
PROJECT_PATH="/home/username/your-bruin-project"
BRUIN_BIN="/home/username/.local/bin/bruin"
echo "=== Starting Bruin run at $(date) ===" >> "$LOG_FILE"
if ! $BRUIN_BIN run "$PROJECT_PATH" --environment production >> "$LOG_FILE" 2>&1; then
    echo "Bruin pipeline failed at $(date)" | mail -s "Bruin Pipeline Failed" your-email@example.com
    exit 1
fi
echo "=== Completed successfully at $(date) ===" >> "$LOG_FILE"

Make it executable:
chmod +x ~/scripts/run-bruin.sh

Update your crontab:
0 3 * * * /home/username/scripts/run-bruin.sh

Step 12: Automatic Updates
Keep your Bruin project up to date by pulling changes before each run:
Update your wrapper script:
#!/bin/bash
set -e
export PATH="/home/username/.local/bin:$PATH"
export HOME="/home/username"
# Load credentials from secure environment file
if [ -f /etc/bruin/credentials.env ]; then
    set -a
    source /etc/bruin/credentials.env
    set +a
fi
LOG_FILE="/home/username/logs/bruin.log"
PROJECT_PATH="/home/username/your-bruin-project"
BRUIN_BIN="/home/username/.local/bin/bruin"
echo "=== Starting Bruin run at $(date) ===" >> "$LOG_FILE"
# Pull latest changes
cd "$PROJECT_PATH"
git pull origin main >> "$LOG_FILE" 2>&1
# Run the pipeline
if ! $BRUIN_BIN run "$PROJECT_PATH" --environment production >> "$LOG_FILE" 2>&1; then
    echo "Bruin pipeline failed at $(date)" | mail -s "Bruin Pipeline Failed" your-email@example.com
    exit 1
fi
echo "=== Completed successfully at $(date) ===" >> "$LOG_FILE"

The set -a and set +a commands enable and disable automatic export of all variables, ensuring that all credentials from the environment file are available to Bruin.
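One refinement worth considering for the wrapper script: guard it with flock(1) from util-linux so a long-running pipeline and the next cron tick don't overlap. A minimal sketch (the lock path and messages are arbitrary; the echo stands in for the bruin invocation):

```shell
# Sketch: skip a run if the previous one is still holding the lock.
lockfile=$(mktemp)   # in the real script, use a fixed path like /tmp/bruin.lock
result=$(
  (
    # -n: fail immediately instead of waiting for the lock
    flock -n 9 || { echo "previous run still active, skipping"; exit 0; }
    echo "running pipeline"          # the bruin run would go here
  ) 9>"$lockfile"
)
echo "$result"
rm -f "$lockfile"
```

Since nothing else holds the lock in this demo, it takes the "running pipeline" branch; a second copy started while the first was inside the subshell would print the skip message instead.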
Security Best Practices
1. Use Environment Variables for Credentials
Instead of hardcoding credentials in .bruin.yml, use environment variables:
# Good: Using environment variables
postgres:
  - name: "my_postgres"
    username: ${POSTGRES_USERNAME}
    password: ${POSTGRES_PASSWORD}

# Avoid: Hardcoded credentials
postgres:
  - name: "my_postgres"
    username: "admin"
    password: "plaintext_password"

2. Secure Your Credentials
Never commit credentials to Git:
echo ".bruin.yml" >> .gitignore
echo "*.json" >> .gitignore
echo "credentials.env" >> .gitignore

Store credentials in a secure location with restricted permissions:
sudo mkdir -p /etc/bruin
sudo chmod 700 /etc/bruin

3. Use SSH Keys for Git
Set up SSH keys for passwordless Git operations:
ssh-keygen -t ed25519 -C "your-email@example.com"
cat ~/.ssh/id_ed25519.pub

Add the public key to your Git provider (GitHub, GitLab, etc.).
4. Restrict File Permissions
chmod 600 ~/.bruin.yml
chmod 600 ~/.config/gcloud/*.json
sudo chmod 600 /etc/bruin/credentials.env

5. Use a Dedicated User
Create a dedicated user for running Bruin:
sudo useradd -m -s /bin/bash bruin
sudo su - bruin

Then follow all the installation steps as the bruin user.
Troubleshooting
Cron Job Not Running
- Check if cron service is running: sudo systemctl status cron
- Check cron logs: grep CRON /var/log/syslog
- Verify your crontab: crontab -l

Bruin Command Not Found in Cron
Cron has a limited environment. Always use absolute paths:
# Find the full path to bruin
which bruin
# Use the full path in crontab
0 3 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project

Permission Denied Errors
Ensure your user has permission to access all files:
chmod +x ~/.local/bin/bruin
chmod -R 755 ~/your-bruin-project

Connection Issues
Test your connections:
bruin connections --environment production

Pipeline Fails in Cron but Works Manually
This often happens due to environment differences. Export all necessary environment variables in your wrapper script:
#!/bin/bash
export PATH="/home/username/.local/bin:$PATH"
export HOME="/home/username"
# Add other environment variables here
# Run your pipeline
/home/username/.local/bin/bruin run /home/username/your-bruin-project

Example: Complete Production Setup
Here's a complete example for a production deployment using environment variable injection:
Directory Structure
/home/bruin/
├── projects/
│   └── analytics-pipeline/
│       └── .bruin.yml
├── scripts/
│   ├── run-ingestion.sh
│   └── run-analytics.sh
└── logs/
    ├── ingestion.log
    └── analytics.log

/etc/bruin/
└── credentials.env

.bruin.yml (in project root)
default_environment: production
environments:
  production:
    connections:
      google_cloud_platform:
        - name: "gcp-prod"
          service_account_json: ${GCP_SERVICE_ACCOUNT_JSON}
          project_id: ${GCP_PROJECT_ID}
      postgres:
        - name: "postgres-analytics"
          username: ${POSTGRES_USERNAME}
          password: ${POSTGRES_PASSWORD}
          host: ${POSTGRES_HOST}
          port: ${POSTGRES_PORT}
          database: ${POSTGRES_DATABASE}
      snowflake:
        - name: "snowflake-prod"
          account: ${SNOWFLAKE_ACCOUNT}
          username: ${SNOWFLAKE_USERNAME}
          password: ${SNOWFLAKE_PASSWORD}
          database: ${SNOWFLAKE_DATABASE}
          warehouse: ${SNOWFLAKE_WAREHOUSE}

/etc/bruin/credentials.env
# Google Cloud Platform
GCP_PROJECT_ID=my-analytics-project
GCP_SERVICE_ACCOUNT_JSON='{"type":"service_account","project_id":"my-analytics-project",...}'
# PostgreSQL
POSTGRES_USERNAME=analytics_user
POSTGRES_PASSWORD=super_secure_password
POSTGRES_HOST=db.company.com
POSTGRES_PORT=5432
POSTGRES_DATABASE=analytics
# Snowflake
SNOWFLAKE_ACCOUNT=ABC12345
SNOWFLAKE_USERNAME=bruin_user
SNOWFLAKE_PASSWORD=snowflake_secure_password
SNOWFLAKE_DATABASE=ANALYTICS
SNOWFLAKE_WAREHOUSE=COMPUTE_WH

Crontab
# Pull and run ingestion pipeline every 6 hours
0 */6 * * * /home/bruin/scripts/run-ingestion.sh
# Run analytics pipeline daily at 2 AM
0 2 * * * /home/bruin/scripts/run-analytics.sh

run-analytics.sh
#!/bin/bash
set -e
export PATH="/home/bruin/.local/bin:$PATH"
export HOME="/home/bruin"
# Load credentials from secure environment file
if [ -f /etc/bruin/credentials.env ]; then
    set -a
    source /etc/bruin/credentials.env
    set +a
fi
LOG_FILE="/home/bruin/logs/analytics.log"
PROJECT_PATH="/home/bruin/projects/analytics-pipeline"
echo "=== Starting analytics run at $(date) ===" >> "$LOG_FILE"
cd "$PROJECT_PATH"
git pull origin main >> "$LOG_FILE" 2>&1
if ! bruin run . --environment production >> "$LOG_FILE" 2>&1; then
    echo "Analytics pipeline failed at $(date)" | mail -s "Alert: Analytics Pipeline Failed" admin@company.com
    exit 1
fi
echo "=== Completed successfully at $(date) ===" >> "$LOG_FILE"

Next Steps
- Explore Bruin Cloud for managed orchestration and monitoring
- Set up CI/CD integration for automated testing
- Learn about quality checks to ensure data quality
- Review best practices for pipeline design