R Assets

Bruin brings R statistical computing capabilities to your data pipelines:

* Run R scripts with full access to R's statistical and data analysis packages
* Automatic dependency management with renv integration
* Access to connection credentials and secrets via environment variables
* Execute complex statistical computations alongside SQL and Python assets

R assets allow you to leverage R's extensive ecosystem for statistical analysis, machine learning, data visualization, and more within your Bruin pipelines.

"@bruin
name: statistical_analysis
type: r
depends:
    - raw_user_data
@bruin"

library(dplyr)

cat("Running R statistical analysis\n")

# Your R code here
results <- data.frame(
  metric = c("mean", "median", "sd"),
  value = c(42.5, 40.0, 5.2)
)

print(results)

Dependency Management

R assets support dependency management through renv, R's standard dependency management tool. Bruin searches for the closest renv.lock file in the file tree and automatically restores the environment with the specified packages.

For example, assume you have a file tree such as:

* folder1/
    * folder2/
        * analysis.r
        * renv.lock
    * folder3/
        * report.r
    * renv.lock
* folder4/
    * folder5/
        * folder6/
            * model.r
* renv.lock

* When Bruin runs analysis.r, it uses folder1/folder2/renv.lock, since the two files are in the same folder.
* For report.r, there is no renv.lock in the same folder, so Bruin goes up one level and finds folder1/renv.lock.
* Similarly, the renv.lock in the main folder is used for model.r, since none of folder6, folder5, or folder4 contains an renv.lock file.
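
The lookup described above can be sketched in base R. This is only an illustration of the "closest renv.lock wins" rule, not Bruin's actual implementation; the function name `find_closest_lockfile` and its `root` argument are hypothetical.

```r
# Hypothetical helper: walk up from an asset's folder until an renv.lock is found.
# Returns the path to the lockfile, or NA if none exists up to `root`.
find_closest_lockfile <- function(asset_path, root = ".") {
  root <- normalizePath(root, winslash = "/", mustWork = TRUE)
  dir <- normalizePath(dirname(asset_path), winslash = "/", mustWork = TRUE)
  repeat {
    candidate <- file.path(dir, "renv.lock")
    if (file.exists(candidate)) {
      return(candidate)
    }
    parent <- dirname(dir)
    # Stop at the project root, or at the filesystem root as a safety net.
    if (dir == root || parent == dir) {
      return(NA_character_)
    }
    dir <- parent
  }
}

# Example call, assuming the file tree above exists in the working directory:
# find_closest_lockfile("folder1/folder3/report.r")
```

Running a helper like this against your own tree is a quick way to check which lockfile you would expect an asset to pick up before you run the pipeline.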

Using renv

To create an renv.lock file for your R assets:

# In your R console, navigate to your asset directory
renv::init()               # Initialize renv for the project
renv::install("dplyr")     # Install packages you need
renv::install("ggplot2")
renv::snapshot()           # Create renv.lock file
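
To check what a lockfile pins before committing it, you can read it back as JSON. The sketch below assumes the standard renv.lock layout (a top-level `Packages` object whose entries carry `Package` and `Version` fields) and that jsonlite is installed; `list_locked_packages` is a hypothetical helper name.

```r
library(jsonlite)

# Hypothetical helper: list the packages and versions recorded in an renv.lock file.
list_locked_packages <- function(lockfile = "renv.lock") {
  lock <- fromJSON(lockfile, simplifyVector = FALSE)
  pkgs <- lock$Packages
  data.frame(
    package = vapply(pkgs, function(p) p$Package, character(1)),
    version = vapply(pkgs, function(p) p$Version, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Only inspect the lockfile if one exists in the current directory.
if (file.exists("renv.lock")) {
  print(list_locked_packages("renv.lock"))
}
```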

Manual Dependency Management

If you don't use renv.lock, you can manage dependencies directly in your R script using install.packages():

"@bruin
name: manual_deps_example
type: r
@bruin"

# Check whether the package is installed without attaching it; install if missing
if (!requireNamespace("jsonlite", quietly = TRUE)) {
  install.packages("jsonlite", repos = "https://cloud.r-project.org")
}

library(jsonlite)

# Your code here
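
When a script needs several packages, the same check can be wrapped in a small function. This is a minimal sketch: `ensure_packages` is a hypothetical helper, and the CRAN mirror is the one used in the example above.

```r
# Hypothetical helper: install any packages from `pkgs` that are not yet available.
# Returns (invisibly) the names of the packages it had to install.
ensure_packages <- function(pkgs, repos = "https://cloud.r-project.org") {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!installed]
  if (length(to_install) > 0) {
    cat(sprintf("Installing missing packages: %s\n", paste(to_install, collapse = ", ")))
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)
}

# These two ship with R, so nothing is installed here.
ensure_packages(c("stats", "utils"))
```

Unlike renv, this approach does not pin versions, so results may differ between environments.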

Asset Definition

R assets use a multiline string with @bruin markers to define metadata in YAML format. This is similar to Python's approach but uses R's native string syntax:

"@bruin
name: asset_name
type: r
depends:
    - upstream_asset1
    - upstream_asset2

secrets:
    - key: MY_SECRET
      inject_as: R_SECRET
@bruin"

# Your R code starts here
cat("Hello from R!\n")

The configuration block must:

* Start with "@bruin on its own line (single quotes also work: '@bruin)
* End with @bruin" on its own line, using the same quote type as the opening marker
* Contain valid YAML configuration between the markers
* Preserve proper YAML indentation
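
For illustration, here is the same kind of block written with single quotes. The asset and dependency names are the placeholders used above; because the block is an ordinary R string literal, the file still parses as valid R.

```r
'@bruin
name: asset_name
type: r
depends:
    - upstream_asset1
@bruin'

# Regular R code follows the configuration block.
cat("Hello from R!\n")
```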

All standard asset parameters are supported. See the SQL asset documentation for a complete list of available configuration options including:

* Dependencies (depends)
* Secrets and connections (secrets)
* Parameters (parameters)
* Columns and quality checks (columns)
* Custom checks (custom_checks)
* And more

Secrets and Connections

Secrets and connections are injected as environment variables in JSON format. See the secrets documentation for more details on how to define and use secrets.

"@bruin
name: r_with_secrets
secrets:
    - key: postgres_connection
      inject_as: DB_CONN
@bruin"

library(jsonlite)

# Access the secret from environment variable
connection_json <- Sys.getenv("DB_CONN")
conn_details <- fromJSON(connection_json)

# Use connection details
cat(sprintf("Connecting to: %s\n", conn_details$host))
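
If the variable is missing or malformed, `fromJSON()` fails with an error that does not say which secret was involved. A small wrapper can make that failure easier to diagnose. This is a sketch under assumptions: `read_json_secret` is a hypothetical helper, and the connection values set for local testing are made up.

```r
library(jsonlite)

# Hypothetical helper: read and parse a JSON secret injected under `var_name`.
read_json_secret <- function(var_name) {
  raw <- Sys.getenv(var_name, unset = "")
  if (!nzchar(raw)) {
    stop(sprintf("Environment variable '%s' is empty or not set", var_name), call. = FALSE)
  }
  tryCatch(
    fromJSON(raw),
    error = function(e) {
      stop(
        sprintf("'%s' does not contain valid JSON: %s", var_name, conditionMessage(e)),
        call. = FALSE
      )
    }
  )
}

# Local testing only: simulate the injected variable when it is not already set.
if (!nzchar(Sys.getenv("DB_CONN"))) {
  Sys.setenv(DB_CONN = '{"host": "localhost", "port": 5432}')
}

conn_details <- read_json_secret("DB_CONN")
cat(sprintf("Connecting to: %s\n", conn_details$host))
```

Avoid printing the whole parsed object, since it may contain passwords.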

Environment Variables

Bruin injects a set of environment variables into every R asset by default.

Builtin

The following environment variables are available in every R asset execution:

| Environment Variable | Description |
| --- | --- |
| `BRUIN_START_DATE` | The start date of the pipeline run in `YYYY-MM-DD` format (e.g. `2024-01-15`) |
| `BRUIN_START_DATETIME` | The start date and time of the pipeline run in `YYYY-MM-DDThh:mm:ss` format (e.g. `2024-01-15T13:45:30`) |
| `BRUIN_START_TIMESTAMP` | The start timestamp of the pipeline run in RFC3339 format with timezone (e.g. `2024-01-15T13:45:30.000000Z`) |
| `BRUIN_END_DATE` | The end date of the pipeline run in `YYYY-MM-DD` format (e.g. `2024-01-15`) |
| `BRUIN_END_DATETIME` | The end date and time of the pipeline run in `YYYY-MM-DDThh:mm:ss` format (e.g. `2024-01-15T13:45:30`) |
| `BRUIN_END_TIMESTAMP` | The end timestamp of the pipeline run in RFC3339 format with timezone (e.g. `2024-01-15T13:45:30.000000Z`) |
| `BRUIN_RUN_ID` | The unique identifier for the pipeline run |
| `BRUIN_PIPELINE` | The name of the pipeline being executed |
| `BRUIN_FULL_REFRESH` | Set to `1` when the pipeline is running with the `--full-refresh` flag, empty otherwise |
| `BRUIN_THIS` | The name of the R asset |
| `BRUIN_ASSET` | The name of the R asset (same as `BRUIN_THIS`) |
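
As an illustration, the built-in variables can be read with `Sys.getenv()` and converted to R types. The helper name `get_run_window` is hypothetical, and the dates set for local testing are made up.

```r
# Hypothetical helper: read the run window and refresh flag from the built-in variables.
get_run_window <- function() {
  start <- as.Date(Sys.getenv("BRUIN_START_DATE"), format = "%Y-%m-%d")
  end <- as.Date(Sys.getenv("BRUIN_END_DATE"), format = "%Y-%m-%d")
  if (is.na(start) || is.na(end)) {
    stop("BRUIN_START_DATE / BRUIN_END_DATE are missing or not in YYYY-MM-DD format",
         call. = FALSE)
  }
  list(
    start = start,
    end = end,
    span_days = as.integer(end - start),
    # The flag is "1" on a full refresh and empty otherwise.
    full_refresh = identical(Sys.getenv("BRUIN_FULL_REFRESH"), "1")
  )
}

# Local testing only: simulate the values when running outside a pipeline.
if (!nzchar(Sys.getenv("BRUIN_START_DATE"))) {
  Sys.setenv(BRUIN_START_DATE = "2024-01-15", BRUIN_END_DATE = "2024-01-16")
}

run <- get_run_window()
cat(sprintf("Processing %s to %s (full refresh: %s)\n",
            format(run$start), format(run$end), run$full_refresh))
```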

Pipeline

Bruin supports user-defined variables at the pipeline level. They are exposed to your R asset as a JSON document in the BRUIN_VARS environment variable. When no variables are defined, BRUIN_VARS is set to {}. See pipeline variables for more information on how to define and override them.

Here's an example:

"@bruin
name: r_with_variables
@bruin"

library(jsonlite)

# Access pipeline variables
vars_json <- Sys.getenv("BRUIN_VARS")
vars <- fromJSON(vars_json)

cat(sprintf("Environment: %s\n", vars$environment))
cat(sprintf("Region: %s\n", vars$region))
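
If a variable is not defined, `vars$environment` is `NULL` and `sprintf()` returns an empty result, so nothing is printed. One way to guard against that is a lookup with a fallback. In this sketch `get_var` is a hypothetical helper and the default value "dev" is an assumption for illustration.

```r
library(jsonlite)

# Hypothetical helper: look up a pipeline variable, falling back to a default.
# `[[` matches names exactly, unlike `$`, which allows partial matching on lists.
get_var <- function(vars, name, default = NULL) {
  value <- vars[[name]]
  if (is.null(value)) default else value
}

# Treat an unset or empty BRUIN_VARS as an empty JSON object (useful for local runs).
vars_json <- Sys.getenv("BRUIN_VARS", unset = "{}")
if (!nzchar(vars_json)) {
  vars_json <- "{}"
}
vars <- fromJSON(vars_json)

environment_name <- get_var(vars, "environment", default = "dev")
cat(sprintf("Environment: %s\n", environment_name))
```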

Examples

Basic R Script

The simplest R asset with no dependencies:

"@bruin
name: hello_r
type: r
@bruin"

cat("Hello from R!\n")
result <- 2 + 2
cat(sprintf("2 + 2 = %d\n", result))

Statistical Analysis with Dependencies

Using R packages for statistical analysis:

"@bruin
name: statistical_summary
depends:
    - raw_data
@bruin"

library(dplyr)

cat("Performing statistical analysis\n")

# Generate sample data
data <- data.frame(
  value = rnorm(1000, mean = 50, sd = 10),
  category = sample(c("A", "B", "C"), 1000, replace = TRUE)
)

# Statistical summary
summary_stats <- data %>%
  group_by(category) %>%
  summarise(
    mean = mean(value),
    median = median(value),
    sd = sd(value),
    min = min(value),
    max = max(value)
  )

print(summary_stats)

Working with Database Connections

Accessing database credentials via environment variables:

"@bruin
name: db_analysis
secrets:
    - key: postgres-default
      inject_as: PG_CONN
@bruin"

library(jsonlite)
library(DBI)
library(RPostgres)

# Parse connection details
conn_json <- Sys.getenv("PG_CONN")
conn_details <- fromJSON(conn_json)

# Connect to database
con <- dbConnect(
  RPostgres::Postgres(),
  host = conn_details$host,
  port = conn_details$port,
  dbname = conn_details$database,
  user = conn_details$username,
  password = conn_details$password
)

# Query data (COUNT(*) is a bigint, which RPostgres returns as integer64 by default)
result <- dbGetQuery(con, "SELECT COUNT(*) AS count FROM users")
cat(sprintf("Total users: %s\n", format(result$count)))

# Clean up
dbDisconnect(con)

Time-Series Analysis

Using R's extensive time-series capabilities:

"@bruin
name: time_series_forecast
depends:
    - historical_data
@bruin"

library(forecast)

cat("Running time-series forecast\n")

# Example time series
ts_data <- ts(rnorm(100), frequency = 12, start = c(2020, 1))

# Fit ARIMA model
model <- auto.arima(ts_data)

# Forecast next 12 periods
forecast_result <- forecast(model, h = 12)

cat("Forecast complete!\n")
print(summary(forecast_result))

Installation

R assets require R to be installed on your system. Install R using one of these methods:

* macOS: brew install r
* Ubuntu/Debian: sudo apt-get install r-base
* Windows: Download from CRAN
* Other platforms: See CRAN installation guides

To verify R is installed correctly:

R --version
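
From inside R, a few base functions report the same information plus where packages are installed. This is an optional sanity check, not something Bruin requires.

```r
# Print the R version, the library search paths, and whether renv can be loaded.
cat("R version:", R.version.string, "\n")
cat("Library paths:", paste(.libPaths(), collapse = ", "), "\n")
cat("renv available:", requireNamespace("renv", quietly = TRUE), "\n")
```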

Advanced Configuration Example

Here's an example showing a more complex configuration:

"@bruin
name: comprehensive_analysis
type: r
depends:
    - raw_user_data
    - product_catalog

secrets:
    - key: postgres-analytics
      inject_as: DB_CONN

columns:
    - name: user_id
      type: integer
      checks:
          - name: not_null
          - name: unique
@bruin"

library(dplyr)
library(jsonlite)

# Access environment variables
db_json <- Sys.getenv("DB_CONN")
db <- fromJSON(db_json)

cat("Running comprehensive analysis\n")
# Your analysis code here

Best Practices

* Use renv for reproducibility: Create an renv.lock file to ensure consistent package versions across environments
* Use the string-based multiline format: The multiline string format (using "@bruin ... @bruin") makes complex configurations with dependencies, secrets, and parameters much easier to read and maintain
* Quote choice: Use double quotes (") or single quotes ('); both work as long as the opening and closing quotes match
* Handle errors gracefully: Use R's error handling (tryCatch) to provide clear error messages
* Log progress: Use cat() or print() statements to provide visibility into your R script's execution
* Clean up resources: Always close database connections and file handles when done
* Test locally: Run your R scripts locally before integrating them into pipelines
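
The error-handling, logging, and cleanup practices above can be combined in a few lines of base R. This is a minimal sketch, not a prescribed pattern; `log_step`, `write_results`, and `run_step` are hypothetical helper names.

```r
# Hypothetical helper: print a timestamped progress message.
log_step <- function(msg) {
  cat(sprintf("[%s] %s\n", format(Sys.time(), "%Y-%m-%d %H:%M:%S"), msg))
}

# Hypothetical step: write a data frame to CSV and always close the file handle.
write_results <- function(results, path) {
  con <- file(path, open = "w")
  on.exit(close(con), add = TRUE)  # runs even if write.csv() fails
  write.csv(results, con, row.names = FALSE)
  invisible(path)
}

# Hypothetical wrapper: log a step and re-raise any error with the step name attached.
run_step <- function(name, fn) {
  log_step(sprintf("Starting %s", name))
  tryCatch(
    {
      out <- fn()
      log_step(sprintf("Finished %s", name))
      out
    },
    error = function(e) {
      stop(sprintf("Step '%s' failed: %s", name, conditionMessage(e)), call. = FALSE)
    }
  )
}

results <- data.frame(metric = c("mean", "median"), value = c(42.5, 40.0))
out_path <- run_step("write results", function() {
  write_results(results, tempfile(fileext = ".csv"))
})
```

Re-raising the error, rather than swallowing it, keeps the failure visible to whatever is running the script.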
