Skip to content

Backfills

A backfill in Bruin Cloud is a time-bounded re-processing of a pipeline across a historical date range. Instead of one run, a backfill splits a date range into many runs — typically one per scheduled interval — and tracks them as a single unit. Each child run uses the same interval semantics as a CLI run, so built-in variables like start_timestamp and end_timestamp resolve to that child's window.

Use a backfill when you need to:

  • Process historical data after adding a new pipeline.
  • Reprocess after fixing a bug in upstream data.
  • Catch up after a maintenance window.

Create a backfill

Backfills are created from the New run modal on a pipeline's page. See New run for the full walkthrough. The two split modes:

  • Auto-split by schedule — one run per scheduled interval. A 23-day range on a daily schedule produces 23 runs.
  • Manual split — pick the chunk size and unit yourself. Useful when daily granularity is too fine or too coarse.

Splits are capped below 250 jobs.

The backfill detail page

Once a backfill is created, open it from the Backfills tab on the pipeline page.

The header summarises the backfill at a glance:

  • Status — running, success, failed, queued, or unknown (when child runs haven't been scheduled yet).
  • Run count with a failed-runs breakdown.
  • Date range(s) covered.
  • Created at timestamp.

Progress bar

A visual timeline of every run across the interval, colour-coded for success, failure, and unknown. Hover a segment to see the interval it covers and jump to that run.

Runs table

Paginated list of every child run (30 per page, with auto-load on first visit). Columns:

  • Run ID — links to the run detail.
  • Interval start / end — the data window for that specific run.
  • Duration
  • Status

Summary sidebar

Reiterates the headline stats — status, total runs, creation date — and shows a per-status breakdown (count and percentage) for quick triage.

Actions

Mark all runs as

Bulk-flip the status of every run in the backfill to success or failed. Useful when:

  • You've confirmed the data is fine despite a misleading failure status.
  • You want to invalidate the whole range and trigger a rerun upstream.

Bruin polls every 3 seconds while the mark operation propagates.

Exclude assets across all runs

Select one or more assets in the pipeline and mark them as skipped in every child run. The selected assets are treated as already-succeeded so they don't actually execute. Useful when a single expensive asset doesn't need backfilling but the rest do.

Per-run actions

For retries, cancels, and per-asset reruns inside a single backfill run, open that run from the table and use the run detail controls.

Status states

StateMeaning
QueuedBackfill created, child runs not yet scheduled
RunningAt least one child run is in flight
SuccessAll child runs finished successfully
FailedOne or more child runs failed
UnknownChild runs not yet visible to the scheduler

Limits and behaviour

  • Up to 250 child runs per backfill.
  • Child runs honour the pipeline's normal asset dependencies — assets within each run still wait on their upstreams.
  • Cross-pipeline sensors apply: if an asset in the backfill depends on an external upstream via URI, it waits for that upstream interval to complete.
  • Marking the backfill as success or failed is recorded in the audit log.