Microsoft OneLake
OneLake is the unified, lake-centric storage layer of Microsoft Fabric. It exposes an ADLS Gen2-compatible endpoint at onelake.dfs.fabric.microsoft.com, where each workspace acts as a container and items (Lakehouses, Warehouses, …) live underneath it.
ingestr supports OneLake as a destination. It can write to either area of a Lakehouse:
- Tables: written as a Delta Lake table (Parquet data files plus a _delta_log transaction log) so the table is immediately queryable in Fabric, the SQL analytics endpoint, and Spark.
- Files: written as raw Parquet files into the Lakehouse Files area.
URI format
onelake://<workspace>/<lakehouse>?tenant_id=<tenant_id>&client_id=<client_id>&client_secret=<client_secret>
URI parameters:
- workspace: the Fabric workspace name or GUID (the URI host)
- lakehouse: the Lakehouse name or GUID (the URI path). A .Lakehouse item suffix is added automatically; pass an explicit suffix (e.g. mywh.Warehouse) to target a different item type.
- tenant_id, client_id, client_secret (optional): Microsoft Entra service principal credentials
- sas_token (optional): a SAS token issued for OneLake
- layout (optional): file-name template for Files mode (default {load_id}.{file_id}.{ext}); supports {table_name}, {load_id}, {file_id} and {ext}
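The URI above can also be assembled programmatically, which helps when a client secret or SAS token contains characters such as `&`, `=` or `+` that would otherwise be read as query-string syntax. The following is a minimal sketch: `build_onelake_uri` is a hypothetical helper, not part of ingestr, and it assumes ingestr decodes the URI with standard URL parsing, which this page does not confirm.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_onelake_uri(workspace: str, lakehouse: str, **params: Optional[str]) -> str:
    """Assemble an onelake:// URI (hypothetical helper, not part of ingestr)."""
    # Workspace is the URI host and lakehouse is the URI path.
    base = f"onelake://{quote(workspace, safe='')}/{quote(lakehouse, safe='')}"
    # Skip unset parameters; urlencode escapes '&', '=', '+' and similar characters.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base}?{query}" if query else base


uri = build_onelake_uri(
    "myworkspace",
    "mylakehouse",
    tenant_id="00000000-0000-0000-0000-000000000000",
    client_id="11111111-1111-1111-1111-111111111111",
    client_secret="s3cr3t&with=odd+chars",  # placeholder value for illustration
)
print(uri)
```

The resulting string can be passed to --dest-uri as-is. Workspace or Lakehouse names that contain spaces may be simpler to reference by GUID.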
The mode and table come from --dest-table, mirroring OneLake's path layout:
- Tables/<name> (or Tables/<schema>/<name>) → a Delta table
- Files/<path> → raw Parquet files
- a bare name with no prefix defaults to Tables/
The final object path is: https://onelake.dfs.fabric.microsoft.com/<workspace>/<lakehouse>.Lakehouse/<Tables|Files>/<rest>
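The mapping from --dest-table to the final object path can be sketched as a small function. This is an illustration of the rules stated above, not ingestr's actual code: `resolve_object_path` is a hypothetical name, and treating any Lakehouse value that contains a dot as already carrying an item suffix is an assumption.

```python
ONELAKE_ENDPOINT = "https://onelake.dfs.fabric.microsoft.com"


def resolve_object_path(workspace: str, lakehouse: str, dest_table: str) -> str:
    """Map a --dest-table value to the final OneLake URL (hypothetical helper)."""
    # Assumption: a dot means an explicit item suffix such as 'mywh.Warehouse'.
    item = lakehouse if "." in lakehouse else f"{lakehouse}.Lakehouse"
    cleaned = dest_table.strip("/")
    area, _, rest = cleaned.partition("/")
    if area not in ("Tables", "Files"):
        # A bare name with no prefix defaults to Tables/.
        area, rest = "Tables", cleaned
    return f"{ONELAKE_ENDPOINT}/{workspace}/{item}/{area}/{rest}"


print(resolve_object_path("myworkspace", "mylakehouse", "users"))
print(resolve_object_path("myworkspace", "mylakehouse", "Files/exports/users"))
```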
Authentication
OneLake only supports Microsoft Entra ID authentication — shared account keys are not accepted. ingestr resolves credentials in this order:
- SAS token: if sas_token is provided.
- Service principal: if tenant_id, client_id and client_secret are all provided (via ClientSecretCredential). The service principal needs Contributor (or item-level) access to the workspace, and your Fabric admin must allow service principals to use the APIs.
- DefaultAzureCredential: otherwise, ingestr falls back to DefaultAzureCredential, picking up environment variables, a managed identity, or your Azure CLI login.
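The resolution order above can be expressed as a short decision function. This sketch only returns a label for the credential source that would be chosen; `pick_auth_method` and the label strings are hypothetical, and the real implementation would construct the corresponding credential objects instead.

```python
from typing import Mapping


def pick_auth_method(params: Mapping[str, str]) -> str:
    """Return which credential source the documented order would select."""
    # 1. A SAS token wins whenever it is present.
    if params.get("sas_token"):
        return "sas_token"
    # 2. A service principal requires all three values.
    if all(params.get(k) for k in ("tenant_id", "client_id", "client_secret")):
        return "service_principal"  # would use ClientSecretCredential
    # 3. Anything else, including a partial service principal, falls through.
    return "default_azure_credential"  # would use DefaultAzureCredential


print(pick_auth_method({"tenant_id": "t", "client_id": "c"}))
```

Under this reading, supplying only some of the service principal parameters silently falls back to DefaultAzureCredential, so a missing client_secret may surface as an unexpected identity rather than an explicit error.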
Examples
Load a table into a Lakehouse as a queryable Delta table:
ingestr ingest \
--source-uri "postgres://user:pass@host:5432/db" \
--source-table "public.users" \
--dest-uri "onelake://myworkspace/mylakehouse?tenant_id=$TENANT_ID&client_id=$CLIENT_ID&client_secret=$CLIENT_SECRET" \
--dest-table "Tables/users"
Write raw Parquet files into the Lakehouse Files area:
ingestr ingest \
--source-uri "postgres://user:pass@host:5432/db" \
--source-table "public.users" \
--dest-uri "onelake://myworkspace/mylakehouse?sas_token=$SAS_TOKEN" \
--dest-table "Files/exports/users"
Append new rows to an existing Delta table (adds a new Delta commit):
ingestr ingest \
--source-uri "postgres://user:pass@host:5432/db" \
--source-table "public.events" \
--dest-uri "onelake://myworkspace/mylakehouse?tenant_id=$TENANT_ID&client_id=$CLIENT_ID&client_secret=$CLIENT_SECRET" \
--dest-table "Tables/events" \
--incremental-strategy append
Incremental strategies
For Tables (Delta) mode, ingestr supports replace, append, merge, delete+insert and scd2:
| Strategy | Behaviour |
|---|---|
| replace | Clears the table directory and writes a fresh Delta commit (version 0). |
| append | Reads the current Delta version and writes the next commit with the new rows. |
| merge | Upsert by primary_key: existing rows with a matching key are replaced by the incoming rows. |
| delete+insert | Deletes target rows whose incremental_key falls in the loaded interval, then inserts the new rows. |
| scd2 | Slowly-changing-dimension type 2: maintains _scd_valid_from/_scd_valid_to/_scd_is_current, closing changed rows and inserting new versions. |
The Files mode only supports replace and append.
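For the replace and append rows in the table, the version bookkeeping can be pictured through the commit file names in _delta_log. In the Delta protocol, each commit is a JSON file named after its version, zero-padded to 20 digits. The sketch below derives the next commit name from a directory listing; `current_version` and `next_commit_name` are hypothetical helpers, and whether ingestr computes versions exactly this way is an assumption.

```python
import re
from typing import Iterable, Optional

# Delta commit files are named <20-digit version>.json; checkpoints are ignored here.
_COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def current_version(log_files: Iterable[str]) -> Optional[int]:
    """Return the highest committed version, or None for an empty log."""
    versions = [int(m.group(1)) for name in log_files if (m := _COMMIT_RE.match(name))]
    return max(versions) if versions else None


def next_commit_name(log_files: Iterable[str], strategy: str) -> str:
    """Name of the commit file that a replace or append run would write."""
    if strategy == "replace":
        version = 0  # the table directory is cleared, so history restarts
    elif strategy == "append":
        current = current_version(log_files)
        version = 0 if current is None else current + 1
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return f"{version:020d}.json"


existing = [f"{0:020d}.json", f"{1:020d}.json"]
print(next_commit_name(existing, "append"))
print(next_commit_name(existing, "replace"))
```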
merge, delete+insert and scd2 are copy-on-write: because a Delta table has no SQL engine here, ingestr reads the current table back into memory, applies the operation, and rewrites it as a new Delta version. This means each run reads and rewrites the full table — suitable for small-to-medium tables.
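The copy-on-write behaviour can be illustrated on plain in-memory rows. The functions below are a simplified sketch of the three strategies as described in the table, not ingestr's implementation: the function names are hypothetical, the delete+insert interval is assumed to be inclusive on both ends, and the scd2 sketch assumes that keys absent from the incoming batch are left untouched.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
SCD_COLS = ("_scd_valid_from", "_scd_valid_to", "_scd_is_current")


def merge(target: List[Row], incoming: List[Row], primary_key: str) -> List[Row]:
    """Upsert: incoming rows replace target rows that share a key."""
    incoming_keys = {row[primary_key] for row in incoming}
    return [r for r in target if r[primary_key] not in incoming_keys] + incoming


def delete_insert(target: List[Row], incoming: List[Row],
                  incremental_key: str, start: Any, end: Any) -> List[Row]:
    """Drop target rows inside [start, end] (assumed inclusive), then add incoming."""
    kept = [r for r in target if not (start <= r[incremental_key] <= end)]
    return kept + incoming


def scd2(target: List[Row], incoming: List[Row], primary_key: str, now: Any) -> List[Row]:
    """Close changed current rows and append new current versions."""
    incoming_by_key = {row[primary_key]: row for row in incoming}
    result, unchanged = [], set()
    for row in target:
        new = incoming_by_key.get(row[primary_key])
        if row["_scd_is_current"] and new is not None:
            payload = {k: v for k, v in row.items() if k not in SCD_COLS}
            if payload == new:
                unchanged.add(row[primary_key])  # nothing to version
            else:
                row = {**row, "_scd_valid_to": now, "_scd_is_current": False}
        result.append(row)
    for key, new in incoming_by_key.items():
        if key not in unchanged:
            result.append({**new, "_scd_valid_from": now,
                           "_scd_valid_to": None, "_scd_is_current": True})
    return result


users = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
print(merge(users, [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}], "id"))
# → [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'B'}, {'id': 3, 'name': 'c'}]
```

Each function takes the whole target as input and returns a complete new table, which is why the cost of every run scales with the size of the target rather than with the size of the incoming batch.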
Example (merge):
ingestr ingest \
--source-uri "postgres://user:pass@host:5432/db" \
--source-table "public.users" \
--dest-uri "onelake://myworkspace/mylakehouse?tenant_id=$TENANT_ID&client_id=$CLIENT_ID&client_secret=$CLIENT_SECRET" \
--dest-table "Tables/users" \
--incremental-strategy merge \
--primary-key id
Notes & limitations
- Replace is not atomic — there is a brief window where the table is empty.
- Copy-on-write strategies load the entire target table into memory and rewrite it on every run.
- Partitioning: Delta tables are written non-partitioned; partition_by is ignored in Tables mode for now.
- Type mapping: timestamps are stored as Delta timestamp (microseconds, UTC); JSON and UUID columns are stored as string; TIME columns are carried as microsecond long values (Delta has no time type).
- CDC-aware merge (soft-deletes via _cdc_deleted) is not implemented; CDC delete markers are merged as regular rows.
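The type-mapping note can be made concrete with a small conversion sketch, which is useful when reading TIME columns back from the Delta table. It assumes the long value counts microseconds since midnight, which this page does not state explicitly, and all function names are hypothetical.

```python
import datetime as dt
import json
import uuid
from typing import Any


def time_to_micros(value: dt.time) -> int:
    """Encode a time of day as microseconds since midnight (assumed encoding)."""
    seconds = (value.hour * 60 + value.minute) * 60 + value.second
    return seconds * 1_000_000 + value.microsecond


def micros_to_time(micros: int) -> dt.time:
    """Decode microseconds since midnight back into a time of day."""
    return (dt.datetime.min + dt.timedelta(microseconds=micros)).time()


def to_delta_value(value: Any) -> Any:
    """Apply the documented mapping for UUID, JSON and TIME values."""
    if isinstance(value, uuid.UUID):
        return str(value)  # UUID -> string
    if isinstance(value, (dict, list)):
        return json.dumps(value)  # JSON -> string
    if isinstance(value, dt.time):
        return time_to_micros(value)  # TIME -> long
    return value


print(to_delta_value(dt.time(12, 34, 56, 789)))  # → 45296000789
print(micros_to_time(45_296_000_789))
```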