MongoDB

MongoDB is a popular, open-source NoSQL database known for its flexibility, scalability, and wide adoption across a variety of applications.

Bruin supports MongoDB as a source for Ingestr assets, and you can use it to ingest data from MongoDB into your data warehouse.

To set up a MongoDB connection, you need to add a configuration item to the .bruin.yml file and to the asset file.

Follow the steps below to correctly set up MongoDB as a data source and run ingestion.

Configuration

Step 1: Add a connection to the .bruin.yml file

To connect to MongoDB, you need to add a configuration item to the connections section of the .bruin.yml file. This configuration must comply with the following schema:

yaml
    connections:
      mongo:
        - name: "localMongo"
          username: "testUser"
          password: "testPass123"
          host: "localhost"
          port: 27017
  • name: The name to identify this MongoDB connection
  • username: The MongoDB username with access to the database
  • password: The password for the specified username
  • host: The host address of the MongoDB server, without the mongodb:// protocol or port (for example, localhost or mongo.example.com)
  • port: The port number the database server is listening on. Use 27017 for the MongoDB default port.
  • database: Optional. If set, Bruin appends it to the connection URI path. For ingestr assets, the source database is usually provided in source_table as database.collection.

Bruin turns this configuration into a MongoDB URI in the following form:

text
mongodb://testUser:testPass123@localhost:27017

If username is empty, Bruin omits the username and password from the URI. If your username or password contains special characters, Bruin URL-encodes them when it builds the URI.
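The URI rules above can be illustrated with a short Python sketch. This is not Bruin's actual implementation; the function name build_mongo_uri and the choice of urllib.parse.quote_plus for encoding are assumptions made for illustration only.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host, port, database=""):
    """Assemble a MongoDB URI in the form described above (hypothetical helper)."""
    credentials = ""
    if username:
        # Percent-encode reserved characters such as '@', ':' and '/'.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    # With an empty username, no credentials appear in the URI at all.
    uri = f"mongodb://{credentials}{host}:{port}"
    if database:
        # An optional database is appended as the URI path.
        uri += f"/{database}"
    return uri


print(build_mongo_uri("testUser", "testPass123", "localhost", 27017))
print(build_mongo_uri("", "", "localhost", 27017))
print(build_mongo_uri("app@corp", "p@ss:word/1", "mongo.example.com", 27017))
```

The third call shows why encoding matters: an unencoded @ or : in the credentials would be read as a URI delimiter.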

CAUTION

Always set port for mongo connections. The current configuration loader does not apply the schema default when reading .bruin.yml, so omitting port will build a URI with port 0.
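One way to guard against this is a small pre-flight check. The Python sketch below assumes the connections mapping has already been parsed from .bruin.yml into a dictionary (for example with a YAML library); the function name mongo_connections_missing_port is hypothetical and not part of Bruin.

```python
def mongo_connections_missing_port(config):
    """Return names of mongo connections whose port is absent or 0 (hypothetical helper)."""
    connections = config.get("connections") or {}
    missing = []
    for conn in connections.get("mongo") or []:
        # A missing key and an explicit 0 would both produce a URI with port 0.
        if not conn.get("port"):
            missing.append(conn.get("name", "<unnamed>"))
    return missing


# Stand-in for a parsed .bruin.yml fragment; "noPort" is an invented example entry.
config = {
    "connections": {
        "mongo": [
            {"name": "localMongo", "host": "localhost", "port": 27017},
            {"name": "noPort", "host": "mongo.example.com"},
        ]
    }
}
print(mongo_connections_missing_port(config))
```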

Step 2: Create an asset file for data ingestion

To ingest data from MongoDB, you need to create an asset configuration file. This file defines the data flow from the source to the destination. Create a YAML file (e.g., mongo_ingestion.yml) inside the assets folder and add the following content:

yaml
name: public.mongo
type: ingestr
connection: postgres

parameters:
  source_connection: localMongo
  source_table: 'users.details'

  destination: postgres
  • name: The name of the asset.
  • type: Specifies the type of the asset. Set this to ingestr to use the ingestr data pipeline.
  • connection: The destination connection, which defines where the data is stored. For example, postgres means the ingested data is stored in a Postgres database.
  • source_connection: The name of the MongoDB connection defined in .bruin.yml.
  • source_table: The MongoDB collection to ingest, in database.collection format.
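To make the database.collection format concrete, here is a minimal Python sketch that splits a source_table value into its two parts. Splitting at the first dot is an assumption (MongoDB database names cannot contain a dot, while collection names can); how ingestr parses the value internally is not confirmed here, and split_source_table is a hypothetical helper.

```python
def split_source_table(source_table):
    """Split 'database.collection' at the first dot (hypothetical helper)."""
    database, sep, collection = source_table.partition(".")
    if not sep or not database or not collection:
        raise ValueError(f"expected 'database.collection', got {source_table!r}")
    return database, collection


# 'users' is the database and 'details' the collection in the asset above.
print(split_source_table("users.details"))
```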

Step 3: Run the asset to ingest data

bash
bruin run assets/mongo_ingestion.yml

As a result of this command, Bruin will ingest data from the given MongoDB collection into your Postgres database.

TIP

Instead of writing one asset per collection by hand, you can scaffold them automatically with bruin import database --as-ingestr. It scans the MongoDB connection and generates a runnable ingestr asset for every collection, so you can replicate the whole database by running the generated pipeline:

bash
bruin import database --connection localMongo --as-ingestr --destination duckdb ./my-pipeline

Querying

Beyond ingestion, you can run ad-hoc queries against a MongoDB connection with the query command, and verify that the connection works with bruin connections test. Because MongoDB is not SQL, the query is a JSON object describing a find or aggregation against one collection:

bash
bruin query --connection localMongo \
  --query '{"collection":"users","filter":{"age":{"$gt":21}},"sort":{"age":-1},"limit":10}'

See Querying MongoDB for the full envelope syntax.
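Hand-writing the JSON inside shell quotes is error-prone, particularly with operators such as $gt. The Python sketch below builds the same query object with json.dumps and assembles the argument list for the command shown above. It uses only the keys from that example; treating filter, sort, and limit as optional is an assumption, and build_query_command is a hypothetical helper.

```python
import json
import shlex


def build_query_command(connection, collection, query_filter=None, sort=None, limit=None):
    """Build the argv for `bruin query` with a JSON query object (hypothetical helper)."""
    envelope = {"collection": collection}
    # Only include keys the caller supplied (assumed to be optional).
    if query_filter is not None:
        envelope["filter"] = query_filter
    if sort is not None:
        envelope["sort"] = sort
    if limit is not None:
        envelope["limit"] = limit
    payload = json.dumps(envelope, separators=(",", ":"))
    return ["bruin", "query", "--connection", connection, "--query", payload]


argv = build_query_command("localMongo", "users", {"age": {"$gt": 21}}, {"age": -1}, 10)
# shlex.join quotes the payload so it can be pasted into a POSIX shell.
print(shlex.join(argv))
```

Passing argv as a list to subprocess.run avoids shell interpolation entirely, so $gt is never mistaken for a shell variable.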