MongoDB
MongoDB is a popular, open source NoSQL database known for its flexibility, scalability, and wide adoption in a variety of applications.
Bruin supports MongoDB as a source for Ingestr assets, and you can use it to ingest data from MongoDB into your data warehouse.
To set up a MongoDB connection, you need to add a configuration item to the .bruin.yml file and reference it in an asset file.
Follow the steps below to correctly set up MongoDB as a data source and run ingestion.
Configuration
Step 1: Add a connection to the .bruin.yml file
To connect to MongoDB, you need to add a configuration item to the connections section of the .bruin.yml file. This configuration must comply with the following schema:
connections:
  mongo:
    - name: "localMongo"
      username: "testUser"
      password: "testPass123"
      host: "localhost"
      port: 27017

name: The name to identify this MongoDB connection.
username: The MongoDB username with access to the database.
password: The password for the specified username.
host: The host address of the MongoDB server, without the mongodb:// protocol or port (for example, localhost or mongo.example.com).
port: The port number the database server is listening on. Use 27017 for the MongoDB default port.
database: Optional. If set, Bruin appends it to the connection URI path. For ingestr assets, the target database is usually provided in source_table as database.collection.
Bruin turns this configuration into a MongoDB URI in the following form:
mongodb://testUser:testPass123@localhost:27017

If username is empty, Bruin omits the username and password from the URI. If your username or password contains special characters, Bruin URL-encodes them when it builds the URI.
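The URI rules described above can be sketched in Python. This is only an illustration of the described behavior, not Bruin's actual code: build_mongo_uri is a hypothetical helper, and the use of urllib.parse.quote_plus for the URL-encoding step is an assumption, so Bruin's exact escaping of individual characters may differ.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host, port, database=""):
    """Hypothetical helper mirroring the URI rules described in this section."""
    credentials = ""
    if username:
        # Assumption: percent-encode both parts so characters such as
        # '@', ':' and '/' cannot break the URI structure.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    uri = f"mongodb://{credentials}{host}:{port}"
    if database:
        # The optional database is appended as the URI path.
        uri += f"/{database}"
    return uri


print(build_mongo_uri("testUser", "testPass123", "localhost", 27017))
# mongodb://testUser:testPass123@localhost:27017
print(build_mongo_uri("", "", "localhost", 27017))
# mongodb://localhost:27017
print(build_mongo_uri("testUser", "p@ss:word/1", "localhost", 27017))
# mongodb://testUser:p%40ss%3Aword%2F1@localhost:27017
```

The third call shows why encoding matters: an unescaped @ or : in the password would otherwise be read as part of the URI structure.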
CAUTION
Always set port for mongo connections. The current configuration loader does not apply the schema default when reading .bruin.yml, so omitting port will build a URI with port 0.
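A small pre-flight check can catch a missing port before Bruin builds the URI. The sketch below is not part of Bruin: mongo_connections_missing_port is a hypothetical helper, and it assumes the contents of .bruin.yml have already been parsed into a plain dictionary (for example with a third-party YAML library), using the connections layout shown in Step 1.

```python
def mongo_connections_missing_port(config):
    """Return the names of mongo connections whose port is absent or 0.

    Hypothetical helper; `config` is assumed to be the already-parsed
    contents of .bruin.yml as a plain dict.
    """
    mongo_connections = (config.get("connections") or {}).get("mongo") or []
    return [
        conn.get("name", "<unnamed>")
        for conn in mongo_connections
        if not conn.get("port")  # catches a missing key, None, and 0
    ]


example_config = {
    "connections": {
        "mongo": [
            {"name": "localMongo", "host": "localhost", "port": 27017},
            {"name": "noPort", "host": "mongo.example.com"},
        ]
    }
}
print(mongo_connections_missing_port(example_config))  # ['noPort']
```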
Step 2: Create an asset file for data ingestion
To ingest data from MongoDB, you need to create an asset configuration file. This file defines the data flow from the source to the destination. Create a YAML file (e.g., mongo_ingestion.yml) inside the assets folder and add the following content:
name: public.mongo
type: ingestr
connection: postgres
parameters:
  source_connection: localMongo
  source_table: 'users.details'
  destination: postgres

name: The name of the asset.
type: Specifies the type of the asset. Set this to ingestr to use the ingestr data pipeline.
connection: This is the destination connection, which defines where the data should be stored. For example, postgres indicates that the ingested data will be stored in a Postgres database.
source_connection: The name of the MongoDB connection defined in .bruin.yml.
source_table: The MongoDB collection to ingest, in database.collection format.
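To make the database.collection format concrete, here is a minimal Python sketch of how a source_table value could be validated and split. split_source_table is a hypothetical helper, and splitting on the first dot is an assumption based on MongoDB's naming rules (database names cannot contain a dot, collection names can); how ingestr parses the value internally is not confirmed here.

```python
def split_source_table(source_table):
    """Split 'database.collection' into its two parts.

    Hypothetical helper. Assumption: the split happens on the FIRST dot,
    since MongoDB database names cannot contain dots but collection
    names can.
    """
    database, separator, collection = source_table.partition(".")
    if not separator or not database or not collection:
        raise ValueError(
            f"expected 'database.collection', got {source_table!r}"
        )
    return database, collection


print(split_source_table("users.details"))    # ('users', 'details')
print(split_source_table("app.events.2024"))  # ('app', 'events.2024')
```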
Step 3: Run the asset to ingest data
bruin run assets/mongo_ingestion.yml

As a result of this command, Bruin will ingest data from the given MongoDB collection into your Postgres database.
TIP
Instead of writing one asset per collection by hand, you can scaffold them automatically with bruin import database --as-ingestr. It scans the MongoDB connection and generates a runnable ingestr asset for every collection, so you can replicate the whole database by running the generated pipeline:
bruin import database --connection localMongo --as-ingestr --destination duckdb ./my-pipeline

Querying
Beyond ingestion, you can verify a MongoDB connection with bruin connections test and run ad-hoc queries against it with the query command. Because MongoDB is not SQL, the query is a JSON object describing a find or aggregation against one collection:
bruin query --connection localMongo \
  --query '{"collection":"users","filter":{"age":{"$gt":21}},"sort":{"age":-1},"limit":10}'

See Querying MongoDB for the full envelope syntax.
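Writing the JSON envelope by hand inside shell quotes is error-prone, so it can help to generate it. The Python sketch below builds the same find query as the example above; build_query_args is a hypothetical helper, and it uses only the envelope keys (collection, filter, sort, limit) and CLI flags (--connection, --query) that appear in that example.

```python
import json
import shlex


def build_query_args(connection, collection, filter_doc=None, sort=None, limit=None):
    """Hypothetical helper: assemble the argument list for `bruin query`."""
    envelope = {"collection": collection}
    if filter_doc is not None:
        envelope["filter"] = filter_doc
    if sort is not None:
        envelope["sort"] = sort
    if limit is not None:
        envelope["limit"] = limit
    # Compact separators keep the JSON on one line without extra spaces.
    query = json.dumps(envelope, separators=(",", ":"))
    return ["bruin", "query", "--connection", connection, "--query", query]


args = build_query_args(
    "localMongo",
    "users",
    filter_doc={"age": {"$gt": 21}},
    sort={"age": -1},
    limit=10,
)
# shlex.join quotes the JSON so the printed line is safe to paste into a shell.
print(shlex.join(args))
```

Passing the list directly to subprocess.run(args) avoids shell quoting altogether, which matters here because MongoDB operators such as $gt would otherwise be at risk of shell variable expansion inside double quotes.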