MongoDB
The MongoDB origin reads data from MongoDB and generates a record for every MongoDB document. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
To read from MongoDB Atlas, use the MongoDB Atlas origin. To read change data capture information from the MongoDB Oplog, use the MongoDB Oplog origin.
The MongoDB origin reads from capped and uncapped collections. When you configure the MongoDB origin, you define connection information, such as the connection string and MongoDB credentials. You can also use a connection to configure the origin. You also configure the offset field, collection type, and initial offset. These properties determine how the origin queries the database.
When the pipeline stops, the MongoDB origin notes where it stops reading. When the pipeline starts again, the origin continues processing from the last-saved offset by default. You can reset the origin to process all requested data.
You can optionally configure advanced options that determine how the origin connects to MongoDB, including enabling SSL/TLS for the origin.
The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Credentials
Based on the authentication used by the MongoDB server, configure the stage to use no authentication, username/password authentication, or LDAP authentication. When using username/password authentication, you can also use delegated authentication. When using LDAP authentication, you can use server-driven authentication or plain authentication.
By default, the origin uses no authentication. When the server requires authentication, you can provide credentials in one of the following ways:
- Connection string - Enter credentials in the connection string on the MongoDB tab.
- Credentials tab - Select either the Username/Password or LDAP authentication type on the Credentials tab. When using LDAP authentication, you also choose between server-driven or plain authentication.
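When you enter credentials in the connection string, a username or password that contains reserved characters such as @ or : generally has to be percent-encoded. The following Python sketch builds such a string with the standard library only. The helper name `build_connection_string` is hypothetical, and expressing delegated authentication through the standard `authSource` connection string option is an assumption, not documented Data Collector behavior.

```python
from urllib.parse import quote_plus, urlencode


def build_connection_string(hosts, username, password, database="", auth_source=None):
    """Hypothetical helper: build a MongoDB connection string with embedded credentials."""
    # Percent-encode reserved characters such as '@', ':' and '/' in the credentials.
    userinfo = f"{quote_plus(username)}:{quote_plus(password)}@"
    uri = f"mongodb://{userinfo}{','.join(hosts)}/{database}"
    options = {}
    if auth_source:
        # Assumption: delegated authentication means authenticating against
        # another database, which the standard authSource option expresses.
        options["authSource"] = auth_source
    if options:
        uri += "?" + urlencode(options)
    return uri


print(build_connection_string(["db1:27017", "db2:27017"], "sdc", "p@ss:word", "sales", auth_source="admin"))
# mongodb://sdc:p%40ss%3Aword@db1:27017,db2:27017/sales?authSource=admin
```

Encoding the credentials keeps the @ and : separators in the URI unambiguous; without it, a password containing @ would be misread as the end of the user information.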
Offset Field and Initial Offset
The MongoDB origin uses the offset field to track the data to read. By default, the MongoDB origin uses the _id field as the offset field.
You can use a nested offset field, such as o._id. Or, you can use any Object ID, date, or string field as the offset field. Results are not guaranteed when you use any field other than the default _id field.
When you use an Object ID or date field, specify the initial offset as a date in the following format: YYYY-MM-DD HH:mm:ss
When you use a string field, specify the initial string to use as the initial offset.
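The offset field makes incremental reads possible: each query asks only for documents whose offset value is greater than the last one saved, sorted in ascending order so the last record of a batch becomes the new offset. The Python sketch below illustrates that general technique with plain dictionaries. The helper names are hypothetical, interpreting the initial offset as UTC is an assumption, the handling of capped versus uncapped collections is not modeled, and none of this is the origin's actual implementation.

```python
import calendar
from datetime import datetime, timezone


def parse_initial_offset(value):
    """Parse an initial offset in the YYYY-MM-DD HH:mm:ss format (assumed to be UTC)."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


def object_id_hex_for(dt):
    """Smallest ObjectId (as hex) carrying the timestamp dt.

    An ObjectId starts with a 4-byte big-endian count of seconds since the
    epoch, and ObjectIds compare byte by byte, so zero-filling the remaining
    8 bytes gives a lower bound for documents created at or after dt.
    """
    seconds = calendar.timegm(dt.utctimetuple())
    return f"{seconds:08x}" + "0" * 16


def next_batch_query(offset_field, last_offset, batch_size):
    """Hypothetical helper: filter, sort and limit for the batch after last_offset."""
    return {
        # Dot notation such as "o._id" addresses a nested offset field.
        "filter": {offset_field: {"$gt": last_offset}},
        # Ascending order, so the last document read becomes the saved offset.
        "sort": [(offset_field, 1)],
        "limit": batch_size,
    }


start = object_id_hex_for(parse_initial_offset("2017-01-01 00:00:00"))
print(start)  # 586846800000000000000000 would be wrong; see the test for the exact value
```

With a real driver, the hex string would be wrapped in an ObjectId value before it is used in the filter.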
Read Preference
You can configure the read preference that the MongoDB origin uses. The read preference determines how the origin reads data from different members of the MongoDB replica set.
- Primary - Requires reading from the primary member.
- Primary Preferred - Prefers reading from the primary, but allows reads from a secondary member.
- Secondary - Requires reading from a secondary member.
- Secondary Preferred - Prefers reading from a secondary, but allows reads from a primary when necessary.
- Nearest - Reads from the member with the least network latency.
By default, the origin uses Secondary Preferred to avoid making unnecessary requests to the primary member.
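For comparison, the same five modes exist as values of the standard `readPreference` connection string option. The Python sketch below maps the labels above to those values and sets the option on an existing connection string. The helper is hypothetical, and whether the origin's Read Preference property is applied through the connection string is an assumption; the sketch only shows the equivalent driver-level setting.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Read preference labels from the list above, mapped to the standard
# readPreference connection string values.
READ_PREFERENCE_MODES = {
    "Primary": "primary",
    "Primary Preferred": "primaryPreferred",
    "Secondary": "secondary",
    "Secondary Preferred": "secondaryPreferred",
    "Nearest": "nearest",
}


def with_read_preference(connection_string, label="Secondary Preferred"):
    """Return connection_string with its readPreference option set (hypothetical helper)."""
    parts = urlsplit(connection_string)
    # Note: converting to a dict keeps only the last value of a repeated option.
    options = dict(parse_qsl(parts.query))
    options["readPreference"] = READ_PREFERENCE_MODES[label]
    # The connection string format needs a "/" between the host list and the options.
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(options), parts.fragment))


print(with_read_preference("mongodb://h1:27017,h2:27017/db"))
# mongodb://h1:27017,h2:27017/db?readPreference=secondaryPreferred
```

The default argument mirrors the origin's default of Secondary Preferred.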
Event Generation
The MongoDB origin can generate events when it completes processing all available data and the configured batch wait time has elapsed.
You can use MongoDB events in any logical way. For example:
- With the Pipeline Finisher executor to stop the pipeline and transition the pipeline to a Finished state when the origin completes processing available data.
When you restart a pipeline stopped by the Pipeline Finisher executor, the origin continues processing from the last-saved offset unless you reset the origin.
For an example, see Stopping a Pipeline After Processing All Available Data.
- With a destination to store event information.
For an example, see Preserving an Audit Trail of Events.
For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Event Records
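Event records generated by the MongoDB origin have the following event-related record header attributes:
Record Header Attribute | Description |
---|---|
sdc.event.type | Event type. Uses the following event type: no-more-data - Generated when the origin completes processing all available data and the configured batch wait time has elapsed. |
sdc.event.version | Integer that indicates the version of the event record type. |
sdc.event.creation_timestamp | Epoch timestamp when the stage created the event. |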
The MongoDB origin can generate the following event record:
- no-more-data - The MongoDB origin generates a no-more-data event record when the origin completes processing all available records and the number of seconds configured for Max Batch Wait Time elapses without any new documents appearing to be processed.
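The rule for the no-more-data event and the header attributes listed above can be modeled in a few lines of Python. This is a simplified, hypothetical model rather than Data Collector code: the helper names are invented, and the string-valued headers, the epoch-millisecond timestamp, and the version number 1 are assumptions. Only the header attributes are modeled; any body fields are omitted.

```python
import time


def should_emit_no_more_data(caught_up, idle_seconds, max_batch_wait_seconds):
    """True when all available data is processed and Max Batch Wait Time has
    elapsed without new documents (a simplified model of the rule above)."""
    return caught_up and idle_seconds >= max_batch_wait_seconds


def no_more_data_event(version=1, creation_ms=None):
    """Build a hypothetical dict view of a no-more-data event record header.

    Assumptions: header values are strings, the creation timestamp is in
    epoch milliseconds, and the version number is illustrative.
    """
    if creation_ms is None:
        creation_ms = int(time.time() * 1000)
    return {
        "header": {
            "sdc.event.type": "no-more-data",
            "sdc.event.version": str(version),
            "sdc.event.creation_timestamp": str(creation_ms),
        },
    }


if should_emit_no_more_data(caught_up=True, idle_seconds=10, max_batch_wait_seconds=5):
    print(no_more_data_event()["header"]["sdc.event.type"])  # no-more-data
```

A downstream Pipeline Finisher executor or destination would then key off the sdc.event.type attribute.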
BSON Timestamp
When processing data from MongoDB version 2.6 and later, the MongoDB origin supports the MongoDB BSON Timestamp data type, which has the following format:
<BSON Timestamp field name>:Timestamp(<timestamp>, <ordinal>)
The MongoDB origin converts the BSON Timestamp to a map as follows:
<BSON Timestamp field name>{MAP}:
Timestamp{DATETIME}:<UTC timestamp>
Ordinal{INTEGER}:<integer ordinal>
For example, a BSON Timestamp field named Transaction with the value Timestamp(1485449409, 1) is converted to the following Transaction map field:
"Transaction":{
    "Timestamp":Jan 26, 2017 16:50:09 UTC
    "Ordinal":1
}
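The conversion amounts to interpreting the first part of the BSON Timestamp as seconds since the Unix epoch in UTC and carrying the ordinal through unchanged. The Python sketch below reproduces the example, with a plain dict standing in for the Data Collector map field; the helper name is hypothetical and this illustrates the mapping rather than the origin's code.

```python
from datetime import datetime, timezone


def bson_timestamp_to_map(seconds, ordinal):
    """Convert the two parts of a BSON Timestamp into the map layout above.

    seconds: seconds since the Unix epoch (the <timestamp> part)
    ordinal: the increment that orders operations within the same second
    """
    return {
        "Timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "Ordinal": ordinal,
    }


transaction = bson_timestamp_to_map(1485449409, 1)
print(transaction["Timestamp"].isoformat(), transaction["Ordinal"])
# 2017-01-26T16:50:09+00:00 1
```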
Enabling SSL/TLS
You can enable the MongoDB origin to use SSL/TLS to connect to MongoDB.
Configuring a MongoDB Origin
Configure a MongoDB origin to read data from MongoDB.