MongoDB Oplog
The MongoDB Oplog origin reads entries from a MongoDB Oplog. For information about supported versions, see Supported Systems and Versions.
MongoDB stores information about changes to the database in a local capped collection called an Oplog. The Oplog contains information about changes in data as well as changes in the database. The MongoDB Oplog origin can read any operation written to the Oplog.
Use the MongoDB Oplog origin to capture changes in data or the database. To read data from a MongoDB collection, use the MongoDB origin. To read from MongoDB Atlas, use the MongoDB Atlas origin.
The MongoDB Oplog origin includes the CRUD operation type in a record header attribute so generated records can be easily processed by CRUD-enabled destinations. For an overview of Data Collector changed data processing and a list of CRUD-enabled destinations, see Processing Changed Data.
When you configure the MongoDB Oplog origin, you configure connection information, such as the connection string and MongoDB credentials. You can also use a connection to configure the origin. You also specify the operations to process, the read preference, and an optional timestamp and ordinal that determine where to start reading.
You can optionally configure advanced options that determine how the origin connects to MongoDB, including enabling SSL/TLS.
When a pipeline stops, the MongoDB Oplog origin notes where it stops reading. When the pipeline starts again, the origin continues processing from where it stopped by default. You can reset the origin to process all requested data.
Credentials
Based on the authentication used by the MongoDB server, configure the stage to use no authentication, username/password authentication, or LDAP authentication. When using username/password authentication, you can also use delegated authentication. When using LDAP authentication, you can use server-driven authentication or plain authentication.
By default, the origin uses no authentication.
To use authentication, provide credentials in one of the following ways:
- Connection string - Enter credentials in the connection string on the MongoDB tab.
- Credentials tab - Select either the Username/Password or LDAP authentication type on the Credentials tab. When using LDAP authentication, you also choose between server-driven and plain authentication.
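When you enter credentials in the connection string, reserved characters such as @, : and / in the username or password must be percent-encoded, or the string cannot be parsed correctly. The following is a minimal sketch, assuming a standard mongodb:// connection string; the function name, user, password, and host names are hypothetical placeholders, and authSource is the standard MongoDB URI option that names the database holding the user's credentials.

```python
from urllib.parse import quote_plus


def build_connection_string(username, password, hosts, auth_source=None):
    """Build a mongodb:// connection string with embedded credentials.

    Hypothetical helper: the username and password are percent-encoded with
    quote_plus so that reserved characters do not break the URI.
    """
    user = quote_plus(username)
    pwd = quote_plus(password)
    uri = f"mongodb://{user}:{pwd}@{','.join(hosts)}/"
    if auth_source:
        # Standard URI option naming the authentication database.
        uri += f"?authSource={quote_plus(auth_source)}"
    return uri


print(build_connection_string("sdc_user", "p@ss:word",
                              ["mongo1:27017", "mongo2:27017"],
                              auth_source="admin"))
```

Listing every replica set member in the host list lets a client find the other members even if one host is unreachable.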
Oplog Timestamp and Ordinal
When you start the pipeline, the MongoDB Oplog origin starts reading from the beginning of the Oplog by default. To start processing at a different point, configure a timestamp and ordinal. These values correspond to the ts field of an Oplog entry, which uses the following format:
"ts": Timestamp(<timestamp>, <ordinal>)
The timestamp is the number of seconds since the Unix epoch, such as 1412180887.
The ordinal is an integer counter used to differentiate between entries that occur in the same second.
You can use a timestamp and ordinal to specify where to begin reading from the Oplog. When you use a timestamp, you must also define an ordinal.
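To choose a starting point, you might copy the ts value of a known Oplog entry or convert a UTC date and time to epoch seconds. The following is a minimal sketch, assuming you have the ts value as text in the format shown above; the function names are hypothetical and are not part of Data Collector.

```python
import re
from datetime import datetime, timezone

# Matches the textual form of the ts field, for example: Timestamp(1412180887, 1)
_TS_PATTERN = re.compile(r"Timestamp\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_oplog_ts(text):
    """Return (seconds, ordinal) from a string like 'Timestamp(1412180887, 1)'."""
    match = _TS_PATTERN.search(text)
    if match is None:
        raise ValueError(f"not an Oplog timestamp: {text!r}")
    return int(match.group(1)), int(match.group(2))


def to_epoch_seconds(moment):
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    return int(moment.timestamp())


seconds, ordinal = parse_oplog_ts('"ts": Timestamp(1412180887, 1)')
print(seconds, ordinal)  # 1412180887 1
print(datetime.fromtimestamp(seconds, tz=timezone.utc))  # 2014-10-01 16:28:07+00:00
print(to_epoch_seconds(datetime(2014, 10, 1, 16, 28, 7, tzinfo=timezone.utc)))  # 1412180887

# Positions order by seconds first, then ordinal, like Python tuple comparison.
print((1412180887, 2) > (1412180887, 1))  # True
```

The last line shows why the ordinal is required: two entries written in the same second share the timestamp and differ only by ordinal.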
For more information about the Oplog timestamp field, see the MongoDB documentation.
Read Preference
You can configure the read preference that the MongoDB Oplog origin uses. The read preference determines how the origin reads data from different members of the MongoDB replica set.
- Primary - Requires reading from the primary member.
- Primary Preferred - Prefers reading from the primary, but allows reads from a secondary member.
- Secondary - Requires reading from a secondary member.
- Secondary Preferred - Prefers reading from a secondary, but allows reads from a primary when necessary.
- Nearest - Reads from the member with the least network latency.
By default, the origin uses Secondary Preferred to avoid making unnecessary requests to the primary member.
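The five modes can be summarized as a selection rule. The following is a simplified model for illustration only, not how the origin or the MongoDB driver is implemented: real drivers choose among eligible members within a latency window rather than always picking the single fastest one. The mode strings match the values of the standard readPreference connection string option; the member names are hypothetical.

```python
def choose_member(mode, members):
    """Simplified model of read preference selection.

    members: dict of reachable member name -> (role, latency_ms), where role
    is "primary" or "secondary". Returns a member name, or None when the
    mode cannot be satisfied.
    """
    primaries = [name for name, (role, _) in members.items() if role == "primary"]
    # Simplification: prefer the lowest-latency secondary.
    secondaries = sorted(
        (name for name, (role, _) in members.items() if role == "secondary"),
        key=lambda name: members[name][1],
    )
    first_primary = primaries[0] if primaries else None
    first_secondary = secondaries[0] if secondaries else None

    if mode == "primary":
        return first_primary
    if mode == "primaryPreferred":
        return first_primary or first_secondary
    if mode == "secondary":
        return first_secondary
    if mode == "secondaryPreferred":
        return first_secondary or first_primary
    if mode == "nearest":
        # Any role qualifies; pick the member with the least latency.
        return min(members, key=lambda name: members[name][1]) if members else None
    raise ValueError(f"unknown read preference: {mode}")


replica_set = {"rs0-a": ("primary", 12.0), "rs0-b": ("secondary", 4.0), "rs0-c": ("secondary", 9.0)}
print(choose_member("secondaryPreferred", replica_set))
print(choose_member("secondaryPreferred", {"rs0-a": ("primary", 12.0)}))
```

The second call shows the fallback behind the default: with no secondary available, Secondary Preferred still reads from the primary, while Secondary would return no member.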
Generated Records
The MongoDB Oplog origin generates records based on data from the MongoDB Oplog and adds CRUD and CDC related record header attributes.
Oplog records are structured differently from the documents in a collection, so you might need processors in the pipeline to restructure the records.
For example, in an insert record, the document data resides in a map field named o. In an update record, the o map field describes the change, while the _id field of the updated document resides in the o2 map field. To merge the record data, you can use a Field Flattener to flatten the map fields and a Field Remover to remove any unnecessary fields.
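To see what flattening and field removal do to an update record, the following Python sketch mimics the two steps. It is an illustration only, not Data Collector code: the flatten and remove_fields helpers are hypothetical, the "." name separator is an assumption, and the sample entry assumes the classic $set update format, which newer MongoDB versions may record differently.

```python
def flatten(record, separator="."):
    """Flatten nested maps into one level, joining key paths with separator."""
    flat = {}

    def walk(mapping, prefix):
        for key, value in mapping.items():
            path = f"{prefix}{separator}{key}" if prefix else key
            if isinstance(value, dict) and value:
                walk(value, path)  # descend into non-empty nested maps
            else:
                flat[path] = value  # leaf value (or empty map) kept as-is

    walk(record, "")
    return flat


def remove_fields(record, unwanted):
    """Return a copy of record without the fields named in unwanted."""
    return {key: value for key, value in record.items() if key not in unwanted}


# Hypothetical update entry: _id in o2, the change in o.
update_entry = {
    "op": "u",
    "ns": "shop.orders",
    "o2": {"_id": 42},
    "o": {"$set": {"status": "shipped"}},
}
flat = flatten(update_entry)
print(flat)
print(remove_fields(flat, {"op", "ns"}))
```

After both steps, the record holds the _id of the updated document next to the changed field, which is easier for downstream stages to consume.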
For more information about the Oplog record structure, see the MongoDB documentation. The following site is also a good resource: https://www.compose.com/articles/the-mongodb-oplog-and-node-js/.
CRUD Operation and CDC Header Attributes
The MongoDB Oplog origin includes the CRUD operation type in the sdc.operation.type record header attribute.
If you use a CRUD-enabled destination in the pipeline such as JDBC Producer or Elasticsearch, the destination can use the operation type when writing to destination systems. When necessary, you can use an Expression Evaluator processor or any scripting processor to manipulate the value in the header attribute. For an overview of Data Collector changed data processing and a list of CRUD-enabled destinations, see Processing Changed Data.
The origin uses the following values in the sdc.operation.type attribute:
- 1 for INSERT
- 2 for DELETE
- 3 for UPDATE
- 5 for unsupported operations, such as CMD, NOOP, or DB, which are valid MongoDB operation types but do not apply to record data.
The origin also includes the following CDC header attributes:
- op - The CRUD operation, using the following values:
  - i for INSERT
  - u for UPDATE
  - d for DELETE
- ns - The namespace, using the following format: <database>:<tablename>.
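When you write logic in a scripting processor that depends on the operation type, it can help to see how the op codes relate to the sdc.operation.type values. The following lookup restates the mapping listed above as a Python sketch; it is not the origin's source code. The codes c (command) and n (no-op) are MongoDB Oplog operation codes, shown here as examples of operations that map to 5.

```python
# sdc.operation.type values for the op codes that carry record data.
SDC_OPERATION_TYPES = {"i": 1, "d": 2, "u": 3}
UNSUPPORTED = 5  # commands, no-ops, and other non-data operations


def sdc_operation_type(op):
    """Map an Oplog op code to its sdc.operation.type value."""
    return SDC_OPERATION_TYPES.get(op, UNSUPPORTED)


for op in ("i", "u", "d", "c", "n"):
    print(op, sdc_operation_type(op))
```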
Enabling SSL/TLS
You can enable the MongoDB Oplog origin to use SSL/TLS to connect to MongoDB.
Configuring a MongoDB Oplog Origin
Configure a MongoDB Oplog origin to read data from a MongoDB Oplog.