IBM Support

How to delete records from MongoDB using MongoDB Atlas and MongoDB stages

How To


Summary

How to delete records from MongoDB using MongoDB Atlas and MongoDB stages

Objective

This article explains the stages and configuration required to delete records from MongoDB.

Steps

Configuration:

Two stages are required to perform the data deletion:

  1. Expression Evaluator: in Header Attributes, set the action to perform, in this case a deletion: sdc.operation.type = 2

  2. MongoDB Atlas or MongoDB stage
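
As a rough illustration of what the Expression Evaluator step does, the sketch below models a record's header attributes as a plain Python dictionary. It is not SDC code. Only the delete value (2) comes from this article; the other operation codes and the helper name mark_for_deletion are assumptions added for illustration.

```python
from enum import IntEnum


class OperationType(IntEnum):
    """Operation codes for the sdc.operation.type header attribute.

    Only DELETE = 2 is stated in this article; the other values are
    assumed here for context.
    """
    INSERT = 1
    DELETE = 2
    UPDATE = 3
    UPSERT = 4


def mark_for_deletion(header_attributes: dict) -> dict:
    """Return a copy of the header attributes with the delete operation set.

    Hypothetical helper: in a pipeline this is done by the Expression
    Evaluator, not by Python code.
    """
    updated = dict(header_attributes)
    updated["sdc.operation.type"] = str(int(OperationType.DELETE))
    return updated


# Hypothetical header attributes of an incoming record.
headers = mark_for_deletion({"source": "example"})
```

The destination stage then reads this attribute to decide that the record represents a deletion rather than an insert or update.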


Results

The MongoDB and MongoDB Atlas stages behave slightly differently:

MongoDB stage behavior:

The stage deletes a record only when every field in the incoming record exactly matches the record in the destination MongoDB, with the _id field excluded (the _id field is not supported by this stage and would prevent a match with the record we are aiming for). When _id is removed and several records in the database differ only in their _id, the oldest record is deleted. This is because MongoDB works with filters, and SDC only expects to receive one record, so it only gets the latest one. As a result of these points:

  • Record manipulation:

    • Editing a field with an Expression Evaluator or another stage prevents the record from being deleted at the endpoint.

    • Removing any field other than _id from the record prevents the record from being deleted.

    • Adding a field that is not present in the destination MongoDB prevents the record from being deleted.

  • _id removal implications: the default MongoDB _id field should not be included in the record sent to the MongoDB destination stage. Therefore, unless every record also carries another unique ID field introduced by the customer (MongoDB recommends using a customer-created unique ID alongside the default _id), more than one record may have exactly the same fields and values but a different _id. If this happens, the latest record with the same fields (but a different _id) will be deleted. Hence, the following two things are required to keep control over the deletions:

    • No records should be equal: otherwise, we have no control over which one is deleted.

    • A second unique ID (MongoDB recommends this) should be added alongside _id, so that records that would otherwise be duplicates are unique and we control which one is deleted.
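
The matching rules above can be sketched in plain Python. This is only a model of the behavior described in this article, not the stage's implementation and not a MongoDB client call. The function names, field names, and sample documents are hypothetical.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def strip_id(document: Document) -> Document:
    """Return a copy of the document without the default _id field."""
    return {key: value for key, value in document.items() if key != "_id"}


def find_delete_candidates(collection: List[Document], record: Document) -> List[Document]:
    """Return stored documents whose non-_id fields equal the record exactly."""
    wanted = strip_id(record)
    return [doc for doc in collection if strip_id(doc) == wanted]


# Hypothetical destination collection: two documents differ only in _id.
collection = [
    {"_id": "a1", "name": "Ana", "city": "Madrid"},
    {"_id": "b2", "name": "Ana", "city": "Madrid"},
    {"_id": "c3", "name": "Luis", "city": "Lima", "customer_id": 1001},
]

# Exact match without _id, but two documents qualify: the deletion is ambiguous.
duplicates = find_delete_candidates(collection, {"name": "Ana", "city": "Madrid"})

# An edited, removed, or added field means nothing matches, so nothing is deleted.
edited = find_delete_candidates(collection, {"name": "Ana", "city": "Paris"})
removed = find_delete_candidates(collection, {"name": "Ana"})
added = find_delete_candidates(collection, {"name": "Ana", "city": "Madrid", "age": 30})

# A customer-created unique ID makes the match unambiguous.
unique = find_delete_candidates(
    collection, {"name": "Luis", "city": "Lima", "customer_id": 1001}
)
```

In this model, more than one candidate corresponds to the situation where we do not control which duplicate is removed, and exactly one candidate is the controlled case that a customer-created unique ID provides.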

 

MongoDB Atlas behavior:

MongoDB Atlas is the new stage introduced in 5.2. Its behavior is very similar to that of the MongoDB stage: the record manipulation constraints still apply, but the _id field is accepted.

The _id field can stay in the record and works as expected, which gives us control over which record is deleted when duplicates exist. However, for this to work, the _id field must carry several field attributes:

  • Field: _id and its value

  • Field attributes

    • bsonType: "OBJECT_ID"

    • timestamp: this can be extracted from the field _id value

    • date: this can be extracted from the field _id value
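
The timestamp and date can be derived from the _id value because the first 4 bytes of a MongoDB ObjectId encode its creation time in seconds since the Unix epoch. The sketch below shows that extraction in plain Python. The function names and the sample value are hypothetical, and the exact format the stage expects for the timestamp and date attributes is not specified in this article, so treat the output types as an assumption.

```python
from datetime import datetime, timezone


def object_id_timestamp(object_id_hex: str) -> int:
    """Return the creation time, in seconds since the Unix epoch, of an ObjectId.

    The ObjectId is given as its 24-character hexadecimal string.
    """
    raw = bytes.fromhex(object_id_hex)  # raises ValueError on non-hex input
    if len(object_id_hex) != 24 or len(raw) != 12:
        raise ValueError("an ObjectId must be exactly 24 hexadecimal characters")
    # The first 4 bytes are a big-endian timestamp in seconds.
    return int.from_bytes(raw[:4], byteorder="big")


def object_id_date(object_id_hex: str) -> datetime:
    """Return the ObjectId creation time as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(object_id_timestamp(object_id_hex), tz=timezone.utc)


# Hypothetical _id value used only for illustration.
example_id = "65a1b2c3d4e5f6a7b8c9d0e1"
example_timestamp = object_id_timestamp(example_id)
example_date = object_id_date(example_id)
```

Values computed this way could then be set as the timestamp and date field attributes on _id, for example with an Expression Evaluator placed before the MongoDB Atlas stage.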

Here, the _id field is not mandatory for deleting records, but it can be used if desired. Still, it is recommended to use a custom, user-defined unique ID, as described for the MongoDB stage.

Document Location

Worldwide

Line of Business: Data Platform (LOB76)
Business Unit: IBM Software (BU048)
Product: IBM StreamSets Data Collector (SSM7CU)
Platform: Platform Independent (PF025)
Version: All Version(s)

Document Information

Modified date:
16 March 2025

UID

ibm17186361