October 7, 2021 By Daniel Mermelstein 5 min read

This post offers best-practice recommendations for using the IBM Cloud Databases for MongoDB Standard and Enterprise plans.

We hope it will be useful for developers and administrators who plan to use IBM Cloud Databases for MongoDB. It draws on questions our customers commonly ask when running MongoDB deployments on IBM Cloud. This document assumes that the reader is familiar with MongoDB, as it deals mainly with the differences between a self-hosted MongoDB cluster and one deployed and managed by IBM.

The recommendations are at the bottom of the blog post, but we highly recommend reading the whole post to understand the underlying context around our recommendations. We also recommend using our Getting-to-Production Checklist to make sure you are incorporating all best practices for adopting IBM Cloud Databases.

IBM Cloud Databases for MongoDB

Databases for MongoDB deployments consist of a three-node replica set deployed in three different availability zones. There is one primary in one zone and two secondary members in two additional separate zones. Both secondaries can become the primary in an election.

These replica sets provide additional fault tolerance and High Availability. IBM offers a Standard Plan and an Enterprise Plan for MongoDB deployments. You can read more about the features of the different plans here.

Multi-zone deployment of MongoDB Cluster.

Best practice guidelines

Durability vs. availability

Like all other distributed databases, MongoDB presents application developers with architectural decisions that involve some trade-offs between data durability (how permanent a data write is) and data availability (how quickly your application can write and read data). The purpose of this document is to examine these trade-offs only in the context of the IBM Cloud Databases deployments of MongoDB.

Generally speaking, data is more durable (i.e., less likely to be lost) if you have more copies of it. In normal circumstances, all data will always end up in all nodes. But your application can decide how long it wants to wait for confirmation of writes before proceeding. This is called a “write concern.” If you set a “write concern” of 1 (w:1), then once the data is written to the first node (primary) the write is acknowledged and your application can proceed (see here for more information about write concerns).

In the case of the Databases for MongoDB deployment, where there is a three-node replica set, you could theoretically set a “write concern” of 3 (w:3). In that case, your application would have to wait for all nodes (primary and two secondaries) to acknowledge the write before proceeding. 

By doing this you can, in effect, get very close to an RPO (Recovery Point Objective) of 0, because it is highly unlikely that any data write will be lost (as you would have to suffer a disastrous simultaneous loss of three nodes that are physically located in three separate physical locations).

However, this is not a workable setting in practice. With w:3, your application's writes risk being blocked whenever a node is unavailable, whether during expected maintenance events or during unexpected failures on secondaries.
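
The effect of different write concerns on a three-node replica set can be illustrated with a small model. This is a simplified sketch, not MongoDB's actual implementation: the function names are hypothetical, it assumes every member is a voting, data-bearing node, and it ignores election time. The "majority" setting (two of three nodes here) is included for comparison.

```python
def majority(members: int) -> int:
    """Smallest number of nodes that forms a majority of the replica set."""
    return members // 2 + 1


def write_is_acknowledged(w, members: int, healthy: int) -> bool:
    """Simplified model: can a write with write concern `w` be acknowledged?

    `w` is either an integer node count or the string "majority".
    """
    # A primary can only exist while a majority of members are reachable.
    if healthy < majority(members):
        return False
    required = majority(members) if w == "majority" else int(w)
    return healthy >= required


if __name__ == "__main__":
    # Three-node replica set with one secondary down for maintenance.
    for w in (1, "majority", 3):
        print(w, write_is_acknowledged(w, members=3, healthy=2))
```

In this model, losing a single secondary leaves w:1 and majority writes unaffected, while w:3 writes cannot be acknowledged until the node returns.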

Similarly, when reading data, there are also trade-offs. You can read from the primary, which normally guarantees the latest data but puts more pressure on the primary (which is already handling all the writes). Or you can read from the secondaries, but you have a higher chance of getting stale data. Refer to the documentation to decide on the best read pattern for your application.
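
As an illustration of how a read pattern can be selected, the sketch below appends MongoDB's standard `readPreference` and `maxStalenessSeconds` connection-string options to an existing URI. The helper name and the example host are hypothetical, and the base URI is assumed to already contain a path component (for example a database name) before any options.

```python
from urllib.parse import urlencode


def with_read_options(base_uri: str, read_preference: str,
                      max_staleness_seconds=None) -> str:
    """Return base_uri with read-related connection options appended."""
    options = {"readPreference": read_preference}
    if max_staleness_seconds is not None:
        if read_preference == "primary":
            raise ValueError("maxStalenessSeconds does not apply to primary reads")
        if max_staleness_seconds < 90:
            # MongoDB rejects staleness bounds below 90 seconds.
            raise ValueError("maxStalenessSeconds must be at least 90")
        options["maxStalenessSeconds"] = str(max_staleness_seconds)
    separator = "&" if "?" in base_uri else "?"
    return base_uri + separator + urlencode(options)


if __name__ == "__main__":
    # Placeholder host; a real deployment supplies its own connection string.
    print(with_read_options(
        "mongodb://example.invalid:27017/app?replicaSet=rs0",
        "secondaryPreferred",
        max_staleness_seconds=120,
    ))
```

Setting a staleness bound is one way to limit how far behind a secondary may be before the driver stops routing reads to it.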

Node availability

Nodes can become unavailable for a number of reasons, including hardware failure or networking problems. Distributed systems are designed to deal with these failures. In the case of MongoDB, if the primary node becomes unavailable, then the remaining nodes enter an automatic “election process,” where they elect a new primary node and continue to operate. But this election process can take a number of seconds, during which writing to the database is blocked (see the docs for more information).

In addition, IBM performs regular security and feature updates to keep your database safe, compliant and exciting to use. Upgrades will not cause write or read blocks if you follow the recommendation below regarding write concerns. You can learn more about these in our Application High Availability section. 


Based on the above, we recommend the following:

  1. Application design: There may be cases where connection exceptions occur (as described above). We recommend you catch these within your application and execute a reconnect and reissue cycle. In other words, design your applications with retry/reconnect logic. Expect and handle occasional write failures.  
  2. Durability (1): Your application write commands should all apply a write concern of majority. This means that the primary and one secondary will have to acknowledge the write before it is deemed successful. This is a reasonable compromise between durability and preventing your application being blocked in cases of node unavailability. Note that the write concern default on an IBM Cloud MongoDB deployment is currently one (i.e., only the primary need acknowledge a write for it to be deemed successful). To override this default, your application should issue the new write concern alongside every write command. 
  3. Durability (2): Your application writes should also include a suitable timeout (the wtimeout parameter) to avoid getting blocked when writing is unavailable (and see the first bullet point above about retry logic). However, it is worth remembering that a timeout does not mean a failure to write, only that the write did not happen within a specified time limit. So a write command that timed out may still eventually succeed. Your application will need to understand that.
  4. Reading data: Your application readPreference should be set to primaryPreferred. This avoids read connections being blocked when there is a failover event and a different node becomes the primary.
  5. Scaling (1): Use auto-scaling for storage (disk), as “disk full” errors will cause the service to crash. Disk auto-scaling has no impact on the running database deployment.
  6. Scaling (2): For RAM and/or CPU, we still recommend capacity planning for critical services with unpredictable demand and growth profiles. This requires more forethought but leaves you better prepared for growth in your application that may require more IOPS, RAM or vCPU.
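
Recommendations 1 to 4 might be combined along the following lines. This is a minimal sketch using the PyMongo driver, not an official IBM sample: the helper names, the five-second `wtimeout`, the retry count and the backoff delays are all illustrative assumptions. PyMongo is imported inside the functions that need it so that the retry helper can be exercised on its own, and the connection string is assumed to carry the deployment's credentials and TLS settings.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(operation: Callable[[], T],
                 retriable: Tuple[Type[BaseException], ...],
                 attempts: int = 5,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, reissuing it with exponential backoff on retriable errors."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retriable:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


def open_collection(uri: str, db_name: str, coll_name: str):
    """Collection handle with majority writes and primaryPreferred reads."""
    from pymongo import MongoClient, ReadPreference
    from pymongo.write_concern import WriteConcern

    client = MongoClient(uri)
    return client[db_name][coll_name].with_options(
        # wtimeout is in milliseconds; 5000 is an assumed value, not a default.
        write_concern=WriteConcern(w="majority", wtimeout=5000),
        read_preference=ReadPreference.PRIMARY_PREFERRED,
    )


def upsert_document(collection, doc_id, fields: dict):
    """Idempotent write: safe to reissue even if a timed-out attempt succeeded."""
    from pymongo.errors import AutoReconnect, WTimeoutError

    return with_retries(
        lambda: collection.update_one({"_id": doc_id}, {"$set": fields}, upsert=True),
        retriable=(AutoReconnect, WTimeoutError),
    )
```

The write is expressed as an upsert keyed on `_id` because a write that timed out may still be applied later; reissuing an idempotent operation avoids duplicate documents in that case.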

Backup and restore

The IBM deployment of MongoDB (Standard and Enterprise offerings) takes automatic backups for you as part of being a database-as-a-service. These backups are written to IBM Cloud Object Storage (COS) every 24 hours. Therefore, the current Recovery Point Objective (RPO) is 24 hours. Customers can use the CLI and API to automate on-demand backups and achieve potentially lower RPOs.

The restore process for Standard and Enterprise Offerings can be triggered via the user console or via API and restores into a new MongoDB Instance. The secondaries are then built from this new instance and a new cluster is created. 

The length of this process (and therefore the Recovery Time Objective – RTO) depends on the amount of data being restored.

We highly recommend testing your business continuity and disaster recovery processes before going to production.

Further information

Refer to these documents for information on major upgrades and versioning policy:

Refer to this document for guidance on getting to production and other best practices:
