MongoDB backup and restore methodology using IBM Spectrum Protect for Linux on Power

The focus of this article is to describe how to back up fairly large instances of MongoDB. We can accomplish this by using IBM® Spectrum Protect™, formerly known as IBM Tivoli® Storage Manager (TSM), running as backup and management agents on all MongoDB servers. This article shows how MongoDB can be integrated with traditional or existing backup tools such as IBM Spectrum Protect, even at production scale.

This is all possible using a MongoDB version built for IBM Power Systems™ (for example, MongoDB Enterprise v3.4 for ppc64le). MongoDB itself includes many tools and features that assist in the process and complement its backup abilities; these are covered in the following sections.

IBM Spectrum Protect background

IBM Spectrum Protect provides storage management solutions for multivendor computer environments. It provides automated, centrally scheduled, policy-managed backup, archive, and space-management capabilities for file servers, workstations, virtual machines, and applications. Furthermore, IBM Spectrum Protect supports systems of all sizes [including virtual machines, file servers, email, databases, Enterprise Resource Planning (ERP) systems, mainframes and desktops]. IBM Spectrum Protect does all this from a single environment that expands as data grows.

This article covers the backup and recovery functions of IBM Spectrum Protect, and it describes how it can be used to protect MongoDB data. In addition, there are a few specific reasons to choose IBM Spectrum Protect. First, it has a trusted track record for being a reliable backup manager for large enterprise systems. Many large customers have trusted it for years and may already have it integrated in their enterprise systems. Even for customers who do not yet use IBM Spectrum Protect, adopting it removes the need to run separate, application-specific backup software for each program.

There are other important reasons to use IBM Spectrum Protect for a backup solution. IBM Spectrum Protect has a robust platform for managing backed up data. Some applications just pile backup data onto a hard drive on another server and require it to be restored manually, or leave you to manage where the data is stored and what happens to it over time. IBM Spectrum Protect can be configured with automatic procedures, such as writing data to disk and tape, and has numerous archiving features. And when it comes to restoring the data, IBM Spectrum Protect agents make it simple to get to the most recent backup for a particular system and handle the process of putting the data back where it belongs.

MongoDB background

MongoDB is an open source database considered to be the most popular and fastest growing NoSQL database mostly because of how well it works in areas where traditional SQL databases have trouble. It is very good for dealing with large sets of unstructured data and has exceptionally good read times on the data that is stored. That, combined with powerful queries written in JavaScript, makes MongoDB a powerful tool for modern applications such as mobile and analytics that require frequent read operations and consumption of data. While it is not a replacement for all SQL applications that store structured data, it does give a modern solution for the massive amounts of unstructured data and mobile traffic.

In addition, MongoDB is designed to be highly scalable and available. These features are built into the design and structure of a MongoDB environment. MongoDB in a production environment is a cluster of processes running different tasks, usually running on different systems. It consists of three different types of servers:

  1. Config servers - These servers store metadata about the locations of data. For a production environment, there need to be exactly three config servers. They are the metadata servers that hold all the important information for the clustered database, including where and how much data is stored.
  2. Query routers - These special instances of MongoDB are the interface or gateway between outside applications and the data stored in MongoDB. Requests come into these servers, and the data requested gets returned to the application through the query router. There is no data stored permanently in these servers, which gives an added layer of security when connecting to the outside world. These MongoDB instances work by querying the config servers to find where the data should be stored. Then they intelligently fetch the data and return it to the application. They also act as the management interfaces for doing cluster-level management. While all the configurations are stored on the config servers, query routers are the administrative consoles through which you can access all settings and preferences. There can be any number of query routers, but you need at least one for the database to be functional.
  3. Shards - This is where the data is being stored in the system. The purpose of sharding is to horizontally scale the NoSQL database. The data is broken into pieces (called shards) among a set of servers for the purpose of keeping the data consistent and available, while avoiding bottlenecks. As additional shards increase complexity and costs, shards should be used as needed and not in excess. A strength of the IBM Power® servers is the ability to reduce the requirement for shards by being able to scale compute, memory, and I/O to allow each shard to service more workload than is possible with x86 servers. The servers are also automatically load balanced in order to prevent one server becoming disproportionately full or large. The idea is to break up the data set as logically as possible and spread it across different systems. Additional shards can be added, and the database will redistribute the data equally, allowing MongoDB to handle even more traffic. Of course, each shard is not an individual server. Each shard is a series of duplicated servers called a replica set.

    Replica sets - This is the solution to data redundancy and availability issues that can be faced with large databases, especially those of the NoSQL variety. The goal of this system is to have everything backed up, in a synchronized fashion, to the point where the complete failure of an entire server could be handled automatically with no downtime. Replica sets have an election among themselves to pick a single server to be the primary. The primary is responsible for handling write operations for the replica set. All write operations are handled first by the primary server, which writes them to an operation log that it distributes to the secondary members of the replica set. The secondary members then play back that log and apply all operations to their own data. One very interesting feature of replica sets is that while the primary handles all the write operations, data can be read from all of the replica servers at the same time. This means that a read operation can occur concurrently on the same piece of data across a replica set. This leads to great availability. As far as failover goes, the replica sets are designed to automatically detect the loss of any server in the set, including the primary, and fail over. With the loss of a primary, the servers will automatically elect a new primary and continue operation.

Figure 1. MongoDB structural diagram

As shown in Figure 1, in a production environment, each shard of a database is also a replica set. So, there is always built-in redundancy of data. There is no place in the MongoDB cluster where the data exists only once.
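To make these roles concrete, here is a minimal sketch of how the pieces of a MongoDB 3.4 cluster might be started and tied together. The host names, ports, and paths are illustrative assumptions; a real deployment would use configuration files and service scripts rather than bare commands.

# One of the three config servers (MongoDB 3.4 runs them as a config server replica set)
mongod --configsvr --replSet cfg --dbpath /data/configdb --port 27019 \
       --fork --logpath /var/log/mongodb/configsvr.log

# One member of the replica set that forms shard rs1 (repeat on each member host)
mongod --shardsvr --replSet rs1 --dbpath /data/mongo --port 27018 \
       --fork --logpath /var/log/mongodb/shardsvr.log

# A query router; it stores no data and only needs to know where the config servers are
mongos --configdb cfg/cfg1.example.com:27019,cfg2.example.com:27019,cfg3.example.com:27019 \
       --port 27017 --fork --logpath /var/log/mongodb/mongos.log

# Register the shard with the cluster from any mongos
mongo --port 27017 --eval 'sh.addShard("rs1/shard1a.example.com:27018")'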

The fact is, MongoDB is built with the purpose of persistent and reliable data, and this is why it has native sharding and replica sets. Sharding is primarily focused on the scalability of the data by dividing a database among multiple servers once the capabilities of a single shard are exceeded. As noted above, Power servers allow for larger shards as compared to x86 servers. The replica sets are sets of identical shards that sit on separate servers for redundancy and speed.

There is also a feature called journaling, in which the database stores the pending write operations in a journal file before they are written to disk. These files can grow as large as 1 GB in size and are deleted only when all the operations in the journal have been completed or a clean shutdown of the server has occurred. In the case of a dirty shutdown, as soon as the server is turned back on, the journal file is read and played back to verify that all the operations were committed, and to commit the ones that were not. This, again, can be done automatically once the MongoDB instance on the server is brought back online. If a write error occurs, MongoDB will restart itself in order to fix the issue. It will restart, read from the journal, delete the old journal file, and create a new one.

With journaling, sharding, and replica sets, there is a fair amount of automated failover for minor hiccups (such as a primary shard losing connectivity and going offline). The system can continue to run without flaws, and the downed shard can be brought back up automatically after a restart. However, on bigger file systems that are storing important data with retention requirements, backup is important. In this article, we use IBM Spectrum Protect as a solution to enable fully scheduled backup operations for not just MongoDB, but an entire enterprise ecosystem.

In an enterprise system, you need to have full backup of all your data for record maintenance, audits, and disaster recovery among other things. MongoDB provides backup through its Ops Manager process which is normally managed by database administrators. The use of Ops Manager is outside the scope of this article. By using IBM Spectrum Protect, we can merge MongoDB with existing enterprise infrastructure that already has backup policies in place.

In our backup procedure, IBM Spectrum Protect uses Logical Volume Manager (LVM) snapshots, an operating system feature of Red Hat (and most other major Linux® distributions), to do a file system copy of our MongoDB files on a replicated server. This happens in a series of steps that is extensively described in the following sections. One point worth emphasizing here is that the journal and database files need to be on the same logical volume, so that a single snapshot captures a consistent image of both without including unnecessary data. In the end, the snapshot is compressed and stored by IBM Spectrum Protect according to its own storage procedures.
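As a minimal sketch of that layout (the volume group name, size, and mount point are assumptions for illustration), the MongoDB data directory, which also holds the journal by default, can be given its own logical volume as follows:

# Carve a dedicated logical volume for the MongoDB data out of the volume group,
# leaving free space in the volume group for the copy-on-write snapshot later
lvcreate -L 100G -n data rhel7_system

# Put a file system on it and mount it where mongod expects its dbPath
mkfs.xfs /dev/rhel7_system/data
mkdir -p /data
mount /dev/rhel7_system/data /data
chown -R mongod:mongod /data

# mongod's storage.dbPath then points at /data, so the data files and the
# journal (dbPath/journal) are captured together by a single LVM snapshot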

The MongoDB backup process relies on a dedicated replica set member for each shard that is used strictly for backup purposes. For availability purposes in a production environment, there would be multiple replicas of a single server. For our backup purposes, we want a replica that is predefined to be the backup instance. That means it cannot be a primary server, and MongoDB gives configuration options to make this happen. It will continue to operate as a functional member of the replica set while it is being backed up. However, we know that the backup procedure will take some resources on the backup server, and we don't want to slow down the cluster. The production environment and backup schedule should be designed with that in mind.
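One way to predefine the backup member is to give it a priority of 0 so that it can never be elected primary, and optionally mark it hidden so that it serves no client reads. The following is a sketch run against the replica set primary; the member index and host name are assumptions:

mongo primary.example.com:27018 --eval '
    cfg = rs.conf();
    cfg.members[2].priority = 0;   // this member can never become primary
    cfg.members[2].hidden = true;  // optional: keep client read traffic off the backup member
    rs.reconfig(cfg);
'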

This backup procedure imposes a few minimum requirements on the MongoDB cluster. These coincide with the minimum requirements for a production environment described in the MongoDB documentation.

Benefits

IBM Spectrum Protect offers the following benefits as a backup and restore tool to enhance MongoDB's capabilities:

  • Single backup manager for all applications
  • Robust and configurable storage media and procedures
  • Ability to use IBM System Storage® SAN Volume Controller (SVC) to perform a quick logical backup and manage data movement
  • Backup cataloging
  • Backup changed data only
  • Built-in scheduling
  • Single control point
  • Database for logging, configuration, statistics, metadata

The following sections show you how some of these features are utilized to make MongoDB a more robust database.

MongoDB environment setup

Set up your replicated MongoDB servers, according to best practices, with replicas and shards.

Install MongoDB on an IBM POWER8® server following the instructions in the paper "Get started with MongoDB on IBM Power Systems running Linux".

Ensure that the backup environment meets the following minimum requirements:

  • There are three configuration servers.
  • There are at least three servers running as a replica set for each shard, that is, one primary server and two secondary servers per shard.
  • All servers in the cluster are networked by their domain name, and not directly by their IP addresses, for failover purposes.
  • At least one MongoDB router instance (mongos) must be able to handle all requests. There should be enough routers to sufficiently handle incoming and outgoing requests. More routers should be added as needed.

The procedure for backing up MongoDB is the main topic of this article. This article does not cover the entire procedure for creating or maintaining a production MongoDB instance. For more information on that and other MongoDB configurations, refer to the official documentation at: http://docs.mongodb.org/manual/.

IBM Spectrum Protect resources and setup

You need to perform the following steps to set up the IBM Spectrum Protect server and client.

  1. Use the IBM Spectrum Protect server

    To achieve the consistency and flexibility of the MongoDB Linux on Power backup/restore methodology, the following IBM Spectrum Protect server resource is recommended.

    IBM Spectrum Protect Server version 8.1.1.00 for Linux on Power LE (RHEL 7.3)

  2. Use the IBM Spectrum Protect backup-archive client

    To achieve consistency of the MongoDB Linux on Power backup/restore methodology, the following IBM Spectrum Protect backup-archive client resource is recommended:

    IBM Spectrum Protect backup-archive client version 8.1 for Linux on Power LE (RHEL 7.1, Ubuntu 14.04 or 16.04, or SUSE LE 12 or newer versions)

  3. Install and configure the IBM Spectrum Protect backup-archive client.
  4. The IBM Spectrum Protect backup-archive client should be installed on each Linux on Power MongoDB server. Self-register the IBM Spectrum Protect backup-archive client with the IBM Spectrum Protect server, TSMLINUX. Install and configure the IBM Spectrum Protect client scheduler.

The focus of this article is the use of IBM Spectrum Protect. For an example of how to perform the installation and setup, refer to MongoDB Backup/Restore Methodology using IBM Spectrum Protect for Linux on z.
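As a rough illustration of that client-side setup, the client options files and scheduler might look like the following. The server name TSMLINUX comes from the step above; the server address and node name are placeholders, and the file paths are the defaults used by the Linux backup-archive client.

# /opt/tivoli/tsm/client/ba/bin/dsm.sys -- points this node at the Spectrum Protect server
cat > /opt/tivoli/tsm/client/ba/bin/dsm.sys <<'EOF'
SERVERNAME          TSMLINUX
   COMMMETHOD       TCPIP
   TCPSERVERADDRESS tsmserver.example.com
   TCPPORT          1500
   NODENAME         ltmngo04
   PASSWORDACCESS   GENERATE
EOF

# /opt/tivoli/tsm/client/ba/bin/dsm.opt -- default server stanza for this node
echo "SERVERNAME TSMLINUX" > /opt/tivoli/tsm/client/ba/bin/dsm.opt

# First contact caches the node password, then run the client scheduler
dsmc query session
nohup dsmc schedule > /var/log/dsmc_sched.log 2>&1 &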

Implementation

The goal of our IBM Spectrum Protect agent backup is to provide a robust and reliable backup solution for MongoDB environments that can be done entirely on premises. For customers already using the robust IBM Spectrum Protect software, or for customers who may have more to back up than just MongoDB, this provides a unified solution. This example shows multiple shards, but could be implemented with a single shard as well.

The best way to capture a point-in-time backup would be to initiate LVM snapshots simultaneously across all the servers being backed up. The servers continue to operate in the cluster and the snapshot freezes the data being backed up. After the backup has been completed, the server continues to operate as normal. So, in this backup method, there is no downtime for any of the MongoDB instances. An overview of the entire backup procedure is described in this section, with the specific commands given in the testing section.

IBM Spectrum Protect backs up the data as follows:

  • A script will be run to shut off the MongoDB balancer across the entire cluster to ensure a quick and accurate backup.
  • One of the three configuration servers will be shut down for the backup procedure (that way, no data can be changed).
  • IBM Spectrum Protect backup procedure will be called on the designated backup replica members and the one configuration server simultaneously.
  • Once the backup is complete, the configuration server will be turned back on, and it will automatically begin to catch up with the other two configuration servers.
  • The balancer will be resumed, and the MongoDB cluster will resume all normal operations.

Perform the following steps to restore the backup:

  1. Use the IBM Spectrum Protect commands to restore a backup of the entire logical volume to the system.
    • There must be a volume group of the same name on the system being restored to, with enough space to hold the backup volume.
    • You should make sure that all of the DNS and host name settings of the new system match those of the backed-up system.
    • Make sure that the configuration file for this new system is the same as the one it is being restored from.
  2. After running the command, start the MongoDB instance and notice that it works just like the original server that was backed up.

Keeping track of the host names and configuration files for an environment can become an increasingly complex task. We handled this by using dynamic configuration files and host names that were generated using Ansible. Regardless of the approach used, the setup and configuration metadata is worth backing up, because losing it can delay the recovery of a system.

Test scenarios

AcmeAir is an example application that can be used to validate how Spectrum Protect works with MongoDB backup procedures. You can find the source code and documentation for this driver at https://github.com/acmeair/acmeair-nodejs.

AcmeAir is an open source application that is meant to resemble an airline company website that includes flight lookup and a booking system. There are customers who have data attached to their login accounts, as well as flights to look up, bookings with layovers, and current customer sessions online. It uses a Node.js environment to run a web application that has traffic driven against it by a JMeter automation driver. Traffic is sent to the web page through its API, and the application uses a Node.js driver to run that traffic against our MongoDB system. The question of how to shard the data still remains; this is an area where MongoDB could most use improvement, such as with dynamic shard values. For now, we will walk through the sharding of the AcmeAir database with you.

The AcmeAir database contains five collections, each of which will be sharded at some point (for example, booking, flight, flightSegment). Sharding is enabled for the database, and then each of the five collections gets its own shard key index and is sharded independently.

First, we need to load the database with some initial data and perform a few runs of the driver to get some test data. The database will be ready to shard but will not yet be actively sharded. We do this because we don't know what the values in those collections will be, or the distribution of the values. If we knew the expected distribution beforehand, we could shard before adding data to the collection. Then, we can access the MongoDB shell of the primary server using the following command:
mongo ipaddress:port

where port is the port that the primary replica of the database is running on.

Then, you will be in the MongoDB shell. Next, type in the following command:
use acmeair

This switches the focus to the correct database we are working with.

Next, we need to get all the collections in the database using the following command:
db.getCollectionInfos()

This gives a list of the collections that exist within the database. In our AcmeAir example, the output from this command is as follows:

rs1:PRIMARY> db.getCollectionInfos()

[
        {
                "name" : "airportCodeMapping",
                "options" : {
                }
        },
        {
                "name" : "booking",
                "options" : {
                }
        },
        {
                "name" : "customer",
                "options" : {
                }
        },
        {
                "name" : "customerSession",
                "options" : {
                }
        },
        {
                "name" : "flight",
                "options" : {
                }
        },
        {
                "name" : "flightSegment",
                "options" : {
                }
        }
]

Alternatively, you can use the following command for a more compressed list of collection names:
db.getCollectionNames()

This resulted in the following output:

[
        "airportCodeMapping",
        "booking",
        "customer",
        "customerSession",
        "flight",
        "flightSegment"
]

After you have the names of the collections, you can find out the fields they are made up of using the following command:
db.collectionName.find()

where collectionName is the actual name of the collection, such as airportCodeMapping.

Here is the example output of the command:

rs1:PRIMARY> db.airportCodeMapping.find()
{ "_id" : "BOM", "airportName" : "Mumbai" }
{ "_id" : "DEL", "airportName" : "Delhi" }
{ "_id" : "FRA", "airportName" : "Frankfurt" }
{ "_id" : "HKG", "airportName" : "Hong Kong" }
{ "_id" : "LHR", "airportName" : "London" }
{ "_id" : "YUL", "airportName" : "Montreal" }
{ "_id" : "SVO", "airportName" : "Moscow" }
{ "_id" : "JFK", "airportName" : "New York" }
{ "_id" : "CDG", "airportName" : "Paris" }
{ "_id" : "FCO", "airportName" : "Rome" }
{ "_id" : "SIN", "airportName" : "Singapore" }
{ "_id" : "SYD", "airportName" : "Sydney" }
{ "_id" : "IKA", "airportName" : "Tehran" }
{ "_id" : "NRT", "airportName" : "Tokyo" }

Mongo requires that you manually choose the shard field, and in this case, we will choose the airportName field.

After you have decided the fields that you would like to shard, it is time to make it happen. The first step in the process is to get into the mongos admin shell, which you should already be in from the last step.

Then you need to issue the use dataBaseName to switch focus to the database of your choice.

These next steps will be repeated for each collection in your database. Remember, it is the collections (more or less the equivalent of tables in a traditional relational SQL DB) that make up the database getting sharded. And, the sharding process is essentially indexing a collection based on one field of the collection and then splitting that collection at the halfway point of that index.

Here are the steps.

  1. Before you index, you should check whether a good index already exists, with the command db.collectionName.getIndexes().
  2. Run the db.collectionName.createIndex( { fieldName : 1} ) command to index the collectionName collection on the fieldName field. The 1 after the field name stands for the ordering, in this case, ascending order. -1 would signify descending order.
    • Note: Manually creating the index is necessary only if the collection already contains data. If you shard an empty collection, MongoDB automatically creates the shard key index for you. After that, you can set the shard key using the following command (a consolidated sketch covering the AcmeAir collections follows this list):
      sh.shardCollection("db.collection", shard-key)
    • For example:
      sh.shardCollection("acmeair.airportCodeMapping",{"airportName": 1})

Backup procedure and scripting

Now that we have a fully set up MongoDB database, we can start a backup. One of the main reasons we settled on an LVM snapshot to get a point-in-time backup of our system is that we cannot otherwise guarantee a perfect moment-in-time backup without completely shutting down each MongoDB backup instance. There is a MongoDB method, db.fsyncLock(), which is supposed to stop write operations to a database, but it turns out that the fsyncLock() method in MongoDB does not guarantee that its storage engine, WiredTiger, actually stops writing. Because that creates a data integrity issue, we use the LVM snapshot instead. That actually makes the process simpler, and our only remaining issues are to stop the balancer and try to get the IBM Spectrum Protect agent to back up with some synchronization.

Part of what needs to be scripted is the stopping of the balancer. The balancer is the background process that manages the redistribution of data among the shards in a MongoDB cluster. For example, if you sharded a collection on the Last Name field, the split between two shards might be the letter 'L'. If MongoDB notices there are significantly more entries in the 'A-L' shard than the 'M-Z' shard, it may change the split point from 'L' to 'K'. In this case, all the data with the last name starting with an 'L' would move from shard 1 to shard 2. It makes sense to turn the balancer off because you don't want the balancer to move things during the backup process. New data will still be placed on the correct shard based on its shard key, but the split points will not change while the balancer is off. That means that MongoDB can still accept new write operations, as well as reads, while the balancer is turned off.

It is possible for the data to become unbalanced during this time if there is a large influx of data from one section of the shard key. This can be reduced by using more complex shard keys, like a hash key, or sharding on a different value altogether. Either way, MongoDB will re-balance itself after the balancer is re-enabled. You must be careful that the balancer is stopped as well as disabled and not just one or the other. It is possible to run into the problem of a stuck balancer. We found that the best way to deal with that is to find the mongos instance that is causing the trouble and bring it down softly. This is accomplished through the following procedure:

  1. Connect to one of the mongos instances through the MongoDB shell.
    mongo ipaddr:27017
  2. Now that you are in the MongoDB shell, you want to use the config database. Then, see the state of the database
    use config
    sh.status()
  3. Near the top of that data printout, there will be a field called balancer. The data looks as shown below:
    balancer:
            Currently enabled:  yes
            Currently running:  yes
            Balancer lock taken at Sat Jul 08 2017 16:18:45 GMT-0400 (EDT) by ltmngo10:27017:1444334119:1804289383:Balancer:846930886
            Failed balancer rounds in last 5 attempts:  0
            Migration Results for the last 24 hours:
                    No recent migrations
  4. This tells you if the balancer is enabled, if it is running, and when it started. The balancer can be disabled, and still be running, because disabling the balancer doesn't stop the running processes but simply waits for them to stop. To do this, run the following command:
    sh.setBalancerState(false)
  5. You can check again and see when the balancer stops running to begin the backup process. If the balancer does not stop running, and the lock start time from above is more than a couple of hours old, you may need to manually fix it.
  6. First, try to get MongoDB to take care of it using the following command:
    sh.stopBalancer()
    • This will give you live prompts of the active process to shut down the balancer.
  7. In our experience, if that doesn't work, you will need to shut down the server that holds the lock. In the status above, it lists which server holds the lock. It should be a mongos server that can be brought down safely and then brought back up without much consequence. From the Bash shell, run:
    ps aux | grep mongos
    • This will give you the PID of the mongos instance you need to kill.
      kill xxxxx
    • This will kill the process ID of the number xxxxx that you specify.
    • Bring it back online and give it some time to time out the lock before you try again (about 15 to 30 minutes, at most). The default timeout should be 900,000 milliseconds, or 15 minutes.
  8. Once the balancer is stopped, the data-bearing servers are ready to be backed up, but the config server is not.
    • To get a clean config server backup, completely shut down one of the config servers. We do this because bringing the service down totally guarantees that there will be no reads or writes to that configuration, and we have a true moment-in-time backup. There is also the benefit that the config servers won't try to do any chunk migrations or allow any sort of config changes when one is down, making sure the backup will work.
    • This is done by stopping the running mongod service on the designated backup config server, for example with the command service mongod stop.
    • Now we are ready to start the IBM Spectrum Protect backup process. A scripted version of these balancer and config server steps is sketched after this list.
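Here is a sketch of how those steps might be scripted ahead of the IBM Spectrum Protect backup; the mongos address, config server host, and service name are assumptions:

#!/bin/bash
# Pause the balancer, wait for it to stop, then quiesce the designated config server
set -e
MONGOS=ltmngo10:27017
CONFIG_BACKUP_HOST=cfg3.example.com

# Disable the balancer and wait for any in-flight chunk migration to finish
mongo "$MONGOS" --eval 'sh.stopBalancer()'

# Double-check that it is really no longer running before proceeding
while [ "$(mongo --quiet "$MONGOS" --eval 'sh.isBalancerRunning()')" = "true" ]; do
    sleep 10
done

# Shut down the backup config server so its metadata is frozen on disk
ssh root@"$CONFIG_BACKUP_HOST" 'service mongod stop'

# The cluster is now ready for the simultaneous dsmc backup image commands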

About the backup process

IBM Spectrum Protect has a built-in method called backup image /dev/vg/logical_volume. On Linux systems, this method runs a backup procedure that involves backing up the specified logical volume with Linux's built-in LVM. Specifically, the LVM creates a snapshot of the logical volume and IBM Spectrum Protect creates a backup of that snapshot and uploads it to the IBM Spectrum Protect server.

The way an LVM snapshot works is to create a copy-on-write clone of the logical volume in the free space of the volume group. What that really means is that the LVM makes a virtual copy of the actual logical volume by creating a virtual drive in the free space of the volume group. The snapshot volume holds references that point to the unchanged blocks on the real logical volume. That means that a snapshot of a static data set would take almost no (or zero) storage space (the space for those references is negligible). When something is about to get written to the original copy, the original data is copied to the snapshot before the new data is written on the actual disk. That is, you can keep the instance up and running with new write operations happening to the database, all while you take a moment-in-time backup that starts the moment you run the command. The snapshot always contains the data exactly as it was on the logical volume when you started the snapshot.

It is important to note that the journal files automatically created by MongoDB are also part of this snapshot. The journal files are MongoDB's method of tracking recent changes that are being made before they are written to disk. This is especially important in a backup process, because MongoDB holds some data in memory before it flushes it to a disk in a batch. By also backing up the journal, we have a record of all the transactions that have occurred on the database, but have not been completely written to disk. The snapshot function can capture the state of both the database and the journal file at the exact same time to ensure that no data is lost in the process. MongoDB will read from the journal file when it is restored and apply all pending or incomplete updates.

There are, of course, some storage considerations to keep in mind, which might vary from instance to instance. Any data that changes during the live backup must first be copied into the snapshot. This space for the snapshot is taken from the free space of the volume group that the logical volume belongs to. You need to have enough free space in the volume group to cover all the changes that might occur during backup time. In a worst-case scenario, you would need as much free space in the volume group as the logical volume occupies. This, however, is most often not going to be the case, depending on what percentage of your database gets overwritten during the time of backup. In addition, that space can be added to the volume group for the backup and then removed afterward.

The extra disk space needed is offset by the ability to perform live backups. We don't have to stop any of our database servers for a backup. So, there should be no significant performance impact while the backup is in progress. The primary servers in the shard will still operate at full speed, with only a reduction in read time for any queries being processed by the backup server. This can be eliminated as well with special replica member modes, such as hidden, that prevent read operations from happening on the backup server.

It is also worth noting that this is the backup method that MongoDB currently recommends, and that many cloud backup providers currently use. There are a few backup and dump tools provided by MongoDB, but IBM Spectrum Protect provides a much cleaner way of implementing the LVM snapshot backup, and it handles the storage and safekeeping of the files on the backend server.

As for the actual IBM Spectrum Protect commands, it is very simple. On all the backup servers, at the same time, issue the command:
dsmc backup image /dev/volumeGroup/data

This is the basic command that initiates the backup from the command line on the client servers, but there are a few parts and options you should know.

First, dsmc is the IBM Spectrum Protect command on the Linux command line that the program responds to. All of our IBM Spectrum Protect commands that ran from the shell begin with dsmc.

The backup image command tells IBM Spectrum Protect what to do (backup), and what technique to use (image).

The last part of this command is the logical volume name that you need to back up. It is important to note that you must use the device path (/dev/volume_group/volume) and not just use the name of the mount point (such as /data). Using the file system mount point (for example, /data) will be interpreted differently by IBM Spectrum Protect, and it will try to perform a file system backup instead of an LVM logical volume snapshot.
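If you are unsure of the device path behind a mount point, standard LVM tooling will show it; the names in the sample output below are just examples:

# Show which device backs the /data mount point
df /data
#   /dev/mapper/rhel7_system-data   ...   /data

# List logical volumes and their volume groups; the dsmc path is /dev/<volume_group>/<logical_volume>
lvs
#   data   rhel7_system   -wi-ao----   100.00g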

There are also a couple of other commands that are useful and worth knowing.

The first one is snapshotcachesize. It is an option that allows you to set the percentage of the total size of the logical volume that IBM Spectrum Protect will reserve from the volume group. So, if your /data directory is 100 GB, passing 10 to this option will cause LVM to reserve 10 GB from the volume group for the snapshot. If more than 10 GB of the original 100 GB gets changed during the backup, the IBM Spectrum Protect agent will return an error and the backup will fail. In the event of a mission-critical 100% time-sensitive backup, you will want to have 100% of the logical volume free and unallocated in the volume group (in this example, 100 GB).
dsmc backup image -snapshotcachesize=10 /dev/volumeGroup/data

Remember that the default value is 100%. So, if you do not use this flag, the volume group free space will have to be greater than or equal to the actual size of the logical volume that you are using. In our testing, we use a value of about 25, and never even came close to running out of space. This should be monitored and set on a case-by-case basis. You will also notice that the flag is joined to the value by an equal sign, which is different compared to the Linux norm.

This is what the command would look like when designating 55% cache size:
dsmc backup image -snapshotcachesize=55 /dev/volumeGroup/data

One last important command line option is compression. This option allows you to enable the compressing of this backup before it is uploaded to the IBM Spectrum Protect server. We will get into the details of compression shortly. The command is:
dsmc backup image -compression=yes /dev/volumeGroup/data

That is all there is to backing up our MongoDB cluster. With a properly designed LVM layout, and well-planned configuration, IBM Spectrum Protect can take care of the backup in one command.

You should be aware that this is a point-in-time backup. This is a backup that is meant to capture a perfect copy of the database the instant the backup process is started. Despite the MongoDB data servers being live, and performing read and write operations, the data that is backed up is the data that is on the disk the moment the backup is started. All data that comes in or gets removed between the start and finish of the backup process will be reflected in the current state of the server but not in the backup data.

Running the commands

There are a few different ways that you can go about running the commands to perform the backup process. There are a few different commands you need to issue to MongoDB as well as an IBM Spectrum Protect command. The exact method for scripting this will come down to the scripting preferences and skill set of the database administrator. One very important factor in this scripting is the need for all of the IBM Spectrum Protect backup processes to start at nearly the same time. In order for the point-in-time snapshot to work, you don't want any data to come into any of the data servers that is not going to be recognized in the config server. Such data would be orphaned in the database, with no reference to it in the config server, and therefore useless. To avoid that as much as possible, there needs to be reliable synchronization. There are a few methods that we believe will meet this requirement:

  • Cron and shell scripts - Scripting this out in a shell script and then running a synced cron on all the backup systems is the classic way to handle Linux administration (a minimal sketch of such a script follows this list). You may need to observe the status of the IBM Spectrum Protect service to make sure that the backup happens at the same time on all the systems, and that it is relatively error free.
  • MongoDB commands - Issuing the shutdown and startup commands to the running database can be a little tricky. But there are quite a few ways to do it. You can actually write the shutdown commands in JavaScript (MongoDB's native shell language) and pass that JavaScript file to the MongoDB shell through bash. There are also Python drivers (such as PyMongo), if that is your preferred scripting method. You can do just about anything from the PyMongo driver that you need to manage MongoDB. There are also drivers in just about every language that you can use.
  • Automation software - This includes software such as Chef, Puppet, Ansible, and SaltStack. Many companies already use these services to manage their infrastructure, and there are many great benefits in using these for MongoDB. MongoDB scales by adding new systems, shards, and replicas. The cluster can get complex quickly, and you don't want to manage that by hand. But specifically, in terms of backup, these services give you the ability to manage backup from a single point. This allows you to synchronize all the servers through the automation software. We chose to use Ansible in this article for building and backing up our MongoDB clusters. Not only was it flexible, but also easy to use.
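As a minimal sketch of the cron-and-shell approach, each backup node could run the same small script from an identical crontab entry; the schedule, volume group name, and log path are assumptions:

#!/bin/bash
# mongo_lvm_backup.sh -- per-node backup job, scheduled identically on every backup node,
# for example:  30 2 * * *  /usr/local/bin/mongo_lvm_backup.sh
# Run only after the balancer has been stopped and the backup config server is down.
set -e
{
    date
    # Image backup of the logical volume holding /data: reserve 25% of its size for the
    # copy-on-write snapshot and compress the image before sending it to the server
    dsmc backup image -snapshotcachesize=25 -compression=yes /dev/rhel7_system/data
} >> /var/log/mongo_lvm_backup.log 2>&1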

Restoring the database

Once again, when restoring the database, using the IBM Spectrum Protect command is very simple, but you also need to take care of the following considerations:

  1. Assumptions
    You are trying to recover from a point-in-time snapshot, either on fresh servers, or on servers that previously ran MongoDB where the data is corrupted or lost. Either way, the process is nearly identical.
  2. Important considerations

    There are a few design features of MongoDB that affect how we restore the backup data. To review what was already covered:

    • There are always three configuration servers in a production cluster that are essentially identical.
    • There are multiple shards and each shard holds a small slice of the whole MongoDB database.
    • Each shard is replicated, which means there are multiple MongoDB instances that hold identical data.
    • The mongos (query router) instances don't persist any data, and only need a configuration file pointing to the MongoDB configuration servers to run. The configuration servers hold the sharding rules and the configuration for most of the cluster. The configuration servers also hold all the metadata that knows where the actual data is stored.
    • The MongoDB instances actually store the information about their own primary and secondary servers.
    • Mongo servers are identified by a host name. So, you need three things to fully restore a system: the configuration file, the data, and the host name. These facts determine how to back up and restore. There are a few important conclusions we draw based on these observations.

    First, if a server is identical to another server, you don't need to back up each server and restore them individually. Instead, take a backup of one of the identical servers. Then, when you restore it, you propagate it to all the identical servers in the cluster.

    Secondly, because all the configuration files are part of the backup, a well-planned and well-executed restore operation would restore the old configurations and require no extra configuration by the user. In the end, you just power the restored systems on, and notice that it is just as it was when it was backed up. This makes restorations much quicker and much simpler, especially at scale. This can be done by keeping the configuration file in the logical volume with the backup data, or by dynamically creating the configuration files with one of the automation platforms mentioned above.

    Finally, the servers being recognized by a host name make it easy to replace hardware and restore to totally different systems. If you need to change some or all of the hardware out when doing a restore, it is as simple as changing the host names on the systems to match with what was in the original configuration. This way your entire configuration is intact and MongoDB can continue to communicate across the cluster over the network.

  3. Server configuration

    You need to get your server in the exact or similar state as it was before MongoDB crashed. Therefore, it is highly suggested to use some automation software to create your infrastructure as code. This makes it really easy to deploy a structurally identical server. In any case, you need to make sure that certain steps are performed before running any IBM Spectrum Protect restore features. These are very nearly identical to the steps taken to originally set up the server. These steps must be performed on all servers that will be a part of the restored cluster.

    1. Make sure that MongoDB is installed and the identical configuration file is in the right place. This includes making sure that all the directories needed for logging and process ID (PID) files are there. These files should be owned by the user who will be running the MongoDB instance. If not, use chown to make sure that the mongo process can access any of those files.
    2. Make sure you create a logical volume with at least as much space as the original logical volume that was backed up (which can be determined by the size of the backup file).
      lvcreate -L 10G -n data rhel7_system
    3. In the prompt asking if you want to overwrite a file system block, respond with yes. The underlying file system that we need is actually on the image we will restore.
    4. IBM Spectrum Protect requires that there is an allocated logical volume to restore the backup. It will not restore to free volume group space. It needs the shell of the logical volume.
    5. You cannot restore a logical volume smaller than the backup image size. If you try that, it will fail. If it is larger, it will still work, but only if you refrain from building a file system on the newly created LVM.

      If you create a file system, specifically an XFS file system on the newly created shell logical volume, and try to restore a smaller backup image, the IBM Spectrum Protect agent will try to shrink the file system to automatically match them together. The problem is XFS does not have a file system shrink command, and therefore, it fails. For this type of restoration from a full LVM snapshot, you do not need to build a file system on the logical volume being restored to. It should be avoided because of potential errors that have been discovered.

    6. Make sure that the mount point exists for attaching the LVM once it is restored. If not, create it with the mkdir /FileName (for example, mkdir /data) command.
    7. Make sure that the IBM Spectrum Protect agent is installed on the system, in a directory different from the one you will be restoring MongoDB onto (for example, if IBM Spectrum Protect is stored in /opt, you should not be trying to restore the /opt logical volume).
    8. If you are using different hardware, make sure you change the host name to match the host name in the configuration file, and that the host resides in the replica set of the image you are about to restore.
  4. Restoration command
    • If you are restoring to the same system that the backup was originally taken from, use the following command:
      dsmc restore image /dev/rhel7_system/data
    • If you wish to restore the IBM Spectrum Protect image to a different system, you need to authorize this new system to access the backup images of the original. To do that, you do the following:
    • Access the server where the backup originated. Enter the IBM Spectrum Protect shell by entering the dsmc terminal command.
    • Notice that your prompt is tsm>
    • Enter the q access command to identify the nodes that already have access, or enter the q files command to view all the backed up files. The set access backup "*" nodeName command gives the node named nodeName access to the backup files of the current node.
    • Now that the target restoration server can access the backup, you restore using the following command:
      dsmc restore image -fromnode=LTMNGO09 /dev/rhel7_system/data /dev/rhel7_system/data
    • Now that the LVM is restored, we need to remount it to the file system. Run the following command:
      mount /dev/mapper/<volumeGroup>-<name> <mountPoint>

      (For example, mount /dev/mapper/rhel7_system-data /data)

    • Start the MongoDB instance on that server and try to connect to the local MongoDB shell with mongo --port <port>. After verifying that the system is up and running in a proper configuration, stop the MongoDB instance.

      This is where the restoration process can differ slightly. Because we have identical servers, we only took a backup image of one of the replicas. Now that we are restoring, we need to restore that one image back to all of the replicas. The two easiest and simple ways of doing this are to either:

      • Restore to one server using IBM Spectrum Protect and use a copy command to move the files from the restored server to all its replicas

      Or,

      • Install the IBM Spectrum Protect agents on all the systems and simultaneously restore the image from the IBM Spectrum Protect server to all the replicated servers. This requires the aforementioned step to ensure that every other node in the replica set has access to the backup node.

      The advantage of copying files after the restore is to reduce the number of IBM Spectrum Protect agents. Because we need backup data from only one system, most of the IBM Spectrum Protect agents would be installed strictly to perform restore operations and never any backup tasks. When copying files from the restored backup server to the replica sets, the IBM Spectrum Protect server is doing less work, and the Spectrum Protect agents need to be installed only on the backup servers. Depending on the pricing package of IBM Spectrum Protect, this can play a factor, especially at scale. Of course, you must write a program to copy and distribute files from the restored server to the other servers and there is a little bit more customization required.

      Using the Spectrum Protect restore function for all your servers can be a little bit simpler. However, there is a requirement to have a Spectrum Protect agent on all the systems in the cluster. There is also an added strain on the Spectrum Protect server, with every single node in the MongoDB cluster restoring the LVM image directly from it. For this article, we tried both methods and found that both were satisfactory.

      Method 1 (Copying the files): Restore to the server with the Spectrum Protect agent as explained above. Go into the data directory of the restored volume and use scp to transfer the data to the other servers.

      scp -r /data/mongo/….. hostname1:/

      For example:

      scp -r /data/mongo root@ltmgo04:/data

      Then make sure you use chown on the files when they arrive at the other server to make sure that the files are owned by the user who runs MongoDB (by default, mongod). You could also use rsync or some other file copying/synchronization method to pass the files between the servers.

      Method 2 (Using IBM Spectrum Protect on all servers): In this method, you need to install Spectrum Protect on all the servers and run the set access commands that were given above. They need to be run on every backup system for every member of its own replica set, so that the members have free access to that Spectrum Protect image. Then you can use the remote restore method to restore the file system, following all the restore steps mentioned earlier. Then you should change the ownership of the data directory to ensure that it is owned by the user who runs MongoDB.

      This restoration process is the same for all MongoDB data servers, and the MongoDB configuration servers. The only restoration that the mongos servers need is for the configuration file to be put on the server they will run from.

      Now that we know how to get all the individual servers backed up, we need to make sure to bring the cluster up in the right sequence.

      At this point in the restore process, each MongoDB instance should have all of its data on it and should have been verified by starting it up. The MongoDB service should then have been shut down again, leaving all the MongoDB services currently down.

      Moving forward, we will start the MongoDB services, and keep them on, barring any errors. This is the launch sequence for restoring the MongoDB instances, and it is just about the same as the original startup.

      First, you must start each member of the single replica set. You need to make sure that there is a primary and a secondary node and that they can all communicate with each other and recognize themselves as part of the same replica set. You must do this with all different replica sets, leaving them on and running as you move on to the next.

      Second, once all the replica sets are active, it is time to move on to the configuration servers. These must all be turned on before any of the mongos instances. After all the three servers are up and running, we can start to bring up the mongos instances. Make sure you do not change the host names of the configuration servers (ever). Even if you restored to new config servers, you should change their host names to match with what was in the original mongos configuration.

      You should start by bringing up one mongos instance and checking the log to make sure that you can connect to all the configuration server instances. After you see that, connect to the MongoDB shell through the mongos instance. Then you can run shard tests, look at the data, and make sure everything is there. If everything is indeed working, you have successfully restored a MongoDB backup and are running an active cluster. Bring up as many mongos instances as you want and enjoy your freshly restored server. A condensed sketch of the restore-and-copy flow for one replica set follows.
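      The following sketch condenses method 1 for one replica set; the volume group, data path, and host names are assumptions:

      #!/bin/bash
      # Restore the image on the backup node, remount it, then copy the data to the other members
      set -e
      REPLICAS="ltmngo05 ltmngo06"        # the other members of this replica set

      # 1. Restore the logical volume image from the IBM Spectrum Protect server
      dsmc restore image /dev/rhel7_system/data

      # 2. Mount the restored volume at the expected dbPath and fix ownership
      mount /dev/mapper/rhel7_system-data /data
      chown -R mongod:mongod /data

      # 3. Push the restored data files to the other replica set members
      for host in $REPLICAS; do
          scp -r /data/mongo root@"$host":/data
          ssh root@"$host" 'chown -R mongod:mongod /data/mongo'
      done

      # 4. Start mongod on each member, then the config servers, then the mongos instances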

  5. Where to put the data

    As a result of various factors related to the backup procedure, we determined that the data portion of MongoDB should be kept in a separate volume group and mounted to the /data directory. It is possible to place it in an existing volume group, but it must be its own logical volume. We do not want to waste any space when we do our backup, and we want to be able to mount it with ease in a known location.

Performance considerations

Considerations to be aware of during the backup process:

  • Time - Time is not a huge factor because there is no issue in the MongoDB cluster performance during the backup. The throughput for the physical backup is dependent upon the throughput of the infrastructure and how many shards are backed up concurrently.
  • CPU performance - There is additional CPU utilization that can be observed during the process. During an uncompressed backup, we saw the idle CPU percentage drop by 10 to 20 points while it ran. The most extreme case was when compression was turned on. We saw some systems go to 0% idle for short periods of time (30 seconds) when doing a compressed backup, which is certainly not ideal. These systems are indeed live when the backup occurs and may be called upon to do lookups, and such a high CPU utilization can cause a bottleneck in MongoDB. However, we noticed no discernible drop in MongoDB cluster performance during a compressed backup. It is certainly a tradeoff between CPU and network traffic and storage. The best method should be decided on a case-by-case basis. It is important to remember that MongoDB is not typically bound by CPU utilization. The stressor is more likely to be the memory space and the disk I/O. Given that in our test cases the backup server was a secondary replica that handles only occasional read operations and no direct write operations, the CPU cycles used by the backup process should be negligible. By design, the backup process takes place on a server that is under less load than the primary server and can back up quickly, while remaining available to handle requests during a usage spike, and to stay current with the rest of the database without the need to catch up after the backup.
  • MongoDB performance - The backups leave all the servers, including the backup servers, active and able to perform normal tasks. This should leave the performance of MongoDB unchanged except in the most stressful scenarios. The biggest performance tradeoff during the backup process is the disabling of one of the configuration servers. This means that the mongos servers must retrieve all the metadata from two rather than three servers. However, version 3.0 of MongoDB will not scale past three configuration servers, which means there really is no performance gain in having more configuration servers; and it is more of a redundancy measure. After the config server is back online, it will go into a sync mode in order to catch up on any metadata changes that may have occurred while it was offline. This normally completes in seconds and causes no issues at all.
  • Compression versus non-compression backups - For the IBM Power version of the Spectrum Protect server, you have the option of using compressed or non-compressed backups. The compressed image backups require more time to complete.

Summary

MongoDB continues to grow in popularity and finds widespread use in Power environments. IBM Spectrum Protect, formerly Tivoli Storage Manager, is a mature, proven, and strategic world-class data protection platform that helps organizations of all sizes meet their data protection requirements.

This article documents methods to back up MongoDB data with IBM Spectrum Protect. The Spectrum Protect backup function is a good option for data protection if the customer already has Spectrum Protect established in their data center.

The backup procedure reviewed here recommends using IBM Spectrum Protect to create Linux file system snapshots against a MongoDB replica set member dedicated as a backup server. The minimum requirements to set up the backup environment are defined above. The steps are outlined to do the backup of the data and journal files. The backup can be done concurrently with application write operations to the data, with no application downtime.

IBM Spectrum Protect restore of a MongoDB database was also tested and documented. The steps to prepare the target file systems are included along with the IBM Spectrum Protect commands to initiate the restore. The recommended restore option is to recover one replica set member using IBM Spectrum Protect and use Linux commands to copy the data to the other members in the replica set. Another option is to have Spectrum Protect agents on all servers and use IBM Spectrum Protect to restore all of them. Hence, this article provides multiple options: we show how flexible this solution can be, and we leave it to customers to determine what works best for their environments.

Acknowledgment

This article is adapted from the "Backup Restore Methodology using IBM Spectrum Protect for Linux on z" white paper written for IBM z Systems by Ryan Bertsche, Robert McNamara, Kyle Moser, and Dulce Smith.

