Planning for maintenance activities
Cloud services requires five key maintenance activities. Some keep normal day-to-day operations going, and others are contingencies for restoring service after a disaster.
- Backing up the key library manager - a new copy needs to be made every time there is any change to a key in the key manager. For more information, see Backing up the Cloud services configuration.
Note: If the key manager is lost, there is no backdoor or IBM®-internal method to recover the data from cloud storage. It is therefore important to have a backup copy of the key manager.
- Backing up the Transparent cloud tiering metadata with SOBAR. Cloud services uses SOBAR as its backup mechanism for this metadata, and the backup can be used to restore the Transparent cloud tiering service in case of a failure. A sample script that can be deployed to run the backups is provided in the Transparent cloud tiering directory. For more information, see Scale out backup and restore (SOBAR) for Cloud services.
- Background removal of deleted files from the object storage. The recommended frequency is daily.
- Backing up the Cloud services full database to the cloud. The recommended frequency is weekly.
- Reconciling the Transparent cloud tiering database. The recommended frequency is every four weeks.
When you plan your service windows, consider the following points:
- Reconcile is by far the longest activity and the main one to consider when you plan your service windows. Reconciling one 100 million file container (100 million is the default spillover value) takes a few hours if the metadata is on flash storage. If the metadata is on disk, it takes more like 6 to 12 hours.
- Each Transparent cloud tiering node can run one service activity at a time during the maintenance window. For example, if you have three Transparent cloud tiering nodes, up to three maintenance activities can run in parallel.
- Maintenance activities run to completion even after the maintenance window has closed. The worst case is an activity that is scheduled and started just before the maintenance window closes: that activity runs almost entirely outside the maintenance window.
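The run-to-completion behavior can be illustrated with a minimal sketch. The function name, the dates, and the 12-hour duration (the upper estimate given above for a disk-based reconcile) are assumptions chosen for illustration; none of this is part of the Cloud services tooling.

```python
from datetime import datetime, timedelta


def overrun_past_window(window_end: datetime, start: datetime,
                        duration: timedelta) -> timedelta:
    """Return how long an activity keeps running after the window closes.

    Hypothetical helper for planning only; it does not query Cloud services.
    """
    finish = start + duration
    # An activity that finishes inside the window has no overrun.
    return max(finish - window_end, timedelta(0))


# Assumed example: a reconcile that takes 12 hours and starts
# 5 minutes before the maintenance window closes.
window_end = datetime(2024, 1, 7, 6, 0)
start = window_end - timedelta(minutes=5)
print(overrun_past_window(window_end, start, timedelta(hours=12)))  # 11:55:00
```

In this assumed case, almost the whole 12-hour activity runs outside the window, so the period after the window closes should be planned for as well.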
The size of the maintenance window limits the number of files that Transparent cloud tiering can scale to. With a weekly maintenance window of 24 hours, each node can process two to three 100 million file containers per week. Because a container needs to be reconciled only once every four weeks, each node can rotate through eight to twelve 100 million file containers over a four-week cycle. With a full 4-node setup, this adds up to 32 - 48 containers (roughly 3 - 5 billion files) that a weekly 24-hour maintenance window can support for a single node group.
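The arithmetic behind these figures can be reproduced with a short sketch. The function and parameter names are hypothetical; the inputs (two to three containers per node per weekly window, a four-week reconcile interval, and 100 million files per container) are taken from the text above.

```python
FILES_PER_CONTAINER = 100_000_000  # default spillover value from the text


def reconcile_capacity(nodes, containers_per_window=(2, 3),
                       windows_per_cycle=4):
    """Estimate how many containers and files a node group can keep reconciled.

    Hypothetical planning helper: each node processes a (low, high) range of
    containers per weekly window, and each container is reconciled once per
    cycle of `windows_per_cycle` windows.
    """
    low, high = containers_per_window
    containers = (low * windows_per_cycle * nodes,
                  high * windows_per_cycle * nodes)
    files = tuple(c * FILES_PER_CONTAINER for c in containers)
    return {"containers": containers, "files": files}


# Full 4-node setup with a weekly 24-hour maintenance window.
print(reconcile_capacity(nodes=4))
# {'containers': (32, 48), 'files': (3200000000, 4800000000)}
```

Changing `containers_per_window` gives a rough estimate for a different window length or for faster metadata storage, provided that you measure your own reconcile times first.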
If you want to use Transparent cloud tiering with more files than this, consider placing the IBM Spectrum Scale metadata and the Transparent cloud tiering database on flash storage. Doing so greatly increases the number of files that the maintenance window can handle.
For setting up a maintenance activity, see Configuring the maintenance windows.