You can back up (and later restore) existing topology data for the Agile Service Manager
OCP installation. This can be helpful when updating your system, as part of your company's
data management best practices, or for maintenance reasons.
About this task
Agile Service Manager stores its information in the Cassandra database, and uses PostgreSQL
to provide inventory-like search and query capabilities over the same data. The backup and
restore procedures are therefore based on backing up the Cassandra database (where all the
information is stored) and, after it is restored, triggering a topology re-broadcast that
re-populates PostgreSQL so that the information in both stores remains consistent.
Agile Service Manager can be installed on Red Hat OpenShift Container Platform (OCP) following
different deployment models:
- As part of a Netcool Operations Insight (NOI) solution
- As a standalone product (that is, without NOI) as part of a Watson AIOps deployment
Naming conventions for Agile Service Manager on OCP (as part of NOI or AIOps Event Manager)
- When Agile Service Manager is deployed as part of NOI, its pods follow the naming convention
{releaseName}-topology-{resource}-{suffix}, where 'suffix' is the pod replica number for the
Stateful Sets (such as a Cassandra database), or a hash-like uid for the Deployment pods.
- For example, Cassandra pods might be {releaseName}-topology-cassandra-0 for the first replica
of the set, whereas the topology service pod could be
{releaseName}-topology-topology-57477f4978-qp9gz.
- In cases where Cassandra is shared, the shared instance pods Agile Service Manager uses would be
{releaseName}-cassandra-{suffix} instead.
Naming conventions for Agile Service Manager on OCP standalone (pre-reqs for AIOps AI Manager)
- When Agile Service Manager is installed as a standalone product, the naming convention for its
pods is {releaseName}-{resource}-{suffix}.
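As a quick illustration of these naming conventions, the following sketch builds the expected pod name prefix for a resource. The function name asm_pod_prefix and the model labels (noi, shared, standalone) are hypothetical names introduced here for illustration only; they are not part of Agile Service Manager.

```shell
# Hypothetical helper (not part of the product): print the pod name prefix
# for a resource, following the naming conventions described above.
#   $1 release name, $2 model (noi | shared | standalone), $3 resource
asm_pod_prefix() {
  case "$2" in
    # Deployed as part of NOI: {releaseName}-topology-{resource}
    noi)               printf '%s-topology-%s\n' "$1" "$3" ;;
    # Shared Cassandra or standalone product: {releaseName}-{resource}
    shared|standalone) printf '%s-%s\n' "$1" "$3" ;;
    *)                 echo "unknown deployment model: $2" >&2; return 1 ;;
  esac
}
```

For example, asm_pod_prefix noi shared cassandra prints noi-cassandra, which is the pattern used for the Cassandra pods in the samples in this topic. The pod suffix is left out because it differs per replica.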
When deployed as part of Netcool Operations Insight, and depending on whether you are installing
for the first time or upgrading, you can share the Cassandra database instances between all the NOI
products, or deploy separate instances for Agile Service Manager and NOI. When sharing the Cassandra
instances, the database will contain data from the rest of the components in NOI that make use of
Cassandra, as well as the Agile Service Manager data.
Assumption: The backup and restore procedures in these topics assume a standard Agile Service
Manager NOI deployment (and not a standalone deployment), shared use of Cassandra, and the
release name 'noi'. Adjust the samples provided to your circumstances.
- Backup
- The backup procedure documented here performs a backup of all the keyspaces in the
Cassandra database, including those not specific to Agile Service Manager.
- Restore
- The restore procedures focus on restoring only the keyspace that is relevant to Agile
Service Manager (that is, 'janusgraph').
Procedure
Preparing your system for backup
-
Authenticate into the Kubernetes namespace where Agile Service Manager is deployed as part of
your solution.
-
Deploy the following kPodLoop bash shell function.
kPodLoop is a bash shell function that allows a command to be run against all the
matching Kubernetes containers. You can copy it into the shell.
kPodLoop() {
    # $1: pattern matched against pod names; $2: command to run in each matching pod
    __podPattern=$1
    __podCommand=$2
    # List only the pods that are in the 'Running' phase and match the pattern
    __podList=$( kubectl get pods --field-selector=status.phase=Running --no-headers=true --output=custom-columns=NAME:.metadata.name | grep ${__podPattern} )
    printf "Pods found: $(echo -n ${__podList})\n"
    for pod in ${__podList}; do
        printf "\n===== EXECUTING COMMAND in pod: %-42s =====\n" ${pod}
        kubectl exec ${pod} -- bash -c "${__podCommand}"
        printf '_%.0s' {1..80}
        printf "\n"
    done;
}
The kPodLoop bash shell function runs the commands only against pods that are in a
'Running' phase. This filter ensures that the configuration pods that are only run as part of your
installation, like the secret generator pod, are skipped.
-
Make a note of the scaling of Agile Service Manager pods.
kubectl get pods --field-selector=status.phase=Running --no-headers=true --output=custom-columns=CNAME:.metadata.ownerReferences[0].name | grep topology | uniq --count
Example output:
1 noi-topology-dns-observer
1 noi-topology-docker-observer
3 noi-topology-elasticsearch
1 noi-topology-file-observer
1 noi-topology-kubernetes-observer
1 noi-topology-layout
1 noi-topology-merge
1 noi-topology-noi-gateway
1 noi-topology-noi-probe
1 noi-topology-observer-service
1 noi-topology-search
1 noi-topology-status
1 noi-topology-topology
1 noi-topology-ui-api
-
Verify access to each Cassandra database (this command will return a list of keyspaces from
each Cassandra node). Adjust the Cassandra pod names based on your deployment model naming convention.
kPodLoop noi-cassandra "cqlsh -u \${CASSANDRA_USER} -p \${CASSANDRA_PASS} -e \"DESC KEYSPACES;\""
-
Scale down Agile Service Manager pods.
kubectl scale deployment --replicas=0 noi-topology-dns-observer
kubectl scale deployment --replicas=0 noi-topology-file-observer
kubectl scale deployment --replicas=0 noi-topology-kubernetes-observer
kubectl scale deployment --replicas=0 noi-topology-observer-service
kubectl scale deployment --replicas=0 noi-topology-noi-gateway
kubectl scale deployment --replicas=0 noi-topology-noi-probe
kubectl scale deployment --replicas=0 noi-topology-layout
kubectl scale deployment --replicas=0 noi-topology-merge
kubectl scale deployment --replicas=0 noi-topology-status
kubectl scale deployment --replicas=0 noi-topology-search
kubectl scale deployment --replicas=0 noi-topology-ui-api
kubectl scale deployment --replicas=0 noi-topology-topology
The Cassandra and Elasticsearch pods (noi-cassandra and noi-topology-elasticsearch) are left
active. The Cassandra pods must be running in order to back up their data, whereas the
Elasticsearch pods neither interact with nor influence the Cassandra contents, so they can be
kept running.
Important: Include in this scale down any additional observers in your
deployment.
-
Verify that scaling down was successful.
kubectl get pods --field-selector=status.phase=Running | grep noi-topology
The Agile Service Manager services have now been scaled down, and the Cassandra database
contents will no longer be modified.
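The scaling noted earlier can also be turned into scale commands instead of typing each one by hand. The following sketch is a convenience under stated assumptions, not part of the documented procedure: asm_scale_commands is a hypothetical function that reads lines in the 'count name' format of the example output and prints one kubectl scale command per name. It skips names containing elasticsearch or cassandra, on the assumption that those are Stateful Sets that are left running.

```shell
# Hypothetical helper: read "<count> <name>" lines on stdin and print
# kubectl scale commands. Pass a number as $1 to force that replica count
# (for example 0 to scale down); with no argument the recorded count is used.
asm_scale_commands() {
  while read -r count name; do
    # Ignore blank or incomplete lines
    [ -n "${name}" ] || continue
    case "${name}" in
      *elasticsearch*|*cassandra*) continue ;;  # assumed Stateful Sets, left running
    esac
    printf 'kubectl scale deployment --replicas=%s %s\n' "${1:-${count}}" "${name}"
  done
}
```

If the listing was saved to a file (for example /tmp/asm-scale.txt, a hypothetical path), running asm_scale_commands 0 < /tmp/asm-scale.txt prints the scale-down commands, and running it without the 0 prints commands that restore the recorded counts. The commands are only printed, so they can be reviewed before they are run.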
Backing up data
-
Deploy the pbkc bash shell function.
The pbkc function attempts to back up the Cassandra database on all nodes as close to
simultaneously as possible. You can copy it into the shell. Modify the following example script
and change RELEASE to conform to your installation's Cassandra pod names, based on the naming
conventions of your deployment model.
pbkc() {
    ## Parallel Backup of Kubernetes Cassandra
    RELEASE=noi
    DATE=$( date +"%F-%H-%M-%S" )
    LOGFILEBASE=/tmp/clusteredCassandraBackup-${DATE}-
    declare -A LOG
    ## get the current list of cassandra pods.
    podlist=$( oc get pods --field-selector=status.phase=Running --no-headers=true --output=custom-columns=NAME:.metadata.name | grep ${RELEASE}-cassandra )
    for pod in ${podlist}; do
        echo -e "BACKING UP CASSANDRA IN POD ${pod}"
        oc exec ${pod} -- bash -c "/opt/ibm/backup_scripts/backup_cassandra.sh -u \${CASSANDRA_USER} -p \${CASSANDRA_PASS} -f > /dev/null 2> /dev/null &"
    done
    printf "Waiting for backups to complete:"
    COMPLETE="false"
    while [ $COMPLETE = "false" ]; do
        COMPLETE="true"
        for pod in ${podlist}; do
            log=`oc exec $pod -- bash -c "ps -ef | grep backup_cassandra.sh- | grep -v grep | awk '{print \\\$NF}'"`
            if [ $? != 0 ]; then
                COMPLETE="false"
            elif [ -n "$log" ]; then
                COMPLETE="false"
                LOG[$pod]=$log
            fi
        done
        printf "-"
        if [ $COMPLETE = "false" ]; then
            sleep 30
        fi
    done
    printf "\n"
    for pod in ${podlist}; do
        log=${LOGFILEBASE}${pod}.log
        oc cp ${pod}:${LOG[$pod]} ${log}
        echo -e "Backup of ${pod} completed, please verify via log file ($log)"
    done
}
-
Run a clean-up on all keyspaces in all Cassandra instances.
Example Cassandra keyspaces cleanup:
kPodLoop noi-cassandra "nodetool cleanup system_schema"
kPodLoop noi-cassandra "nodetool cleanup system"
kPodLoop noi-cassandra "nodetool cleanup system_distributed"
kPodLoop noi-cassandra "nodetool cleanup system_auth"
kPodLoop noi-cassandra "nodetool cleanup janusgraph"
kPodLoop noi-cassandra "nodetool cleanup system_traces"
-
Run a backup on all Cassandra instances by calling the pbkc shell function that you deployed
earlier. The function takes no arguments.
pbkc
-
Check the final output in the log file for each backup.
Adjust the date in the grep command as appropriate.
grep "BACKUP DONE SUCCESSFULLY" /tmp/clusteredCassandraBackup-2019-06-14-14-09-50*
/tmp/clusteredCassandraBackup-2019-06-14-14-09-50-noi-cassandra-0.log:Fri Jun 14 14:11:04 UTC 2019 BACKUP DONE SUCCESSFULLY !!!
/tmp/clusteredCassandraBackup-2019-06-14-14-09-50-noi-cassandra-1.log:Fri Jun 14 14:11:16 UTC 2019 BACKUP DONE SUCCESSFULLY !!!
/tmp/clusteredCassandraBackup-2019-06-14-14-09-50-noi-cassandra-2.log:Fri Jun 14 14:11:16 UTC 2019 BACKUP DONE SUCCESSFULLY !!!
Tip: For additional information about backing up secrets and the system_auth keyspace, see the
Secrets and system_auth keyspace note in the 'Restoring database data (OCP)' topic.
After the backup has completed successfully, restore your Agile Service Manager services
to normal operation.
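The log check from the last backup step can also be scripted so that a missing success message is not overlooked. The following is a minimal sketch: asm_check_backup_logs is a hypothetical function name, and the only thing it relies on is the "BACKUP DONE SUCCESSFULLY" text shown in the example output.

```shell
# Hypothetical helper: check each backup log passed as an argument for the
# success marker shown in the example output above.
# Returns non-zero if any of the logs lacks the marker or cannot be read.
asm_check_backup_logs() {
  rc=0
  for log in "$@"; do
    if grep -q "BACKUP DONE SUCCESSFULLY" "${log}" 2>/dev/null; then
      echo "OK      ${log}"
    else
      echo "FAILED  ${log}"
      rc=1
    fi
  done
  return "${rc}"
}
```

For example, asm_check_backup_logs /tmp/clusteredCassandraBackup-2019-06-14-14-09-50* reports one line per Cassandra pod log. Adjust the date to match your backup run, and inspect any log reported as FAILED before scaling the services back up.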
Restore services
-
Scale up the services to the original level.
The original level was obtained in a previous step.
kubectl scale deployment --replicas=1 noi-topology-topology
kubectl scale deployment --replicas=1 noi-topology-layout
kubectl scale deployment --replicas=1 noi-topology-merge
kubectl scale deployment --replicas=1 noi-topology-status
kubectl scale deployment --replicas=1 noi-topology-search
kubectl scale deployment --replicas=1 noi-topology-observer-service
kubectl scale deployment --replicas=1 noi-topology-noi-gateway
kubectl scale deployment --replicas=1 noi-topology-noi-probe
kubectl scale deployment --replicas=1 noi-topology-ui-api
kubectl scale deployment --replicas=1 noi-topology-dns-observer
kubectl scale deployment --replicas=1 noi-topology-file-observer
kubectl scale deployment --replicas=1 noi-topology-rest-observer
kubectl scale deployment --replicas=1 noi-topology-kubernetes-observer
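To confirm that the services returned to their original level, the scaling listing taken before the backup can be compared with a fresh one taken after scaling up. The following sketch assumes that both listings were saved to files in the 'count name' format of the earlier example output; asm_scaling_diff is a hypothetical function name.

```shell
# Hypothetical helper: compare two saved "<count> <name>" listings and report
# every name from the first listing whose count differs in the second.
#   $1 listing taken before the backup, $2 listing taken after scaling up
asm_scaling_diff() {
  awk '
    NR == FNR { before[$2] = $1; next }   # first file: recorded counts
    { after[$2] = $1 }                    # second file: current counts
    END {
      for (n in before) {
        # A name missing from the second listing has no running pods
        now = (n in after) ? after[n] : 0
        if (now != before[n]) printf "%s: was %s, now %s\n", n, before[n], now
      }
    }' "$1" "$2" | sort
}
```

No output means that every service recorded before the backup is running with the same number of pods. Names that appear only in the second listing are not reported by this sketch.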
Results
The backup procedure stores the generated backup files in the
/opt/ibm/cassandra/data/backup_tar/ directory inside the Agile Service Manager Cassandra pods.
What to do next
You can restore your backed-up data as and when required.