IBM Support

Cloud Pak for Security: Shutdown and Power-on Procedures

Question & Answer


Question

What are the steps for a graceful shutdown and power-on of Cloud Pak for Security (CP4S)?

Cause

Not following these instructions increases the probability of critical issues when bringing the Cluster back online.
Typical reasons for requiring a complete Cluster shutdown include, but are not limited to:
  • Server maintenance
  • Planned power outage, or an unplanned outage with battery backup enabled
  • Work that requires the server hosts to be powered off

Answer

If you are not willing to restore the Cluster, do not shut it down.

Shutdown Procedures

  1. Log in to the Red Hat OpenShift Cluster:
    oc login -u ADMIN https://CONSOLE:PORT
    NOTE: Replace ADMIN with your admin user, and replace CONSOLE and PORT with your server-specific information.
  2. Verify nodes are in Ready status:
    oc get nodes
  3. Validate that all Cluster Operators report AVAILABLE as True and DEGRADED as False:
    oc get clusteroperators
  4. Check etcd for the controller nodes:
    oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd
    NOTE: Validate that all of the etcd pods are 3/3 and running.
  5. Select the first etcd pod from the previous step and open a remote shell in it:
    oc rsh -n openshift-etcd etcd-cp4s-lab-control-1
    NOTE: Replace openshift-etcd and etcd-cp4s-lab-control-1 with the values appropriate for your environment.
  6. Validate that etcd is in sync and there are no issues:
    etcdctl member list -w table
    NOTE: Type exit to leave the etcd pod shell before continuing.
    • Make an etcd backup:

      a) Make note of the first node with role controller:

      oc get nodes

      b) Start a debug session on that node, replacing NODE_NAME with the name noted in the previous step:

      oc debug node/NODE_NAME

      c) Obtain node bash prompt:

      chroot /host

      d) Now run the backup script to start the backup:

      /usr/local/bin/cluster-backup.sh LOCATION
      NOTE: LOCATION is the fully qualified path location to store the backup.
      e) Verify the return was successful with:
      snapshot db and kube resources are successfully saved to LOCATION
      f) Leave the chroot environment:
      exit
      g) Leave the debug session:
      exit

    • Make a CP4S backup:

      NOTE: Creating a backup takes at least 20 minutes.
      a) Switch to the project where CP4S is installed, or specify it with -n, and find the backup pod:

      oc get pods | grep cp4s-backup-restore
      b) Run the backup script, replacing BACKUP-RESTORE-PODNAME with the pod name from the previous step and ENCRYPTION_PASSWORD with a password of your choice for the backup:
      oc exec BACKUP-RESTORE-PODNAME -- /opt/bin/backup-cp4s.sh -p ENCRYPTION_PASSWORD
      NOTE: Make sure the command successfully completes with:
      CP4S Backup Procedure Complete
      c) Copy the backup from the pod to the host:
      oc cp BACKUP-RESTORE-PODNAME:/opt/data/backup ./backup

    • Initiate the graceful shutdown of the Red Hat OpenShift Cluster:

      a) Start an SSH agent in the current shell:

      eval "$(ssh-agent -s)"
      b) Add the cluster SSH key to the agent:
      ssh-add /root/RHOCP/SSH-KEY/rhocp.key
      c) Get node list:
      nodes=$(oc get nodes -o jsonpath='{.items[*].metadata.name}')
      d) Gracefully power down the Red Hat OpenShift controller and worker nodes:
      for node in $nodes; do echo "Shutting down $node"; ssh "core@$node" sudo shutdown -h; done
      e) Verify that ALL of the Red Hat OpenShift Cluster servers are DOWN:
      oc get nodes
      NOTE: Do NOT force the shutdown. Stopping the nodes can take time.
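
The pre-shutdown health checks (nodes Ready, Cluster Operators available and not degraded, etcd pods fully running) can be sketched as small filters over the oc output. This is only an illustration: the function names are hypothetical, and the filters assume the default column layout of the --no-headers output, which can differ between releases. They also compare ready containers to total containers rather than hard-coding 3/3.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. Each function reads the plain-text
# output of an oc command on standard input and assumes the default column layout.

# Print the names of nodes whose STATUS column is not exactly "Ready".
# Expected input: oc get nodes --no-headers  (NAME STATUS ROLES AGE VERSION)
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Print the names of Cluster Operators that are not AVAILABLE=True and DEGRADED=False.
# Expected input: oc get clusteroperators --no-headers
#                 (NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE ...)
unhealthy_operators() {
  awk '$3 != "True" || $5 != "False" { print $1 }'
}

# Print the names of etcd pods that are not Running with all containers ready.
# Expected input: oc get pods -n openshift-etcd --no-headers
#                 (NAME READY STATUS RESTARTS AGE)
unhealthy_etcd_pods() {
  awk '$1 ~ /^etcd-/ && $1 !~ /quorum-guard/ {
         split($2, r, "/")
         if (r[1] != r[2] || $3 != "Running") print $1
       }'
}

# Example use, from a session that is already logged in with oc login:
#   oc get nodes --no-headers | not_ready_nodes
#   oc get clusteroperators --no-headers | unhealthy_operators
#   oc get pods -n openshift-etcd --no-headers | unhealthy_etcd_pods
```

Empty output from all three filters suggests the checks passed. Any name that is printed should be investigated before the backups and the shutdown are started.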

Startup Procedures

NOTE: A prerequisite is that the Red Hat OpenShift Cluster was previously shut down gracefully.

    • Start the supporting Red Hat OpenShift resources in the following order:

      a) Gateway: DHCP
      b) Bastion Host
      c) Service Host: DNS, haproxy
      d) NFS Storage: nfsd

    • Verify all resources are working correctly:

      a) On services host:

      systemctl status named
      systemctl status haproxy
      b) On storage host:
      exportfs
      systemctl status nfs-server.service
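
The service checks above can be condensed into one helper that lists only the units that are not active. This is a minimal sketch: inactive_units and the UNIT_CHECK variable are hypothetical names, and the unit names are the ones used in this document, which might differ on your hosts.

```shell
#!/bin/sh
# Sketch only: inactive_units and UNIT_CHECK are hypothetical names.
# Print every unit passed as an argument that the check command does not report as active.
# UNIT_CHECK defaults to "systemctl is-active --quiet"; it can be overridden so that the
# function can be tried on a machine that does not run these services.
inactive_units() {
  for unit in "$@"; do
    # Intentionally unquoted so that the default expands to a command plus its option.
    ${UNIT_CHECK:-systemctl is-active --quiet} "$unit" || echo "$unit"
  done
}

# Example use:
#   inactive_units named haproxy         # on the services host
#   inactive_units nfs-server.service    # on the storage host
```

No output means every listed unit is active. The exportfs check on the storage host still needs to be reviewed by eye, because the expected exports are specific to each environment.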

  1. Start the control servers.
  2. Wait until the command prompt is available on each control server before proceeding.
  3. Start the worker (processor) servers.
  4. Wait until the command prompt is available on each worker server before proceeding.
    • Verify all servers are working correctly:

      a) Log in to the Red Hat OpenShift Cluster:

      oc login -u ADMIN https://CONSOLE:PORT
      NOTE: Replace ADMIN with your admin user, and replace CONSOLE and PORT with your server-specific information.
      b) Verify nodes are in Ready status:
      oc get nodes
      c) Validate that all Cluster Operators report AVAILABLE as True and DEGRADED as False:
      oc get clusteroperators
      d) Check etcd for the Control nodes:
      oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd
      NOTE: Validate that all of the etcd pods are 3/3 and running.
      e) Select the first etcd pod from the previous step:
      oc rsh -n openshift-etcd etcd-cp4s-lab-control-1
      NOTE: Change openshift-etcd and etcd-cp4s-lab-control-1 to the values appropriate for your environment.
      f) Validate that etcd is in sync and there are no issues:
      etcdctl member list -w table
      g) exit
      h) exit

    • If there are certificate signing requests (CSRs) pending, validate and approve them:

      a) Check for new CSRs:

      oc get csr
      b) Verify that the CSR is valid:
      oc describe csr CSR_NAME
      NOTE: Replace CSR_NAME with the name of the CSR.
      c) If the CSR is valid, approve it:
      oc adm certificate approve CSR_NAME
      NOTE: Replace CSR_NAME with the name of the CSR.

  5. Log in to the Red Hat OpenShift Admin UI
    NOTE: You might need to clear browser cache for correct functionality.
  6. Verify that all of the pods have started
    NOTE: If a pod does not start correctly, delete it so that it is re-created, and allow more time for it to start.
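
As a command-line complement to the CSR and pod checks above, the following sketch filters oc output for pending CSRs and for pods that have not started. The function names are hypothetical, and the filters assume the default --no-headers column layout, with CONDITION as the last column of oc get csr.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. Each function reads the plain-text
# output of an oc command on standard input.

# Print the names of CSRs whose CONDITION (last column) is Pending.
# Expected input: oc get csr --no-headers
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}

# Print NAMESPACE/NAME for pods that are neither Completed nor Running with all
# containers ready.
# Expected input: oc get pods --all-namespaces --no-headers
#                 (NAMESPACE NAME READY STATUS RESTARTS AGE)
unstarted_pods() {
  awk '$4 == "Completed" { next }
       {
         split($3, r, "/")
         if (r[1] != r[2] || $4 != "Running") print $1 "/" $2
       }'
}

# Example use, from a session that is already logged in with oc login:
#   oc get csr --no-headers | pending_csrs
#   oc get pods --all-namespaces --no-headers | unstarted_pods
```

The sketch only lists candidates. Each pending CSR should still be reviewed with oc describe csr before it is approved, and a listed pod might simply need more time to start.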

Red Hat®, JBoss®, OpenShift®, Fedora®, Hibernate®, Ansible®, CloudForms®, RHCA®, RHCE®, RHCSA®, Ceph®, and Gluster® are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the United States and other countries.

[{"Type":"MASTER","Line of Business":{"code":"LOB24","label":"Security Software"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSTDPP","label":"IBM Cloud Pak for Security"},"ARM Category":[{"code":"a8m3p0000000rbnAAA","label":"Administration Task"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"1.7.0;1.8.0;and future releases"}]

Document Information

Modified date:
26 July 2023

UID

ibm16853549