Cloud Pak for Security: Shutdown and Power-on Procedures

Question & Answer

Question

What are the steps for a graceful shutdown and power-on of Cloud Pak for Security (CP4S)?

Cause

Not following these instructions increases the probability of critical issues when bringing the Cluster back online.

Typical reasons for requiring a complete Cluster shutdown include, but are not limited to:

Server maintenance
Planned power outage, or an unplanned outage with battery backup engaged
Work that requires the server hosts to be powered off

Answer

Do not shut down the Cluster unless you are prepared to restore it.

Shutdown Procedures

Log in to the Red Hat® OpenShift® Cluster:
```
oc login -u ADMIN https://CONSOLE:PORT
```
NOTE: Replace ADMIN with your admin user, and replace CONSOLE and PORT with your server-specific information.
Verify nodes are in Ready status:
```
oc get nodes
```
Validate that all Cluster Operators report AVAILABLE as True and DEGRADED as False:
```
oc get clusteroperators
```
Check etcd for the controller nodes:
```
oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd
```
NOTE: Validate that all of the etcd pods are 3/3 and running.
Select the first etcd pod from the previous step:
```
oc rsh -n openshift-etcd etcd-cp4s-lab-control-1
```
NOTE: Replace openshift-etcd and etcd-cp4s-lab-control-1 with the values appropriate for your environment.
Validate that etcd is in sync and there are no issues:
```
etcdctl member list -w table
```
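The status checks above (nodes, Cluster Operators, etcd pods) can be collected into small helper functions. The following is a minimal sketch, not part of the official procedure. The function names `nodes_all_ready`, `operators_all_healthy`, and `etcd_pods_all_running` are hypothetical, and the sketch assumes the default column layout of `oc get ... --no-headers` output: node STATUS in column 2, Cluster Operator AVAILABLE and DEGRADED in columns 3 and 5, and pod READY and STATUS in columns 2 and 3. Each function reads command output on standard input, so it can be tried against saved output before it is used on a live cluster. It does not cover the `etcdctl member list` check, which still needs to be reviewed by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; each reads `oc get ... --no-headers` output on stdin
# and returns 0 only if at least one row is present and every row passes.

# Every node must have STATUS (column 2) exactly "Ready".
nodes_all_ready() {
  awk 'BEGIN { n = 0; bad = 0 }
       NF { n++; if ($2 != "Ready") bad++ }
       END { if (n > 0 && bad == 0) exit 0; exit 1 }'
}

# Every Cluster Operator must have AVAILABLE (column 3) "True"
# and DEGRADED (column 5) "False". Assumes the VERSION column is populated.
operators_all_healthy() {
  awk 'BEGIN { n = 0; bad = 0 }
       NF { n++; if ($3 != "True" || $5 != "False") bad++ }
       END { if (n > 0 && bad == 0) exit 0; exit 1 }'
}

# Every etcd pod must show the expected READY value (default 3/3, as in
# the NOTE above; pass another value as $1 if your version differs)
# and STATUS "Running".
etcd_pods_all_running() {
  awk -v want="${1:-3/3}" 'BEGIN { n = 0; bad = 0 }
       NF { n++; if ($2 != want || $3 != "Running") bad++ }
       END { if (n > 0 && bad == 0) exit 0; exit 1 }'
}

# Example use against a live cluster (not run here):
#   oc get nodes --no-headers | nodes_all_ready || echo "node check failed"
#   oc get clusteroperators --no-headers | operators_all_healthy || echo "operator check failed"
#   oc get pods -n openshift-etcd --no-headers | grep -v etcd-quorum-guard | grep etcd \
#     | etcd_pods_all_running || echo "etcd pod check failed"
```

A failed check only indicates that the output did not match the expected pattern; inspect the full `oc` output before deciding whether to proceed.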
- Make an etcd backup:
  a) Make a note of the first node with the controller role:
```
oc get nodes
```
  b) Start a debug session on that node, replacing NODE_NAME with the name noted in the previous step:
```
oc debug node/NODE_NAME
```
  c) Obtain a bash prompt on the node:
```
chroot /host
```
  d) Now run the backup script to start the backup:
```
/usr/local/bin/cluster-backup.sh LOCATION
```
  NOTE: LOCATION is the fully qualified path of the directory in which to store the backup.
  e) Verify that the script reports success with the following message:
```
snapshot db and kube resources are successfully saved to LOCATION
```
  f) exit (leaves the chroot shell)
  g) exit (ends the node debug session)
- Make a CP4S backup:
  NOTE: Creating a backup takes at least 20 minutes.
  a) Switch to the project where CP4S is installed, or specify it with -n, and locate the backup pod:
```
oc get pods | grep cp4s-backup-restore
```
  b) Replace BACKUP-RESTORE-PODNAME with the pod name from the previous step, and replace ENCRYPTION_PASSWORD with a password of your choice to encrypt the backup:
```
oc exec BACKUP-RESTORE-PODNAME -- /opt/bin/backup-cp4s.sh -p ENCRYPTION_PASSWORD
```
  NOTE: Make sure the command completes successfully with:
```
CP4S Backup Procedure Complete
```
  c) Copy the backup from the pod to the host:
```
oc cp BACKUP-RESTORE-PODNAME:/opt/data/backup ./backup
```
- Initiate the graceful shutdown of the Red Hat® OpenShift® Cluster:
  a) Start the SSH agent in your shell:
```
eval "$(ssh-agent -s)"
```
  b) Add the cluster SSH key to the agent:
```
ssh-add /root/RHOCP/SSH-KEY/rhocp.key
```
  c) Get the list of nodes:
```
nodes=$(oc get nodes -o jsonpath='{.items[*].metadata.name}')
```
  d) Gracefully power down the Red Hat® OpenShift® controller and worker nodes:
```
for node in $nodes; do echo "Shutting down $node"; ssh "core@$node" sudo shutdown -h; done
```
  e) Verify ALL of the Red Hat® OpenShift® Cluster servers are DOWN:
```
oc get nodes
```
  NOTE: Do NOT force the shutdown. Stopping the nodes can take time.
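Because the shutdown loop above acts on every node at once, it can help to preview exactly which commands will run before running them. The following is a minimal sketch under that assumption. The function name `shutdown_nodes` and the `DRY_RUN` variable are hypothetical and not part of the documented procedure; the `ssh` command itself is the one used in the step above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented shutdown loop.
# With DRY_RUN=1 (the default in this sketch) it only prints the command
# it would run for each node, so the node list can be reviewed first.
shutdown_nodes() {
  local node
  for node in "$@"; do
    echo "Shutting down $node"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "DRY RUN: ssh core@$node sudo shutdown -h"
    else
      ssh "core@$node" sudo shutdown -h
    fi
  done
}

# Example (not run here); word splitting of $nodes is intentional:
#   nodes=$(oc get nodes -o jsonpath='{.items[*].metadata.name}')
#   shutdown_nodes $nodes             # preview only
#   DRY_RUN=0 shutdown_nodes $nodes   # perform the shutdown
```

Defaulting to a preview is a deliberate choice in this sketch: the real shutdown requires explicitly setting `DRY_RUN=0`.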

Startup Procedures

NOTE: A prerequisite is that the Red Hat® OpenShift® Cluster was previously shut down gracefully.

- Start the supporting Red Hat® OpenShift® resources in the following order:
  
  a) Gateway: DHCP
  b) Bastion Host
  c) Service Host: DNS, haproxy
  d) NFS Storage: nfsd

Verify all resources are working correctly:

a) On services host:

systemctl status named
systemctl status haproxy

b) On storage host:

exportfs
systemctl status nfs-server.service

Start the control servers.
Once the command prompt is available on each control server, proceed.
Start the worker (processor) servers.
Once the command prompt is available on each worker server, proceed.
- Verify all servers are working correctly:
  a) Log in to the Red Hat® OpenShift® Cluster:
```
oc login -u ADMIN https://CONSOLE:PORT
```
  NOTE: Replace ADMIN with your admin user, and replace CONSOLE and PORT with your server-specific information.
  b) Verify nodes are in Ready status:
```
oc get nodes
```
  c) Validate that all Cluster Operators report AVAILABLE as True and DEGRADED as False:
```
oc get clusteroperators
```
  d) Check etcd for the Control nodes:
```
oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd
```
  NOTE: Validate that all of the etcd pods are 3/3 and running.
  e) Select the first etcd pod from the previous step:
```
oc rsh -n openshift-etcd etcd-cp4s-lab-control-1
```
  NOTE: Change openshift-etcd and etcd-cp4s-lab-control-1 to the values appropriate for your environment.
  f) Validate that etcd is in sync and there are no issues:
```
etcdctl member list -w table
```
  g) exit
  h) exit
- If there are certificates pending, validate:
  a) Check for new certificates:
```
oc get csr
```
  b) Verify the certificate is valid:
```
oc describe csr CSR_NAME
```
  NOTE: Replace CSR_NAME with the name of the certificate.
  c) If the certificate is valid, approve it:
```
oc adm certificate approve CSR_NAME
```
  NOTE: Replace CSR_NAME with the name of the certificate.
Log in to the Red Hat® OpenShift® Admin UI.
NOTE: You might need to clear the browser cache for correct functionality.
Verify that all of the Pods are started.
NOTE: If a pod is misbehaving, delete that pod and allow additional time for it to restart.
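For the pending-certificate step above, a small filter can list only the requests that still need attention. This is a minimal sketch, not part of the official procedure: the function name `pending_csrs` is hypothetical, and it assumes that CONDITION is the last column of `oc get csr --no-headers` output. It deliberately does not approve anything, so each request can still be reviewed with `oc describe csr` first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `oc get csr --no-headers` output on stdin and
# prints the NAME (column 1) of each request whose last column is "Pending".
pending_csrs() {
  awk 'NF && $NF == "Pending" { print $1 }'
}

# Example (not run here):
#   oc get csr --no-headers | pending_csrs
#   # review each listed request, then approve the valid ones:
#   #   oc describe csr CSR_NAME
#   #   oc adm certificate approve CSR_NAME
```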

Red Hat®, JBoss®, OpenShift®, Fedora®, Hibernate®, Ansible®, CloudForms®, RHCA®, RHCE®, RHCSA®, Ceph®, and Gluster® are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the United States and other countries.

Product: IBM Cloud Pak for Security. Category: Administration Task. Platform: Platform Independent. Version: 1.7.0; 1.8.0; and future releases.
