1.0.7.8 Interim Fix 3 release notes

1.0.7.8 Interim Fix 3 upgrades firmware through XCC and UEFI to support new drives, because the old drives are nearing end of life.

The upgraded firmware components are:
  • OneCLI: 3.4.0
  • SMM: 1.25
  • LXPM: PDL138G - 2.06
  • UEFI: TEE180H - 3.41
  • XCC: TEO3D2Q - 5.41
  • NIC_ETH_EXT: 16.27.2008
  • SSD_NVME_STORAGE_INTEL_SSDPE2KX040T8O: VDV10184
  • SSD_NVME_STORAGE_INTEL_SSDPF2KX038T1O: 9CV10320
  • SSD_NVME_STORAGE_SAMSUNG_MMZWLR3T8HCLS-00A07: MPPA5B5Q
Note: The upgrade might fail while copying files to the /tmp directory with scp, with output similar to the following:
2023-11-07 04:26:22 ERROR: Error running command [scp -r e1n1:/localrepo/1.0.7.8.IF3_release/EXTRACT/system/upgrade/hpi/*.rpm /tmp/APUPGRADE/hpi.20231107042312/] on [u'e1n1']
2023-11-07 04:26:22 INFO: Cleaning up the files(and the dir) that were copied to all the nodes...
2023-11-07 04:26:23 INFO: Done
2023-11-07 04:26:23 INFO: hpi:Failed. Failed to complete HPI component upgrade.
2023-11-07 04:26:23 FATAL ERROR: Errors encountered
2023-11-07 04:26:23 FATAL ERROR:
2023-11-07 04:26:23 FATAL ERROR: HpiUpgrader.install : Fatal Problem: Could not copy files to all nodes.
2023-11-07 04:26:23 FATAL ERROR: This error requires manual intervention to resolve. Please contact IBM Support.
2023-11-07 04:26:23 FATAL ERROR:
2023-11-07 04:26:23 FATAL ERROR: More Info: See trace messages at /var/log/appliance/apupgrade/20231107/apupgrade20231107040119.log.tracelog for additional troubleshooting information.
2023-11-07 04:26:23 INFO: File /var/log/appliance/apupgrade/rest_server_pid.txt storing REST Server process ID does not exist
2023-11-07 04:26:23 INFO: REST Server - Instance is not running.
2023-11-07 04:26:23 ERROR: The following components failed to upgrade: ['hpi']
2023-11-07 04:26:23 FATAL ERROR: Unhandled error when attempting upgrade. Stack trace of failed command logged to /var/log/appliance/apupgrade/20231107/apupgrade20231107040119.log.tracelog
2023-11-07 04:26:23 FATAL ERROR: More Info: See trace messages at /var/log/appliance/apupgrade/20231107/apupgrade20231107040119.log.tracelog for additional troubleshooting information.
2023-11-07 04:26:23 INFO: File /var/log/appliance/apupgrade/rest_server_pid.txt storing REST Server process ID does not exist
2023-11-07 04:26:23 INFO: REST Server - Instance is not running.
Check whether there is enough space under /tmp. If the directory is full, clean it up by running:
rm -rf /tmp/APUPGRADE/*
Then restart the upgrade by running:
apupgrade --upgrade --upgrade-directory /localrepo --use-version 1.0.7.8.IF3_release --bundle system
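To see how much space is free and what is consuming it, standard df and du commands can be used; the following is only an illustrative check and is not part of the upgrade tooling.
# Illustrative: check free space and the largest consumers under /tmp
df -h /tmp
du -sh /tmp/* 2>/dev/null | sort -rh | head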

Before you begin

  • Download the 1.0.7.8.IF3-WS-ICPDS-fpXXX package, where XXX stands for the latest package number, from Fix Central.
  • Estimated upgrade time:
    • For Dell systems, the estimated upgrade time is 2 hours. Downtime of around 2 hours is required.
    • For Lenovo systems, the estimated upgrade time is 4 hours 30 minutes. Downtime of around 4 hours and 30 minutes is required.
  • The system must be on version 1.0.7.8 to apply the fix.
  • If NPS has non-default admin account credentials, the following actions must be completed before you can upgrade:
    1. Ensure that you have the NPS database admin user password.
    2. In the /export/home/nz/.bashrc file inside the container, set NZ_USER=admin and NZ_PASSWORD=<customer_password>, as shown in the sketch below.
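    A minimal sketch of those .bashrc entries, assuming standard shell export syntax, looks like this; replace <customer_password> with the actual NPS admin password:
      # Hypothetical entries in /export/home/nz/.bashrc inside the container
      export NZ_USER=admin
      export NZ_PASSWORD=<customer_password>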

Procedure

  1. Connect to node e1n1 by using the management address, not the application address or the floating address.
  2. Verify that e1n1 is the hub:
    1. Check for the hub node by verifying that the dhcpd service is running:
      systemctl is-active dhcpd
    2. If the dhcpd service is running on a node other than e1n1, bring the service down on that other node:
      systemctl stop dhcpd
    3. On e1n1, run:
      systemctl start dhcpd
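    To confirm the hub in a single pass, you can loop over the control nodes as in the following sketch; node names other than e1n1 (for example, e2n1) are placeholders for your topology, not values from the release notes.
      # Illustrative check of the dhcpd service across control nodes
      for node in e1n1 e2n1; do
          echo -n "$node: "
          ssh $node systemctl is-active dhcpd
      done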
  3. Download the icpds-release-1.0.7.8.IF3.tar.gz bundle and copy it to /localrepo on e1n1.
    Note: Make sure that you delete all bundle files from previous releases.
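    If the bundle was downloaded to a separate workstation, it can be copied over with scp as in the following sketch; the root user and <e1n1-management-address> are placeholders, not values from the release notes.
    # Copy the bundle to /localrepo on e1n1; the address below is a placeholder
    scp icpds-release-1.0.7.8.IF3.tar.gz root@<e1n1-management-address>:/localrepo/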
  4. From the /localrepo directory on e1n1, run:
    mkdir /localrepo/1.0.7.8.IF3_release

    Then move the system bundle into that directory. The directory that is used here must have a unique name; that is, no previous upgrade on the system can have been run from a directory with the same name.
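
    For example, assuming the bundle file name from step 3, the move can look like this (illustrative only):
    # Move the downloaded bundle into the uniquely named release directory
    mv /localrepo/icpds-release-1.0.7.8.IF3.tar.gz /localrepo/1.0.7.8.IF3_release/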

  5. Verify the status of your appliance by running:
    • ap issues
    • ap version -s
    • ap sw
  6. Optional: Run upgrade details to view details about the specific upgrade version:
    apupgrade --upgrade-details --upgrade-directory /localrepo --use-version 1.0.7.8.IF3_release --bundle system
  7. Run preliminary checks before you start the upgrade process. The preliminary check option scans for possible issues and attempts to fix any known issues automatically.
    apupgrade --preliminary-check-with-fixes --upgrade-directory /localrepo --use-version 1.0.7.8.IF3_release --bundle system
  8. Optional: If you have custom certificates, copy them to the /opt/ibm/appliance/storage/platform/cyclops/ directory before you start the upgrade process.
    1. Copy cert.crt to /opt/ibm/appliance/storage/platform/cyclops/cert.crt
    2. Copy cert.key to /opt/ibm/appliance/storage/platform/cyclops/cert.key
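    Assuming the certificate files are staged in a hypothetical /root/certs directory on e1n1, the copy can look like the following sketch; the source path is an assumption, not part of the release notes.
    # Copy custom certificates into place; /root/certs is a hypothetical staging path
    cp /root/certs/cert.crt /opt/ibm/appliance/storage/platform/cyclops/cert.crt
    cp /root/certs/cert.key /opt/ibm/appliance/storage/platform/cyclops/cert.key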
  9. Start the upgrade process:
    apupgrade --upgrade --upgrade-directory /localrepo --use-version 1.0.7.8.IF3_release --bundle system
  10. Wait for the upgrade to complete successfully.
  11. Run:
    ap version -s

    Then verify that the interim fix is listed under Interim Fixes.
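
    To narrow the output to the interim fix entries, you can filter it as in the following sketch; the exact labels in the ap version output may differ, so treat this as an illustration only.
    # Illustrative filter of the version output
    ap version -s | grep -i interim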

Known issue: apupgrade fails when stopping the Magneto service

apupgrade fails to stop the Magneto service because one or more containers are still running. The command attempts to stop these containers and fails with an error similar to the following:
INFO: Attempt number 3 to stop Magneto service
ERROR: Exception: Traceback (most recent call last):
ERROR:   File "/localrepo/1.0.7.8.IF3/EXTRACT/upgrade/modules/ibm/ca/util/magneto_manager.py", line 264, in stop_magneto_service
ERROR:     raise Exception("Failed to stop Magneto Service, within {} tries.".format(str(count-1)))
ERROR: Exception: Failed to stop Magneto Service, within 3 tries.
ERROR:
Workaround:
  1. Run the docker ps command on the control nodes to see the running containers:
    ssh e1n1
    docker ps
    CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS              PORTS                    NAMES
    2a19811bed99        callhome_repo:callhome.x86_64   "/usr/bin/entrypoi..."   30 hours ago        Up 29 hours                        callhome
    
  2. Run docker stop to stop the running container on the respective node:
    docker stop 2a19811bed99
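    If several containers are still running, they can all be stopped in one command; verify first that stopping every listed container is safe on your system, because this is an illustration and not a documented step.
    # Illustrative: stop all containers that are still running on this node
    docker stop $(docker ps -q)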