Upgrading to version 1.0.7.3

Follow this procedure to upgrade Cloud Pak for Data System to version 1.0.7.3.

Before you begin

Upgrade prerequisites:

Your system must be on version 1.0.4.x, 1.0.5.x or 1.0.7.x to upgrade.
Note: The recommended location to stage the VM and Services bundles is GPFS storage. You must use /opt/ibm/appliance/storage/platform/localrepo/<version> instead of /localrepo/<version>. The recommended filepath is provided in the upgrade procedure steps.

To avoid Ansible performance issues, ensure that ulimit -n is set to 1024 on all OpenShift nodes (that is, on all control and worker VMs).
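You can check the current limit on each node before you start. The following is a minimal sketch that assumes passwordless ssh from e1n1 and uses illustrative VM host names (substitute the names of your control and worker VMs):
    # Check the open-files limit on every OpenShift VM (node names are examples)
    for vm in control-1 control-2 control-3 worker-1; do
        echo -n "$vm: "; ssh "$vm" 'ulimit -n'
    done
    # To change the limit persistently on a VM, add nofile entries in
    # /etc/security/limits.conf (or a file under /etc/security/limits.d/) and log in again.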

Network setup prerequisites:
  • If the system already has a custom network configuration, it must be defined in a System_Name.yml file in /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible:

    Before you upgrade, ensure that the /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible directory contains a System_Name.yml file that specifies the house network configuration.

    To locate the file, run the following command from /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible:
    ls -t *yml | grep -v template | head -1

    If the file does not exist, you must create it; otherwise, your network configuration might break during the upgrade. For more information about the file and how to create it for versions older than 1.0.3.6, see the Node side network configuration section, specifically Editing the network configuration YAML file.

  • If apupgrade detects a custom network configuration but no YAML file, it fails at the precheck step.
  • If you are upgrading a new system with no custom network configuration, apupgrade does not stop at the check for System_Name.yml and continues the upgrade process.
  • Before you start the upgrade, you must run the following command from the /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible directory:
    ANSIBLE_HASH_BEHAVIOUR=merge ansible-playbook -i ./System_Name.yml playbooks/house_config.yml --check -v
    If any changes are listed in the --check -v output, ensure that they are expected. If they are unexpected, edit the YAML file so that it contains only the expected changes. You can rerun this command as necessary until you see no errors; a consolidated sketch follows this list.
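The following sketch combines the steps above: it locates the most recent configuration file and dry-runs the playbook against it (nothing beyond the commands already described is assumed):
    cd /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible
    # Pick the most recently modified configuration file, excluding templates
    CFG=$(ls -t *yml | grep -v template | head -1)
    echo "Using configuration file: $CFG"
    # Dry run: list the changes the playbook would make, without applying them
    ANSIBLE_HASH_BEHAVIOUR=merge ansible-playbook -i ./"$CFG" playbooks/house_config.yml --check -v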

Netezza prerequisites:

If Netezza® Performance Server is installed on your system, run nzhostbackup before you upgrade. For more information, see Create a host backup.

All upgrade commands must be run as the root user.

Procedure

  1. Connect to node e1n1 via the management address and not the application address or floating address.
  2. Verify that e1n1 is the hub:
    1. Check for the hub node by verifying that the dhcpd service is running:
      systemctl is-active dhcpd
    2. If the dhcpd service is running on a node other than e1n1, bring the service down on that other node:
      systemctl stop dhcpd
    3. On e1n1, run:
      systemctl start dhcpd
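    Tip: You can run this check on all control nodes from e1n1 in one pass. The following is a sketch that assumes passwordless ssh between the control nodes; adjust the node names if your platform uses different control nodes:
      for node in e1n1 e1n2 e1n3; do
        echo -n "$node dhcpd: "; ssh "$node" systemctl is-active dhcpd
      done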
  3. Download the system bundle from Fix Central and copy it to /localrepo on e1n1.
    Note: The upgrade bundle requires a significant amount of free space. Make sure you delete all bundle files from previous releases.
  4. From the /localrepo directory on e1n1, run:
    mkdir 1.0.7.3_release
    and move the system bundle into that directory.
    The directory that is used here must be uniquely named; that is, no previous upgrade on the system can have been run from a directory with the same name.
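    For example, a sketch of the staging commands on e1n1, where <system_bundle_file> and <old_release_directory> are placeholders for your actual file and directory names:
      cd /localrepo
      df -h .                            # confirm that enough free space is available
      rm -rf <old_release_directory>     # remove bundle files from previous releases, if any
      mkdir 1.0.7.3_release
      mv <system_bundle_file> 1.0.7.3_release/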
  5. Optional: Run apupgrade with the --upgrade-details option to view details about the specific upgrade version:
    apupgrade --upgrade-details --upgrade-directory /localrepo --use-version 1.0.7.3_release --bundle system
  6. Before you start the upgrade process, depending on your requirements:
    • Run the preliminary checks with the --preliminary-check option if you only want to check for potential issues and cannot accept any system disruption:
      apupgrade --preliminary-check --upgrade-directory /localrepo --use-version 1.0.7.3_release --bundle system
      This check is non-invasive and you can rerun it as necessary. You can expect the following output after you run the preliminary checks:
      All preliminary checks complete
      Finished running pre-checks.
    • Run the preliminary checks with the --preliminary-check-with-fixes option if you want to check for potential issues and attempt to fix them automatically:
      apupgrade --preliminary-check-with-fixes --upgrade-directory /localrepo --use-version 1.0.7.3_release --bundle system
      Run this option only if you can accept disruption to your system, because this command might cause the nodes to reboot.
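    In either case, you can capture the output of the check in a log file for later review, for example (the log file path is only an example):
      apupgrade --preliminary-check --upgrade-directory /localrepo --use-version 1.0.7.3_release --bundle system 2>&1 | tee /tmp/1.0.7.3_precheck.log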
  7. Optional: Upgrade the apupgrade command to get the new command options:
    apupgrade --upgrade-apupgrade --upgrade-directory /localrepo --use-version 1.0.7.3_release --bundle system
    The value for the --use-version parameter is the same as the name of the directory you created in step 4.
    Note: If you are upgrading from version 1.0.4.x, you might encounter the following harmless error:
    File "/localrepo/1.0.7.3-20200910.044642-gt01/EXTRACT/system/upgrade/bundle_upgraders/bundle_upgrade.py", line 457, in load_component
        upgrader._upgrade_status_tracker = self.upgrade_status_trackers[node]
    KeyError: u'e1n2'
    You can ignore the error and continue with the process.
  8. Initiate the upgrade of the system servers:

    Switch and node firmware upgrades are included in the bundle. It is recommended that you upgrade all of these components.

    When you run the standard system upgrade command:
    apupgrade --upgrade --upgrade-directory /localrepo --use-version 1.0.7.3_release --bundle system
    the upgrade process attempts to silently update the firmware, Cumulus, and OS on the fabric and management switches. If errors occur during that stage, a warning that the switches could not be updated is displayed. You can continue with the upgrade process to upgrade the remaining components. The system will be operational when you finish, but it is strongly advised that you contact IBM Support to assist in fixing the issues with the switch updates. The switches must be upgraded before you apply the next upgrade.
    Note: Although it is recommended that you run the standard upgrade command, the following options are also available:
    • --update-switches updates the switch firmware, but fails the upgrade process if any errors occur;
    • --skip-firmware does not update any firmware on the nodes or switches, but still applies the new configuration;
    • --skip-hw-cfg does not apply new configuration on the hardware.

    To skip the node and switch firmware upgrade, run the apupgrade command with the --skip-firmware --skip-hw-cfg options.
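    For example, a possible full invocation that combines these options with the standard upgrade command (a sketch; the individual options are as documented in the note above):
      apupgrade --upgrade --upgrade-directory /localrepo --use-version 1.0.7.3_release --bundle system --skip-firmware --skip-hw-cfg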

You now need to upgrade the VM cluster on the system.

  9. Download the icpds_vm bundle from Fix Central and copy it to the GPFS storage directory. Create the following directory if it does not exist:
    mkdir -p /opt/ibm/appliance/storage/platform/localrepo/1.0.7.3_release
    Tip: You can also create a staging folder on the PVM or on a laptop, for example /tmp/1.0.7.3_staging, download the package there, and then upload it to the e1n1 GPFS location with scp, for example:
    scp icpds_vm-1.0.7.3-icpds-release.tar.gz root@e1n1:/opt/ibm/appliance/storage/platform/localrepo/1.0.7.3_release/
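    After the copy, you can confirm that the bundle is in place on e1n1 and, if a checksum is published for the bundle on Fix Central, compare it, for example:
      ls -lh /opt/ibm/appliance/storage/platform/localrepo/1.0.7.3_release/
      sha256sum /opt/ibm/appliance/storage/platform/localrepo/1.0.7.3_release/icpds_vm-1.0.7.3-icpds-release.tar.gz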
  10. Initiate the VM upgrade by running:
    apupgrade --upgrade --upgrade-directory /opt/ibm/appliance/storage/platform/localrepo --use-version 1.0.7.3_release --bundle vm

You now need to upgrade the application services (Red Hat OpenShift, Portworx, Cloud Pak for Data, Netezza Performance Server console, and Cloud Pak for Data System console).

  11. If they are not already present, download the following bundles to the GPFS storage directory that you created in step 9 (to check what is already staged, see the example after this list):
    • icpds_vm
    • icpds_rhos_repo
    • icpds_services
    • icpds_services_addon_cyclops
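    To see which of these bundles are already staged, list the directory, for example:
      ls -lh /opt/ibm/appliance/storage/platform/localrepo/1.0.7.3_release/ | grep icpds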
  12. Initiate the services upgrade by running:
    apupgrade --upgrade --upgrade-directory /opt/ibm/appliance/storage/platform/localrepo --use-version 1.0.7.3_release --bundle services --application all

Netezza Performance Server post-upgrade steps

If NPS is installed on Cloud Pak for Data System, perform the steps described in this section after you upgrade Cloud Pak for Data System.

About this task

Important: The following steps apply only to NPS versions earlier than 11.0.7.0. For NPS 11.0.7.0 and later, these steps are not needed.

Procedure

  1. Stop the platform manager:
    apstop -p
  2. Stop NPS:
    docker exec -it ipshost1 bash -c "su - nz -c 'nzstop'"
  3. Remove the GPFS token file to avoid accidental nzstart:
    docker exec ipshost1 rm /nz/.gpfstoken
  4. Log in to the e1n1 control plane node and remove the net.ipv4.conf.all.rp_filter setting from e1n1's ipshost1 container:
    sysctl --system
    
    docker exec ipshost1 bash -c "sed -i -e '/net.ipv4.conf.all.rp_filter/d' /etc/sysctl.conf"
    docker exec ipshost1 bash -c "echo 'net.ipv4.conf.all.rp_filter = 0' >> /etc/sysctl.conf"
    docker exec ipshost1 bash -c "echo 'net.ipv4.conf.default.rp_filter = 0' >> /etc/sysctl.conf"
    docker exec ipshost1 bash -c "echo 'net.ipv4.conf.mgt1.rp_filter = 0' >> /etc/sysctl.conf"
    docker stop ipshost1 
  5. Perform the following steps on the other two control nodes (e1n2 and e1n3 for Lenovo; e2n1 and e3n1 for Dell). You can run them on each node individually, or from e1n1 over ssh as shown in the sketch after these commands:
    docker start ipshost1
    docker exec ipshost1 bash -c "sed -i -e '/net.ipv4.conf.all.rp_filter/d' /etc/sysctl.conf"
    docker exec ipshost1 bash -c "echo 'net.ipv4.conf.all.rp_filter = 0' >> /etc/sysctl.conf"
    docker exec ipshost1 bash -c "echo 'net.ipv4.conf.default.rp_filter = 0' >> /etc/sysctl.conf"
    docker exec ipshost1 bash -c "echo 'net.ipv4.conf.mgt1.rp_filter = 0' >> /etc/sysctl.conf"
    docker exec ipshost1 bash -c "sysctl --system"
    docker stop ipshost1
    
    sysctl --system 
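    For example, the following sketch runs these commands over ssh from e1n1, assuming a Lenovo system (control nodes e1n2 and e1n3) and passwordless root ssh; adjust the node names for Dell:
      for node in e1n2 e1n3; do
        ssh "$node" docker start ipshost1
        ssh "$node" "docker exec ipshost1 bash -c \"sed -i -e '/net.ipv4.conf.all.rp_filter/d' /etc/sysctl.conf\""
        ssh "$node" "docker exec ipshost1 bash -c \"echo 'net.ipv4.conf.all.rp_filter = 0' >> /etc/sysctl.conf\""
        ssh "$node" "docker exec ipshost1 bash -c \"echo 'net.ipv4.conf.default.rp_filter = 0' >> /etc/sysctl.conf\""
        ssh "$node" "docker exec ipshost1 bash -c \"echo 'net.ipv4.conf.mgt1.rp_filter = 0' >> /etc/sysctl.conf\""
        ssh "$node" "docker exec ipshost1 bash -c \"sysctl --system\""
        ssh "$node" docker stop ipshost1
        ssh "$node" sysctl --system
      done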
  6. Start NPS on the e1n1 control node:
    docker start ipshost1
    docker exec -it ipshost1 bash -c "su - nz -c 'nzstart'"
  7. Recreate the GPFS token file:
    docker exec ipshost1 touch /nz/.gpfstoken
  8. Start the platform manager:
    apstart -p
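To verify that NPS came back online, you can check its state from the e1n1 control node, for example (a sketch; nzstate reports the NPS system state, and the expected state after the upgrade is Online):
    docker exec -it ipshost1 bash -c "su - nz -c 'nzstate'"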