Upgrading to version 1.0.9.0

Upgrade to version 1.0.9.0 is performed by IBM Support.

Important: The upgrade automatically backs up the components callhome, hpi, network, usermgmt, and magneto before proceeding with the 1.0.9.0 upgrade (the default backup location is /opt/ibm/appliance/storage/platform/upgrade_backup/).
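If you want to confirm that the automatic backups were created, you can list the default backup location (a minimal check, assuming the default path was not changed):
  ls -l /opt/ibm/appliance/storage/platform/upgrade_backup/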
System upgrade timings are provided for three configuration levels:
  • Base
  • Base + 2
  • Base + 6
Table 1. Cloud Pak for Data System upgrade time
+------------------------+-------------+---------------+--------------------------+
| Upgrade                | Dell/Lenovo | Configuration | Approximate upgrade time |
+------------------------+-------------+---------------+--------------------------+
| 1.0.8.3 to 1.0.9.0     | Dell        | Base          | 6 hours                  |
| 1.0.8.4 to 1.0.9.0     | Lenovo      | Base          | 7 hours                  |
| 1.0.8.5 to 1.0.9.0     | Lenovo      | Base          | 7 hours                  |
| 1.0.8.3 IF1 to 1.0.9.0 | Dell        | Base          | 6 hours                  |
| 1.0.8.4 IF1 to 1.0.9.0 | Lenovo      | Base          | 7 hours                  |
| 1.0.8.5 IF1 to 1.0.9.0 | Lenovo      | Base          | 7 hours                  |
+------------------------+-------------+---------------+--------------------------+

Before you begin

Upgrade prerequisites:
  1. Upgrade to Netezza Performance Server 11.2.1.12.
    Note: The supported NPS versions are:
    • Minimum 11.2.1.10
    • Recommended 11.2.1.12
  2. Reboot the nodes before the upgrade to prevent potential issues that are related to Docker, magneto, or disks. For more details, contact IBM Support.
  3. Run a health check of the system. For more details, see CPDS IIAS System healthcheck tool.
  4. If the /boot partition is smaller than 250 MB on any non-SPU node, the 1.0.9.0 RHEL upgrade fails. Contact IBM Support for assistance (a consolidated pre-check sketch follows this list).
  5. Before upgrading, make sure that no third-party tools are configured or installed. If any are present, uninstall them before the upgrade.
  6. Custom changes to the software appliance are not supported. Back up any custom changes that were made to the appliance before upgrading.
  7. Before upgrading, take a full backup of the Netezza database.
  8. Before you start the upgrade, go to the /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible directory and run:
    ANSIBLE_HASH_BEHAVIOUR=merge ansible-playbook -i ./System_Name.yml playbooks/house_config.yml --check -v
    If any changes are listed in the --check -v output, ensure that they are expected. If they are unexpected, edit the YAML file so that it contains only the expected changes. Rerun this command as necessary until you see no errors.
  9. Run the following commands to stop replication on both nodes before the upgrade (applicable only for NRS users):
    nzdr replication stop --node-name <LOCAL_NODE>
    nzdr replication stop --node-name <REMOTE_NODE>

    For more details, see Starting and stopping replication on a given node.
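
The following sketch consolidates the version, /boot, and replication pre-checks from the list above. It is a minimal example, not an IBM-provided script: it assumes that you run it as root on e1n1, that nzrev is available in your NPS command environment, and that e1n1 e1n2 e1n3 is a representative node list. Adjust the host names for your system.

  # Pre-upgrade spot checks (sketch only; adjust host names to your system).

  # NPS version: minimum 11.2.1.10, recommended 11.2.1.12.
  # Assumes nzrev is available in your NPS command environment.
  nzrev

  # /boot must be at least 250 MB on every non-SPU node.
  # The node list below is an example; replace it with your nodes.
  for node in e1n1 e1n2 e1n3; do
      echo "== ${node} =="
      ssh "${node}" df -h /boot
  done

  # NRS users only: review the replication state before stopping it.
  nzdr status --details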

About this task

Upgrade to 1.0.9.0 is supported for systems on 1.0.8.3 and later. Only the system bundle is upgraded in 1.0.9.0.

Procedure

  1. Connect to the node1 management interface by using the custom_hostname or ip values from your System_Name.yml file.
  2. Check for the hub node by running ap node -d and ensuring that the entry for e1n1 has YES for both Is Master and Is HUB.
    ap node -d
    Output:
    [root@e1n1 ~]# ap node -d
    +------------------+---------+-------------+-----------+-----------+--------+---------------+---------------+
    | Node             |   State | Personality | Monitored | Is Master | Is HUB | Is VDB Master | Is NRS Master |
    +------------------+---------+-------------+-----------+-----------+--------+---------------+---------------+
    | enclosure1.node1 | ENABLED | CONTROL     |       YES |       YES |    YES |            NO |            NO |

    If either Is Master or Is HUB reads NO, the master/hub must be moved to e1n1 before the upgrade can begin. If e1n1 is not the master/hub, the 1.0.9.0 upgrade errors out with a message to that effect.

  3. Download the system bundle from Fix Central and copy it to /localrepo on e1n1.
    Note: The upgrade bundle requires a significant amount of free space. Make sure that you delete all bundle files from previous releases.
  4. From the /localrepo directory on e1n1, run the following command:
    mkdir 1.0.9.0_release

    Move the system bundle into the directory. The directory that is used here must have a unique name. Make sure that none of the earlier system upgrades were run from a directory with the same name. A consolidated sketch of steps 3 through 8 follows this procedure.

  5. Run the following command with the --upgrade-details option to view details about the specific upgrade version and to extract the 1.0.9.0 release bundle:
    apupgrade --upgrade-details --upgrade-directory /localrepo --use-version 1.0.9.0_release --bundle system
  6. Run the following command to update the apupgrade software from the 1.0.9.0 bundle:
    apupgrade --upgrade-apupgrade --upgrade-directory /localrepo --use-version 1.0.9.0_release --bundle system
  7. Before you start the upgrade process, run the preliminary checks in one of the following ways, depending on your requirements:
    • Run the preliminary checks with the --preliminary-check option:
      apupgrade --use-version 1.0.9.0_release --upgrade-directory /localrepo --preliminary-check --bundle system
      Use this option if you want to check for potential issues and cannot accept any system disruption. This check is noninvasive and you can rerun it as necessary. You can expect the following output after you run the preliminary checks:
      All preliminary checks complete
      Finished running pre-checks.
    • Optional: Run the preliminary checks with the --preliminary-check-with-fixes option:
      apupgrade --preliminary-check-with-fixes --upgrade-directory /localrepo --use-version 1.0.9.0_release --bundle system
      Use this option if you want to check for potential issues and attempt to fix them automatically. Run it only if you can accept system disruption, because this command might cause the nodes to reboot.
  8. Run the following command to start the upgrade:
    apupgrade --upgrade --upgrade-directory /localrepo --use-version 1.0.9.0_release --bundle system
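
For reference, the following sketch strings steps 3 through 8 together as one sequence. It is a sketch only: the bundle file name is a hypothetical placeholder that depends on what you download from Fix Central, the scp source is whichever workstation holds the download, and the apupgrade commands are exactly those shown in the steps above.

  # Copy the downloaded system bundle to e1n1 (placeholder file name).
  scp <system_bundle_file> root@e1n1:/localrepo/

  # On e1n1, stage the bundle in a uniquely named directory.
  cd /localrepo
  mkdir 1.0.9.0_release
  mv <system_bundle_file> 1.0.9.0_release/

  # Extract and inspect the bundle, then update apupgrade itself.
  apupgrade --upgrade-details --upgrade-directory /localrepo --use-version 1.0.9.0_release --bundle system
  apupgrade --upgrade-apupgrade --upgrade-directory /localrepo --use-version 1.0.9.0_release --bundle system

  # Noninvasive preliminary checks, then the upgrade.
  apupgrade --use-version 1.0.9.0_release --upgrade-directory /localrepo --preliminary-check --bundle system
  apupgrade --upgrade --upgrade-directory /localrepo --use-version 1.0.9.0_release --bundle system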

Post-upgrade steps

Procedure

  1. After you finish your 1.0.9.0 upgrade process, install the Netezza Performance Server web console for Cloud Pak for Data System 1.0.7.8 and later versions. For more details, see Installing the Netezza Performance Server web console for Cloud Pak for Data System 1.0.7.8 and later or outside the system.
  2. Validate installed software versions.

    The ap version -s command shows all upgraded software versions:

    ap version -s
    Appliance software version is 1.0.9.0
    
    All component versions are synchronized.
    
    +-----------------------------+----------------------------------------------------------------+
    | Component Name              | Version                                                        |
    +-----------------------------+----------------------------------------------------------------+
    | Appliance platform software | 1.0.9.0-20250123121428b5364                                    |
    | aposcomms                   | ibm-apos-dhcpd-config               : 5.2.0-1                  |
    |                             | ibm-apos-udev-rules-config          : 3.1.1-1                  |
    |                             | ibm-apos-network-config             : 7.3.0-1                  |
    |                             | ibm-apos-chrony-config              : 5.0.1-1                  |
    |                             | ibm-apos-network-tools              : 27.7.4-1                 |
    |                             | ibm-apos-common                     : 11.3.0-1                 |
    |                             | ibm-apos-named-config               : 3.5.0-1                  |
    | apupgrade                   | 1.0.9.0-20250123051657b5342                                    |
    | callhome                    | 1.2.0.0-20240426043141b3                                       |
    | containerapi                | 1.0.23.0-20240327162942b14960                                  |
    | cyclops                     | 0.1.0.0                                                        |
    | gpfs                        | 5.1.9-4                                                        |
    | gpfsconfig                  | 1.0.9.0-20250123073041b5346                                    |
    | hpi                         | hpi-cumulus-switch-firmware         : 2.0.0.1-20250117114512   |
    |                             | hpi-cumulus-fabspine-firmware       : 2.0.0.1-20250117114512   |
    |                             | hpi-software                        : 1.0.9.0-20250123050507b4 |
    |                             | hpi-dell-node-firmware              : 1.8.0.1-20250117114512   |
    |                             | hpi-lenovo-node-firmware            : 1.8.0.2-20250117114512   |
    |                             | hpi-cumulus-fabsw-firmware          : 2.0.0.1-20250117114512   |
    |                             | hpiutils                            : 2.0.4.6-20240327201504b1 |
    |                             | hpi-x86_64-image                    : 2.0.4.6-20240327201659b1 |
    |                             | hpi-cumulus-mgtsw-firmware          : 2.0.0.1-20250117114512   |
    |                             | hpicfg                              : 2.0.4.6-20240327201442b1 |
    |                             | dct                                 : 2.0.4.6-20230626154915   |
    | magneto                     | 1.0.29.2-20250122103107b5278                                   |
    | mellanox                    | 24.10.0.0                                                      |
    | mvcli                       | 2.3.10.1095                                                    |
    | nodeos                      | 1.0.9.0-20250123070924b5346                                    |
    | platformbackups             | 1.0.20.0-20240327163319b14962                                  |
    | psklm                       | 1.0.24.0-20240328063759b2                                      |
    | solarflare                  | 5.13.14.1019                                                   |
    | supporttools                | 1.0.23.12-20250122094147b5294                                  |
    +-----------------------------+----------------------------------------------------------------+
  3. Run the following commands to perform the network health checks:
    • /opt/ibm/appliance/platform/apos-comms/tools/aposTimeCheck.py
      Output:
      [root@gt19-node1 ~]#  /opt/ibm/appliance/platform/apos-comms/tools/aposTimeCheck.py
      Validating /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible/gt19-house.yml
      No upstream time source
      Time offset healthy. Value:0.000000000
    • /opt/ibm/appliance/platform/apos-comms/tools/aposDnsCheck.py
      Output:
      [root@gt19-node1 ~]#  /opt/ibm/appliance/platform/apos-comms/tools/aposDnsCheck.py
      Validating /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible/gt19-house.yml
      Checking hostname
      [RUNNING] - Trying to query base FQDN
      [PASS] - The base FQDN was resolvable
      [PASS] - The query succeeded
    • /opt/ibm/appliance/platform/apos-comms/tools/aposNetworkCheck.py
      Output:
      [root@e1n1 tools]# /opt/ibm/appliance/platform/apos-comms/tools/aposNetworkCheck.py
      Validating /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible/system-house.yml
      SUMMARY OF ERRORS:
      [root@e1n1 tools]#
      

    If you find any errors in the output of these validation scripts, contact IBM Support. A consolidated sketch that runs the post-upgrade checks in sequence follows this procedure.

  4. Determine the GPFS state.
    The mmgetstate -a command shows the following output:
    [root@e1n1 apupgrade]#  mmgetstate -a
    
     Node number  Node name  GPFS state
    -------------------------------------
               1  e1n1       active
               2  e1n2       active
               3  e1n3       active
  5. Verify the platform management state.
    The ap state -d command shows the following output:
    [root@e1n1 apupgrade]# ap state -d
    System state is 'Ready'
    Application state is 'Ready'
    Platform management state is 'Active' 
  6. After a successful upgrade, run the following commands to start replication on both nodes (applicable only for NRS users):
    nzdr replication start --node-name <LOCAL_NODE>
    nzdr replication start --node-name <REMOTE_NODE>
  7. Run the following command to verify that all nodes and databases are healthy (applicable only for NRS users):
    nzdr status --details
    Example output when all nodes are healthy:
    Nodes -
    +-------------+-------------+----------------------+--------------+----------+---------------+
    | node-name   | reachable   | replication-status   | nps-status   | health   | nps-version   |
    +=============+=============+======================+==============+==========+===============+
    | CPDS-A      | Yes         | Active               | Online       | Healthy  | 11.2.3.4      |
    +-------------+-------------+----------------------+--------------+----------+---------------+
    | CPDS-B      | Yes         | Active               | Online       | Healthy  | 11.2.3.4      |
    +-------------+-------------+----------------------+--------------+----------+---------------+ 
    
  8. Validate node personalities.
    The ap node -d command shows the following output:
    [root@e1n1 apupgrade]# ap node -d
    +------------------+---------+-------------+-----------+-----------+--------+---------------+---------------+
    | Node             |   State | Personality | Monitored | Is Master | Is HUB | Is VDB Master | Is NRS Master |
    +------------------+---------+-------------+-----------+-----------+--------+---------------+---------------+
    | enclosure1.node1 | ENABLED | CONTROL     |       YES |       YES |    YES |           YES |            NO |
    | enclosure1.node2 | ENABLED | CONTROL     |       YES |        NO |     NO |            NO |            NO |
    | enclosure1.node3 | ENABLED | CONTROL     |       YES |        NO |     NO |            NO |            NO |
    | enclosure1.node4 | ENABLED | UNSET       |       YES |        NO |     NO |            NO |            NO |
    | enclosure2.node1 | ENABLED | UNSET       |       YES |        NO |     NO |            NO |            NO |
    | enclosure2.node2 | ENABLED | UNSET       |       YES |        NO |     NO |            NO |            NO |
    +------------------+---------+-------------+-----------+-----------+--------+---------------+---------------+
    
    IPS nodes
    +------------------+-----------+---------------+-----------+------------+----------+
    | Node             |     State | Personality   | Monitored | IPS Status | IPS Role |
    +------------------+-----------+---------------+-----------+------------+----------+
    | enclosure2.node3 | UNMANAGED | VDB[IPS1NODE] |        NO |         OK |   Active |
    | enclosure2.node4 | UNMANAGED | VDB[IPS1NODE] |        NO |         OK |   Active |
    +------------------+-----------+---------------+-----------+------------+----------+
  9. Review open issues.
    The ap issues command shows the following output:
    [root@e1n1 apupgrade]# ap issues
    No open alerts (issues) and unacknowledged events
    
    Generated: 2025-02-15 17:57:14
    
    [root@e1n1 apupgrade]#
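
The following sketch runs the post-upgrade checks from this procedure in sequence. It is a convenience sketch, not an IBM-provided script: it assumes that you run it as root on e1n1 and that the commands behave as shown in the preceding steps; the nzdr commands apply only to NRS users.

  # Post-upgrade validation sequence (sketch; run as root on e1n1).
  ap version -s        # all component versions should be synchronized at 1.0.9.0
  /opt/ibm/appliance/platform/apos-comms/tools/aposTimeCheck.py
  /opt/ibm/appliance/platform/apos-comms/tools/aposDnsCheck.py
  /opt/ibm/appliance/platform/apos-comms/tools/aposNetworkCheck.py
  mmgetstate -a        # GPFS must be active on all nodes
  ap state -d          # system, application, and platform management states
  ap node -d           # node personalities
  ap issues            # open alerts and unacknowledged events

  # NRS users only: restart replication and confirm health.
  nzdr replication start --node-name <LOCAL_NODE>
  nzdr replication start --node-name <REMOTE_NODE>
  nzdr status --details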