Switch health check patch release notes

After the patch is applied, Platform Manager polls the status of the 1 GbE management switches less often. The high rate of health checks was causing eUSB flash storage failures in both the management switch and the Fibre Channel switch. The new check interval is 90 minutes.
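As a quick sanity check of the numbers, the sketch below converts the 90-minute interval into the value you should expect to find in the configuration file. It assumes that the "delay" field in gt_hub.json is measured in seconds, which is inferred from the mgtsw entry shown in the verification step ("delay": 5400); the function name is illustrative only.

```python
# Assumption: the "delay" field in gt_hub.json is in seconds. This is inferred
# from the verification step, where the mgtsw entry shows "delay": 5400.
NEW_INTERVAL_MINUTES = 90
SECONDS_PER_DAY = 24 * 60 * 60


def minutes_to_delay_seconds(minutes):
    """Convert a polling interval in minutes to a delay in seconds."""
    return minutes * 60


delay = minutes_to_delay_seconds(NEW_INTERVAL_MINUTES)
# Number of full polling cycles that fit into one day at the new interval.
polls_per_day = SECONDS_PER_DAY // delay

print(delay)          # 5400
print(polls_per_day)  # 16
```

At the new interval, the management switch status is polled in 16 cycles per day.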

Before you begin

  • The patch applies to all Cloud Pak for Data System 1.0.x versions.
  • The patch is applied by running the interactive update_switches_delay.py script, which guides you through the process.
  • The estimated run time is 5-10 minutes, depending on the system size. Platform Manager is stopped during the update, but applications and the database remain online.
  • Before you run the script, use cat to display the JSON configuration file so that you can compare it with the updated file afterward.
    Note: The file to check is in the following location: /usr/lib/python2.7/site-packages/magneto/cfg/gt_hub.json
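Saving a copy of the file before the update makes the comparison exact rather than visual. The following is a minimal sketch: snapshot and diff_files are hypothetical helper names, and the backup directory you pass in (for example /root) is your own choice, not something the patch requires. It uses only the standard library and avoids syntax that Python 2.7 lacks, since that is the interpreter the file path points to.

```python
import difflib
import os
import shutil
import time

# Path from the release notes. The backup location passed to snapshot()
# is an assumption; choose any directory outside site-packages.
GT_HUB_JSON = "/usr/lib/python2.7/site-packages/magneto/cfg/gt_hub.json"


def snapshot(path, backup_dir):
    """Copy path into backup_dir with a timestamp suffix; return the copy's path."""
    stamp = time.strftime("%Y%m%d%H%M%S")
    name = "%s.%s.bak" % (os.path.basename(path), stamp)
    backup = os.path.join(backup_dir, name)
    # copy2 also preserves the file's permissions and timestamps.
    shutil.copy2(path, backup)
    return backup


def diff_files(before, after):
    """Return a unified diff between two text files ('' if they are identical)."""
    with open(before) as a, open(after) as b:
        return "".join(difflib.unified_diff(
            a.readlines(), b.readlines(), fromfile=before, tofile=after))
```

Before you run update_switches_delay.py, call saved = snapshot(GT_HUB_JSON, "/root"). After the script finishes, print(diff_files(saved, GT_HUB_JSON)) shows what the script changed in the file.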
Run the script as root. Either log in as root directly or switch to root with the su - command. The su root command does not work and causes the process to fail.
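If you wrap the procedure in your own tooling, a pre-flight check can refuse to continue when the effective user is not root. The helper below is hypothetical and not part of the patch. Note its limitation: os.geteuid() returns 0 after both su - and su root, so it cannot detect the unsupported su root case; avoiding that still depends on following the instruction above.

```python
import os


def root_preflight():
    """Return (ok, message) describing whether this process runs as root."""
    if os.geteuid() != 0:
        return False, "not running as root: log in as root or use 'su -'"
    # Limitation: the effective UID is 0 after both 'su -' and 'su root',
    # so this check cannot tell the two apart.
    return True, "running as root"
```

A calling script would stop with the returned message whenever ok is False.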

Procedure

  1. From Fix Central, download the 1.0.0.0.switch_monitoring_policy-WS-ICPDS-fpXXX package, where XXX is the latest package number.
  2. Extract the downloaded tar file with the tar -xvf command. The update_switches_delay directory is created. Example:
    [node1 tmp]# tar -xvf switch_monitoring_policy-1.0_release.tar.gz
    update_switches_delay/
    update_switches_delay/update_switches_delay.py
  3. Change to the update_switches_delay directory:
    [node1 tmp]# cd update_switches_delay
  4. Run the script without any parameters:
    [node1 update_switches_delay]# python update_switches_delay.py
  5. Wait for the node checks to complete, then enter y to confirm that Platform Manager can be stopped and the configuration files updated.
    ###############################################################################
    Started config update script
    Checking nodes list...
    e1n1.fbond, e2n1.fbond, e3n1.fbond, e4n1.fbond, e5n1.fbond, e6n1.fbond
    Checking nodes reachability...
    Checking reachability of e1n1.fbond... ok
    Checking reachability of e2n1.fbond... ok
    Checking reachability of e3n1.fbond... ok
    Checking reachability of e4n1.fbond... ok
    Checking reachability of e5n1.fbond... ok
    Checking reachability of e6n1.fbond... ok
    All the nodes are reachable, proceeding
    Checking system state... Ready
    Platform Manager is currently running, do you want to stop it and update configuration files?
    Continue? y/[n]: y
    Stopping Platform Management...
    Successfully deactivated platform
    Updating /usr/lib/python2.7/site-packages/magneto/cfg/gt_hub.json file on e1n1.fbond
    Updating /usr/lib/python2.7/site-packages/magneto/cfg/gt_hub.json file on e2n1.fbond
    Updating /usr/lib/python2.7/site-packages/magneto/cfg/gt_hub.json file on e3n1.fbond
    Updating /usr/lib/python2.7/site-packages/magneto/cfg/gt_hub.json file on e4n1.fbond
    Updating /usr/lib/python2.7/site-packages/magneto/cfg/gt_hub.json file on e5n1.fbond
    Updating /usr/lib/python2.7/site-packages/magneto/cfg/gt_hub.json file on e6n1.fbond
    Updated, running apstart -p...
    Successfully activated platform
    Script is done, exiting
    ###############################################################################
  6. To verify that the patch was applied successfully, run:
    [node1 update_switches_delay]# grep mgtsw /usr/lib/python2.7/site-packages/magneto/cfg/gt_hub.json
    and confirm that the output contains the following entry:
    {"type": "mgtsw", "target": "hw://#bom:mgtsw?vendor=Cumulus#", "timeout": 120, "delay": 5400},
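The grep check can also be scripted. In the sketch below, find_entries, check_patched, and check_file are illustrative names, not part of the patch. Instead of matching text, it loads gt_hub.json as JSON, collects every object whose "type" is "mgtsw", and confirms that each one has "timeout": 120 and "delay": 5400. It assumes only that the file is valid JSON, not where in the document the mgtsw entry sits, and it avoids syntax that Python 2.7 lacks.

```python
import json


def find_entries(node, entry_type):
    """Recursively collect every dict whose "type" field equals entry_type."""
    found = []
    if isinstance(node, dict):
        if node.get("type") == entry_type:
            found.append(node)
        for value in node.values():
            found.extend(find_entries(value, entry_type))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_entries(item, entry_type))
    return found


def check_patched(config, expected_delay=5400, expected_timeout=120):
    """True if at least one mgtsw entry exists and all carry the patched values."""
    entries = find_entries(config, "mgtsw")
    return bool(entries) and all(
        e.get("delay") == expected_delay and e.get("timeout") == expected_timeout
        for e in entries)


def check_file(path):
    """Load a JSON configuration file and run check_patched on it."""
    with open(path) as f:
        return check_patched(json.load(f))
```

After a successful update, check_file("/usr/lib/python2.7/site-packages/magneto/cfg/gt_hub.json") should return True.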