After the patch is applied, Platform Manager reduces the status polling frequency for the 1 GbE
management switches. The high rate of health checks was causing eUSB flash storage failures in both
the management switches and the Fibre Channel switches. The new check interval is 90 minutes.
Applying this patch is highly recommended to avoid switch failures.
Before you begin
- The patch applies to any Integrated Analytics System version lower than 1.0.27.0.
- The patch must be reapplied after any upgrade to a version lower than 1.0.27.0.
- The patch is applied by running the interactive update_switches_delay.py script, which guides
you through the process.
- The estimated run time is 5-10 minutes, depending on the system size. Platform Manager is
stopped while the script runs, but the applications and the database remain online.
- Before you run the script, cat the appropriate json file so that you can compare it with the
results afterward. Note: Depending on the Integrated Analytics System version, the file to check
might be in one of the following locations:
/usr/lib/python2.7/site-packages/magneto/cfg/sf_rack_leader.json
/usr/lib/python2.7/site-packages/magneto/cfg/3452_rack_leader.json
/usr/lib/python2.7/site-packages/magneto/cfg/3452_hub.json
Run the script as root. Either log in as root directly, or use the su - command. The su root
command does not work and causes the process to fail.
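As an alternative to comparing cat output by eye, the pre-patch state can be saved to disk. The following is a minimal sketch, not part of the patch package: the snapshot_configs helper and the .before suffix are hypothetical names introduced here, and the only paths it knows about are the three candidate locations listed above.

```python
import os
import shutil

# Candidate locations listed in this document; which one exists depends on
# the Integrated Analytics System version.
CANDIDATES = [
    "/usr/lib/python2.7/site-packages/magneto/cfg/sf_rack_leader.json",
    "/usr/lib/python2.7/site-packages/magneto/cfg/3452_rack_leader.json",
    "/usr/lib/python2.7/site-packages/magneto/cfg/3452_hub.json",
]


def snapshot_configs(candidates, dest_dir):
    """Copy every candidate file that exists into dest_dir.

    Returns the list of copies that were created. Files that are not
    present on this system are skipped silently.
    """
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    saved = []
    for path in candidates:
        if os.path.isfile(path):
            # Hypothetical naming scheme: original file name plus ".before".
            target = os.path.join(dest_dir, os.path.basename(path) + ".before")
            shutil.copy2(path, target)
            saved.append(target)
    return saved
```

Calling snapshot_configs(CANDIDATES, some_directory) as root before the script runs leaves copies that can be compared with the updated files afterward. The destination directory is your choice; the sketch does not assume one.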
Procedure
- Download the 1.0.0.0.switch_monitoring_policy-IM-IIAS-fpXXX package, where XXX stands for the
latest package number, from Fix Central.
- After the tar file is downloaded, extract switch_monitoring_policy-1.0_release.tar.gz by using
the tar -xvf command. The update_switches_delay directory is created. Example:
tar -xvf switch_monitoring_policy-1.0_release.tar.gz
update_switches_delay/
update_switches_delay/update_switches_delay.py
- From the update_switches_delay directory, run the following command without any parameters:
python update_switches_delay.py
- Wait for the node checks to complete, and then confirm that you want to update the
configuration files:
[root@e1-n1 update_switches_delay]# ./update_switches_delay.py
###############################################################################
Started config update script
Checking nodes list...
node0101-fab, node0102-fab, node0103-fab, node0104-fab, node0105-fab, node0106-fab, node0107-fab
Checking nodes reachability...
Checking reachability of node0101-fab... ok
Checking reachability of node0102-fab... ok
Checking reachability of node0103-fab... ok
Checking reachability of node0104-fab... ok
Checking reachability of node0105-fab... ok
Checking reachability of node0106-fab... ok
Checking reachability of node0107-fab... ok
All the nodes are reachable, proceeding
Checking system state... Ready
Platform Manager is currently running, do you want to stop it and update configuration files?
Continue? y/[n]: y
Stopping Platform Management...
Successfully deactivated platform
Updating /usr/lib/python2.7/site-packages/magneto/cfg/3452_rack_leader.json file on node0101-fab
Updating /usr/lib/python2.7/site-packages/magneto/cfg/3452_rack_leader.json file on node0102-fab
Updating /usr/lib/python2.7/site-packages/magneto/cfg/3452_rack_leader.json file on node0103-fab
Updating /usr/lib/python2.7/site-packages/magneto/cfg/3452_rack_leader.json file on node0104-fab
Updating /usr/lib/python2.7/site-packages/magneto/cfg/3452_rack_leader.json file on node0105-fab
Updating /usr/lib/python2.7/site-packages/magneto/cfg/3452_rack_leader.json file on node0106-fab
Updating /usr/lib/python2.7/site-packages/magneto/cfg/3452_rack_leader.json file on node0107-fab
Updated, running apstart -p...
Successfully activated platform
Script is done, exiting
###############################################################################
- For verification, cat the appropriate json file and compare it with the previous results.
Note: Depending on the Integrated Analytics System version, the file to check might be in one of
the following locations:
/usr/lib/python2.7/site-packages/magneto/cfg/sf_rack_leader.json
/usr/lib/python2.7/site-packages/magneto/cfg/3452_rack_leader.json
/usr/lib/python2.7/site-packages/magneto/cfg/3452_hub.json
The mgtsw and fcsw entries should now show the new delay value of 5400:
{"type": "mgtsw", "target": "hw://#bom:mgtsw/hadomain#", "timeout": 120, "delay": 5400},
{"type": "fcsw", "target": "hw://#bom:fcsw/hadomain#", "timeout": 120, "delay": 5400},
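The same check can be scripted. The sketch below rests on several assumptions: that the file parses as strict JSON, that the delay is expressed in seconds (5400 seconds is 90 minutes), and that the switch entries can appear at any nesting depth, because the overall file layout is not shown here. The function names are hypothetical.

```python
import json


def find_switch_checks(node, types=("mgtsw", "fcsw")):
    """Recursively collect dicts whose "type" is one of the switch types."""
    found = []
    if isinstance(node, dict):
        if node.get("type") in types:
            found.append(node)
        for value in node.values():
            found.extend(find_switch_checks(value, types))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_switch_checks(item, types))
    return found


def wrong_delays(config, expected=5400):
    """Return the switch entries whose "delay" differs from the expected value."""
    return [e for e in find_switch_checks(config) if e.get("delay") != expected]


def check_file(path, expected=5400):
    """Load one configuration file and return its mismatching switch entries."""
    with open(path) as handle:
        return wrong_delays(json.load(handle), expected)
```

An empty list from check_file means that every mgtsw and fcsw entry found in that file carries the expected delay. It does not prove that such entries exist, so a quick look at the file is still worthwhile.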
- Look at the text files for mgtsw and fcsw located in
/var/log/appliance/platform/management/resmgr_out/. After the patch has been applied, the check
runs approximately every 90 minutes:
mgtsw
2022-03-10 12:09:19|24.76s|STATUS|mgtsw@hw://hadomain1.mgtswa
2022-03-10 13:40:23|24.36s|STATUS|mgtsw@hw://hadomain1.mgtswa
2022-03-10 15:03:04|24.07s|STATUS|mgtsw@hw://hadomain1.mgtswa
2022-03-10 12:09:20|25.54s|STATUS|mgtsw@hw://hadomain1.mgtswb
2022-03-10 13:46:39|23.85s|STATUS|mgtsw@hw://hadomain1.mgtswb
2022-03-10 15:19:16|23.72s|STATUS|mgtsw@hw://hadomain1.mgtswb
fcsw
2022-03-10 12:09:30|35.91s|STATUS|fcsw@hw://hadomain1.fcswa
2022-03-10 13:44:44|34.61s|STATUS|fcsw@hw://hadomain1.fcswa
2022-03-10 15:17:12|34.98s|STATUS|fcsw@hw://hadomain1.fcswa
2022-03-10 12:09:30|35.56s|STATUS|fcsw@hw://hadomain1.fcswb
2022-03-10 13:35:09|33.79s|STATUS|fcsw@hw://hadomain1.fcswb
2022-03-10 15:11:41|33.75s|STATUS|fcsw@hw://hadomain1.fcswb
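To confirm the interval without subtracting timestamps by hand, the gaps between consecutive checks can be computed from the log lines. This is a minimal sketch that assumes every relevant line has the four pipe-separated fields shown above (timestamp, duration, STATUS, target). The check_intervals name is hypothetical, and how the lines are read from the files under resmgr_out is left to you.

```python
from datetime import datetime


def check_intervals(lines):
    """Group STATUS entries by target and return the gaps between
    consecutive checks, in minutes, for each target."""
    times = {}
    for line in lines:
        fields = line.strip().split("|")
        # Skip anything that does not look like a STATUS record.
        if len(fields) != 4 or fields[2] != "STATUS":
            continue
        stamp = datetime.strptime(fields[0], "%Y-%m-%d %H:%M:%S")
        times.setdefault(fields[3], []).append(stamp)
    gaps = {}
    for target, stamps in times.items():
        stamps.sort()
        gaps[target] = [
            (later - earlier).total_seconds() / 60.0
            for earlier, later in zip(stamps, stamps[1:])
        ]
    return gaps
```

For the three mgtsw@hw://hadomain1.mgtswa lines above, the function returns gaps of about 91.1 and 82.7 minutes, which is consistent with a check that runs approximately every 90 minutes.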