Troubleshooting
Problem
IBM PureData System for Operational Analytics Version 1.0 FP4 and V1.1 ship with an AIX version that has APAR IV66360. This results in unnecessary errors in the AIX errpt output on all hosts for adapters that are not connected.
Symptom
On one or more hosts, running 'errpt' displays many messages resembling the following line.
- 76C587C0 0719222915 T H ent2 Physical link down
errpt -a will show messages like the following:
- ---------------------------------------------------------------------------
LABEL: SHIENT_PLINK_DOWN
IDENTIFIER: 76C587C0
Date/Time: Mon Oct 19 20:46:36 IST 2015
Sequence Number: 79424
Machine Id: 00F968BF4C00
Node Id: hostname01
Class: H
Type: TEMP
WPAR: Global
Resource Name: ent7
Resource Class: adapter
Resource Type: e4148a169404
Location: U78C9.001.WZS02F5-P1-C6-T4
VPD:
PCIe2 4-Port (10GbE SFP+ & 1GbE RJ45) Adapter:
FRU Number..................00E2715
EC Level....................D77452
Customer Card ID Number.....2CC3
Part Number.................00E2719
Feature Code/Marketing ID...EN0S
Serial Number...............Y050NY44I617
Manufacture ID..............40F2E9D34CFC
Network Address.............40F2E9D34CFF
ROM Level.(alterable).......30100150
Description
Physical link down
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
FILE NAME
line: 442 file: entcore_link.c
MAC ADDRESS
40F2 E9D3 4CFF
DEVICE DRIVER INTERNAL STATE
0000 0000 2000 0000 0000 0000 0000 0001 0000 0000 0000 0000
PCI ETHERNET STATISTICS
0061 0852 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
TRACE RECORD SEQUENCE NUMBER
e:0 l:442 f:entcore_link_change r:0x0 s:0 o:0
NUMBER OF BYTES
SENSE DATA
Diagnostic Analysis
Diagnostic Log sequence number: 236860
Resource tested: ent7
Menu Number: 2E43702
Description:
No trouble was found with this resource. However
Error Log Analysis indicates that there recently may
have been a network problem.
If your Ethernet device is connected to a network,
and if you are experiencing problems with network
communications, check for a loose or defective
cable or connection.
If a switch or another system is directly attached
to the Ethernet device, verify it is powered up,
configured, and functioning correctly.
These messages tend to repeat every 7 minutes for every adapter in the Available state that is not assigned an IP address, is not part of an EtherChannel, and is not a VLAN adapter.
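To see which adapters are generating the entries, the summary output can be tallied per resource name. The following is a minimal sketch, not part of the original technote: the function name count_link_down is hypothetical, and it assumes the errpt summary layout shown above, where the identifier is column 1 and the resource name is column 5.

```shell
#!/bin/sh
# Tally "Physical link down" entries (identifier 76C587C0) per adapter.
# Reads errpt summary lines on stdin and prints "<adapter> <count>".
# count_link_down is a hypothetical helper name used for illustration.
count_link_down() {
    awk '$1 == "76C587C0" { count[$5]++ }
         END { for (r in count) printf "%s %d\n", r, count[r] }' | sort
}
```

On an affected host this could be used as: errpt -j 76C587C0 | count_link_down. The -j flag limits errpt to the given error identifier; the header line is ignored because its first column is not the identifier.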
Cause
The cause is described in the AIX APAR at the following URL: http://www-01.ibm.com/support/docview.wss?uid=isg1IV66350
Environment
IBM PureData System for Operational Analytics V1.0 FP4 or earlier, V1.1
Diagnosing The Problem
Look for messages in the errpt output on any host in the environment for adapters that are not members of ent11 and are listed as Available in the lsdev output.
- 76C587C0 0719222915 T H ent2 Physical link down
If there are no messages, you can test for the problem using the following one-line script.
eclist="$(lsdev | grep ent | grep EtherChannel | awk '{print $1}')";for ec in $eclist;do adapters="$(lsattr -EOl ${ec} -a adapter_names | grep -v '#' | sed 's|,| |g')";for adapter in ${adapters};do cmd="entstat ${adapter}";echo $cmd;$cmd;done;done
entstat ent4
entstat: 0909-003 Unable to connect to device ent4, errno = 19
entstat ent0
entstat: 0909-003 Unable to connect to device ent0, errno = 19
entstat ent5
entstat: 0909-003 Unable to connect to device ent5, errno = 19
entstat ent1
entstat: 0909-003 Unable to connect to device ent1, errno = 19
Then check errpt for messages for any of the adapters listed in the stderr output.
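The adapters worth checking (Available, not an EtherChannel or VLAN device, and not a member of ent11) can also be listed without changing anything on the host. The sketch below is an illustration only: the function name list_candidates is hypothetical, and it assumes lsdev prints the device name in column 1 and its state in column 2.

```shell
#!/bin/sh
# Dry run: print Available entN adapters that are neither EtherChannel/VLAN
# devices nor members of the EtherChannel.
# $1    = space-separated list of EtherChannel member adapters
# stdin = lsdev output
# list_candidates is a hypothetical helper name used for illustration.
list_candidates() {
    members=" $1 "
    grep '^ent[0-9]' | grep -E -v 'EtherChannel|VLAN' |
    awk '$2 == "Available" { print $1 }' |
    while read -r name
    do
        case "$members" in
            *" $name "*) ;;          # EtherChannel member: leave it alone
            *) echo "$name" ;;       # candidate for the errpt check
        esac
    done
}
```

A possible invocation is: lsdev | list_candidates "$(lsattr -EOl ent11 -a adapter_names | grep -v '^#' | sed 's|,| |g')". Each name printed can then be checked with errpt -N followed by the adapter name.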
Resolving The Problem
The following script can be created on all hosts in the environment and run at startup through an inittab entry, through a cron job, or by hand. The script implements the workaround mentioned in the APAR bulletin by removing (setting to the Defined state with rmdev -l) adapters that are not involved in an EtherChannel and are not VLAN adapters. This workaround has been proven to prevent the unnecessary messages in errpt.
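If any of the free adapters are in use, set the 'good_adapter_filter' variable to an @-delimited list of those adapter names. Each name must have an @ on both sides (for example, @ent2@ent3@), because the script matches on @name@.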
#!/bin/sh
cat<<COMMENT
DATE : 2015-08-27
AUTHOR : GLS
Purpose : Find adapters that trigger this apar, check to see if they are active, if so, put them in a defined state.
COMMENT
export LANG=en_US
# Adapters that belong to the ent11 EtherChannel must stay Available.
good_adapter_list=$(lsattr -EOl ent11 -a adapter_names | grep -v "^#" | sed "s|,| |g")
# Add any other in-use adapters here, each wrapped in @ (for example "@ent2@").
good_adapter_filter=
for i in ${good_adapter_list}
do
    printf "Found adapter $i as part of ent11.\n"
    good_adapter_filter="@$i@$good_adapter_filter"
done
printf "Good adapter filter has been created as $good_adapter_filter.\n"
# All Available physical adapters, excluding EtherChannel and VLAN devices.
all_adapters=$(lsdev | grep "^ent[0-9]" | egrep -v 'EtherChannel|VLAN' | grep 'Available' | awk '{print $1}')
reccommands=
for acheck in ${all_adapters}
do
    printf "Found adapter $acheck.\n"
    echo "$good_adapter_filter" | grep "@${acheck}@" > /dev/null
    rc=$?
    if [ $rc -eq 0 ]
    then
        printf "The adapter ${acheck} is a valid adapter.\n"
    else
        printf "The adapter ${acheck} should be in the defined state due to this APAR.\n"
        printf "Run the following: rmdev -l ${acheck} to set the adapter to defined.\n"
        reccommands="rmdev -l ${acheck}\n${reccommands}"
        printf "Running the command: rmdev -l ${acheck}\n"
        # Move the unused adapter from Available to Defined.
        rmdev -l ${acheck}
    fi
done
printf "Summary:\n"
printf "-------------\n"
printf "$reccommands\n"
printf "-------------\n"
This script can be run at startup or as part of a regular cron job. It can be run more than once.
NOTE:
This script must be re-run after a reboot or after running cfgmgr. Both reset the adapters to Available, which brings back the extraneous messages in errpt.
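Because the workaround does not survive a reboot or cfgmgr, scheduling it is worth considering. The following sketch only prints candidate inittab and crontab records for review. Everything specific in it is an assumption rather than something from this technote: the script path /usr/local/bin/rm_unused_ent.sh, the inittab identifier rmunusedent, run level 2, and the hourly cron interval.

```shell
#!/bin/sh
# Hypothetical install path for the workaround script; adjust for your site.
SCRIPT=/usr/local/bin/rm_unused_ent.sh

# inittab record format: identifier:runlevel:action:command
# "once" runs the command one time when the run level is entered.
inittab_entry() {
    printf 'rmunusedent:2:once:%s >/dev/console 2>&1\n' "$SCRIPT"
}

# crontab record: run at the top of every hour (an arbitrary interval).
cron_entry() {
    printf '0 * * * * %s >/dev/null 2>&1\n' "$SCRIPT"
}

# Print both records so they can be reviewed before being applied.
inittab_entry
cron_entry
```

After review, the inittab record could be added as root with mkitab "$(inittab_entry)" and confirmed with lsitab rmunusedent, and the cron record could be added with crontab -e. The cron job covers the case where cfgmgr is run between reboots.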
Document Information
Modified date:
17 October 2019
UID
swg21969287