IBM Support

File system usage exceeded threshold

Question & Answer


Question

I am getting an alert email from the event manager indicating the following error. What does it mean, and what are the thresholds?

Warning:NPS system xxxxx-node1 - host 1874 Needs attention. System initiated Hardware Type: host Hardware ID:1874 Location:1st host Error:File system usage exceeded threshold Serial#:xxxxxxxxx Event Source:System initiated At: xxxxxxx EDT

Cause

The system checks file system usage during startup and at regular intervals afterwards, and reports an alert if the usage is above a threshold.

Answer

The file system usage check has three stages. Here is what each code ('reasonCode') means:

STAGE 1
This stage comes with reasonCode 1051. It checks all file systems (nzstats -type hostfilesystem) and
reports when file system usage exceeds 80%.

The 'errString' will be:
"File system usage exceeded threshold, reasonCode=1051"

STAGE 2
This stage comes with reasonCode 1061. It checks all file systems (nzstats -type hostfilesystem) and
reports when file system usage exceeds 90%.

The 'errString' will be:
"File system usage exceeded threshold, reasonCode=1061"

STAGE 3
This stage has no reasonCode, and its threshold is 95%. If you hit this threshold, the system will not be able to start until space is freed up.

The 'errString' will be:
"File system /nz usage exceeded 95 threshold on rack1.host1 System will be stopped "

In short, the thresholds are 80%, 90% and 95% respectively.

Which threshold am I hitting?

Check /nz/kit/log/eventmgr/eventmgr.log, which also records the 'reasonCode':
xxxxxxxxxxx EDT Info: received & processing event type = hwNeedsAttention, event args = 'hwId=1874, hwType=host, location=1st host, devSerial=0652588, eventSource=system, errString=File system usage exceeded threshold, reasonCode=1061' event source = 'System initiated event'

In the above case, the reasonCode of 1061 means the system is hitting the 90% threshold.
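The lookup described above can be sketched in a few lines of Python. This is only an illustration, not an IBM-provided tool: the function name `threshold_for_log_line` and the mapping table are hypothetical, built from the two reasonCodes and default thresholds listed in this document.

```python
import re

# reasonCode -> default threshold (%), as listed for stages 1 and 2.
# Stage 3 carries no reasonCode, so it does not appear here.
REASON_CODE_THRESHOLDS = {1051: 80, 1061: 90}

def threshold_for_log_line(line):
    """Return (reason_code, threshold_percent) for an eventmgr.log line.

    Returns None if the line is not a file system usage event or has no
    reasonCode. The threshold is None for a reasonCode not in the table.
    """
    if "File system usage exceeded threshold" not in line:
        return None
    match = re.search(r"reasonCode=(\d+)", line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, REASON_CODE_THRESHOLDS.get(code)

# The sample log entry from this document.
sample = (
    "xxxxxxxxxxx EDT Info: received & processing event type = hwNeedsAttention, "
    "event args = 'hwId=1874, hwType=host, location=1st host, devSerial=0652588, "
    "eventSource=system, errString=File system usage exceeded threshold, "
    "reasonCode=1061' event source = 'System initiated event'"
)
print(threshold_for_log_line(sample))  # (1061, 90)
```

If the thresholds have been changed from their defaults, the percentages in the table would need to be adjusted to match.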



Note that the last stage (stage 3) checks the /nz file system only, while the first two stages check all the file systems listed by 'nzstats -type hostfilesystem'.

Here is sample output of 'nzstats -type hostfilesystem'.
(The check is against the last column, "% Used Space".)

$ nzstats -type hostfilesystem

Host FS Device Name Mount Point  Space        Free Space  Used Space   % Used
ID   ID                                                                Space
---- -- ----------- ------------ ------------ ----------- ------------ -------
   1  1 /dev/sda5   /             15414932 KB  3026900 KB  12388032 KB 80.36 %
   1  2 /dev/sda9   /tmp           7703876 KB  3168116 KB   4535760 KB 58.88 %
   1  3 /dev/sda8   /usr           7703876 KB  6656868 KB   1047008 KB 13.59 %
   1  4 /dev/sda11  /usr/local     3851896 KB  1888812 KB   1963084 KB 50.96 %
   1  5 /dev/sda7   /var           7703876 KB  7410220 KB    293656 KB  3.81 %
   1  6 /dev/sda6   /opt           7703876 KB  6627528 KB   1076348 KB 13.97 %
   1  7 /dev/sda3   /export/home  15414964 KB   399840 KB  15015124 KB 97.41 %
   1  8 /dev/sda2   /nz          144479444 KB 30974852 KB 113504592 KB 78.56 %
   1  9 /dev/sda1   /boot           966600 KB   920560 KB     46040 KB  4.76 %
   1 10 none        /dev/shm      79990784 KB 74763516 KB   5227268 KB  6.53 %
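To see which file systems in such a listing would trigger an alert, the output can be compared against the three default thresholds. The following Python sketch does that for a few rows copied from the sample above. It assumes the column layout shown here, and the helper names `parse_rows` and `classify` are hypothetical. Whether the real check uses a strict or inclusive comparison is also an assumption; the sketch uses "greater than" to match the word "exceeded".

```python
# A few rows copied from the sample nzstats output in this document.
SAMPLE = """\
   1  1 /dev/sda5   /             15414932 KB  3026900 KB  12388032 KB 80.36 %
   1  7 /dev/sda3   /export/home  15414964 KB   399840 KB  15015124 KB 97.41 %
   1  8 /dev/sda2   /nz          144479444 KB 30974852 KB 113504592 KB 78.56 %
"""

def parse_rows(text):
    """Return a list of (mount_point, percent_used) from nzstats-style rows."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # A data row in the layout above has 12 whitespace-separated tokens,
        # starts with the numeric host ID, and ends with "%".
        # Header and separator lines fail this test and are skipped.
        if len(tokens) == 12 and tokens[0].isdigit() and tokens[-1] == "%":
            rows.append((tokens[3], float(tokens[-2])))
    return rows

def classify(mount, pct, stage1=80, stage2=90, stop=95):
    """Return the stage a file system falls into, or None if below all thresholds."""
    # Stage 3 applies to /nz only, as noted in this document.
    if mount == "/nz" and pct > stop:
        return "stage 3 (system stop)"
    if pct > stage2:
        return "stage 2 (reasonCode 1061)"
    if pct > stage1:
        return "stage 1 (reasonCode 1051)"
    return None

for mount, pct in parse_rows(SAMPLE):
    print(mount, pct, classify(mount, pct))
```

In this sample, / is just over the 80% threshold and /export/home is over the 90% threshold, while /nz is below all three.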

These thresholds are configurable, although we do not recommend changing them. The default settings are:

$ nzsystem showregistry | grep -i hostFileSystemUsage
sysmgr.hostFileSystemUsageThresholdOneToRiseEvent = 80
sysmgr.hostFileSystemUsageThresholdTwoToRiseEvent = 90
sysmgr.hostFileSystemUsageThresholdToStopSystem = 95
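If you want to confirm the thresholds in effect on a system before interpreting an alert, the registry lines can be read into a small lookup table. The Python sketch below parses text in the "key = value" form shown above. The function name `parse_thresholds` is hypothetical, and the input is the default output copied from this document rather than a live call to nzsystem.

```python
# Default settings as shown in this document.
REGISTRY_OUTPUT = """\
sysmgr.hostFileSystemUsageThresholdOneToRiseEvent = 80
sysmgr.hostFileSystemUsageThresholdTwoToRiseEvent = 90
sysmgr.hostFileSystemUsageThresholdToStopSystem = 95
"""

PREFIX = "sysmgr.hostFileSystemUsageThreshold"

def parse_thresholds(text):
    """Return {setting_name: percent} for the file system usage threshold settings."""
    thresholds = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        # Keep only "key = value" lines for the threshold settings.
        if sep and key.startswith(PREFIX):
            thresholds[key] = int(value.strip())
    return thresholds

for name, percent in parse_thresholds(REGISTRY_OUTPUT).items():
    print(f"{name}: {percent}%")
```

Lines that do not match the threshold prefix are ignored, so the same function could be given unfiltered registry output, assuming it uses the same "key = value" layout.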

File system usage might exceed a threshold and later drop back to normal.

Two enhancement requests are open to improve this:

RTE 53705
File system usage exceeded threshold does not contain sufficient information for the user

RTE 90028
nzOpenPmr File_system_usage_exceeded_threshold - missing info

We expect a future release to provide more detailed information, such as the actual percentage and the name of the file system that is hitting the limit.

For the time being, the 'nzhealthcheck' report might also show which file system hit the threshold in the past.

Internal Use Only

PMR#76885,499,000

Document Information

Modified date:
16 June 2018

UID

swg21667431