IBM Support

Diagnosing Full File Systems in PowerVM VIOS

Troubleshooting


Problem

How to identify and correct a full or near 100% Used file system on a PowerVM Virtual I/O Server (VIOS).

Symptom

The VIOS error log reported a J2_FS_FULL error or a padmin task failed.
Having a full / (root) file system in the Virtual I/O Server rootvg can lead to unpredictable results.  Some symptoms reported in the field include, but are not limited to, padmin commands, Partition Mobility, or padmin login failures.  Example:
$ mkvdev -vdev hdisk100 -vadapter vhost12
Some error messages may contain invalid information
for the Virtual I/O Server environment.
mkdev: 0514-548 Cannot perform the requested function because
    the / filesystem is full or is out of inodes.

Cause

There are common causes known to fill up a file system in a VIOS partition. Such causes vary depending on the impacted file system.
 
IMPORTANT
Unlike AIX, the VIOS is considered an appliance, and consequently, none of the default file systems in rootvg should ever be manually increased via oem_setup_env shell or used as a repository.
 

Environment

This applies to VIOS 3.1 and 4.1.

Diagnosing The Problem

This document assumes the VIOS partition is accessible in normal mode and the prime administrator (padmin) user can log in.  
Note 1:
If the padmin user cannot log in while the VIOS is in normal mode, an outage window may need to be scheduled to boot the VIOS partition into a maintenance mode (root) shell for problem determination.
When a file system is full, a J2_FS_FULL entry is logged in the error log.
To identify which file system triggered the error, log in to the VIOS as the padmin user.  Then, use the errlog command to examine the error details, or run the df command to check the file system %Used.
$ errlog -ls|tee errlog.out
$ vi errlog.out                     Search for the error by typing /J2_FS_FULL
LABEL:           J2_FS_FULL
IDENTIFIER:      F7FA22C9


Date/Time:       Tue Apr 10 15:27:43 2012
Sequence Number: 178
Machine Id:      <value>
Node Id:         vio1
Class:           O
Type:            INFO
WPAR:            Global
Resource Name:   SYSJ2
Description
UNABLE TO ALLOCATE SPACE IN FILE SYSTEM

Probable Causes
FILE SYSTEM FULL

Recommended Actions
INCREASE THE SIZE OF THE ASSOCIATED FILE SYSTEM
REMOVE UNNECESSARY DATA FROM FILE SYSTEM
USE FUSER UTILITY TO LOCATE UNLINKED FILES STILL REFERENCED

Detail Data
JFS2 MAJOR/MINOR DEVICE NUMBER
000A 0009
FILE SYSTEM DEVICE AND MOUNT POINT

/dev/hd1, /home

Equivalent commands that can be run when the VIOS partition is booted into maintenance mode with the file systems mounted:
# errpt -a > /home/padmin/errlog.out
# vi /home/padmin/errlog.out           Search for the error by typing /J2_FS_FULL
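When the error log holds many entries, searching by hand can be slow. The following is a minimal sketch, not a VIOS command, that pulls the "device, mount point" line out of each J2_FS_FULL entry in detailed error-log text. The function name j2_full_mounts is hypothetical, and the sketch assumes that awk is available in the shell you run it from and that entries follow the layout shown above.

```shell
# Print the device and mount point of every J2_FS_FULL entry found in
# detailed error-log text read from standard input.
# j2_full_mounts is a hypothetical helper name, not a VIOS command.
j2_full_mounts() {
    awk '
        # Each entry starts with a LABEL: line; remember whether it is J2_FS_FULL.
        /^LABEL:/ { in_entry = ($2 == "J2_FS_FULL"); want = 0 }
        # The mount point is the first non-empty line after this heading.
        in_entry && /FILE SYSTEM DEVICE AND MOUNT POINT/ { want = 1; next }
        want && NF > 0 { print; want = 0 }
    ' | sort -u
}
```

For example, against the file saved earlier: j2_full_mounts < /home/padmin/errlog.out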
Use the df command with -g (gigabytes) or -m (megabytes) to check whether a file system is near or completely full. The following example shows the file system sizes in megabytes:
$ df -m
Filesystem    MB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4         256.00    186.97   27%     4370    10% /
/dev/hd2        4032.00   1108.45   73%    62050    19% /usr
/dev/hd9var      576.00    500.86   14%      696     1% /var
/dev/hd3        4800.00   4794.14    1%      100     1% /tmp
/dev/hd1       10240.00   5442.58   47%     1683     1% /home
/dev/hd11admin    128.00    127.64    1%        5     1% /admin
/proc                 -         -    -        -      - /proc
/dev/hd10opt     320.00    218.93   32%      648     2% /opt
/dev/livedump    256.00    255.64    1%        4     1% /var/adm/ras/livedump
/ahafs                -         -    -       43     1% /aha
$
The following example shows the default VIOS file systems size in gigabytes after a VIOS upgrade to 4.1.0.21:

$ df -g
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4           0.25      0.18   27%     4370    10% /
/dev/hd2           3.94      1.08   73%    62050    19% /usr
/dev/hd9var        0.56      0.49   14%      696     1% /var
/dev/hd3           4.69      4.68    1%       99     1% /tmp
/dev/hd1          10.00      5.32   47%     1683     1% /home
/dev/hd11admin      0.12      0.12    1%        5     1% /admin
/proc                 -         -    -        -      - /proc
/dev/hd10opt       0.31      0.21   32%      648     2% /opt
/dev/livedump      0.25      0.25    1%        4     1% /var/adm/ras/livedump
/ahafs                -         -    -       43     1% /aha
$
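To spot a nearly full file system without reading every row, the df output can be filtered. The sketch below is illustrative only: fs_over_threshold is a hypothetical helper name, the default of 90 is an arbitrary choice, and the column positions assume the seven-column df -m / df -g layout shown above. It also checks %Iused, because the mkdev error shown earlier can mean the file system is out of inodes rather than out of space.

```shell
# List file systems whose %Used or %Iused is at or above a threshold.
# Reads "df -m" or "df -g" output in the seven-column layout shown above.
# fs_over_threshold is a hypothetical helper name; 90 is an arbitrary default.
fs_over_threshold() {
    awk -v limit="${1:-90}" '
        NR == 1 { next }                      # skip the header line
        {
            used = $4; iused = $6
            sub(/%/, "", used); sub(/%/, "", iused)
            if (used == "-") next             # pseudo file systems such as /proc
            if (used + 0 >= limit + 0 || iused + 0 >= limit + 0)
                printf "%s %s%% used, %s%% inodes\n", $7, used, iused
        }
    '
}
```

For example: df -m | fs_over_threshold 70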
Note 2:
The padmin shell does not provide a way to create or enlarge a file system on a VIOS partition.  Doing so from the oem_setup_env shell is not supported, as noted in the IBM Support document "What is the purpose of oem_setup_env in PowerVM VIOS?"

Resolving The Problem

Review Common Causes and Recommendations

/home

By default, the /home file system is intended to be 10 GB in size on a VIOS partition.  This allows /home to be used as a temporary placeholder for VIOS administrative tasks.  For example, sometimes it is necessary to copy data into the VIOS partition.  Such data might include a VIOS fix package needed to update the VIOS or ISO files needed to populate the Virtual Media Repository, also known as Virtual Media Library.
Once the task is completed, those files should be removed.
It is not supported to increase the size of /home. 
If this file system is full or near 100% Used, determine what is filling it up and remove all files that are no longer needed.
What to do when /home filesystem is filled up with numerous viod backup logs created in "/home/ios/logs/viod_bkps"
Example:
# du -g /home |sort -n
...
9.24    /home/ios/logs/viod_bkps    <=====
9.96    /home/ios/logs
9.99    /home
9.99    /home/ios
# cd /home/ios/logs/viod_bkps
# ls -la
...
-rw-r--r--    1 root     system      5243002 May 10 00:28 viod_CM.log.2024-05-10-00:28:40
-rw-r--r--    1 root     system      5242893 May 10 00:29 viod_CM.log.2024-05-10-00:29:34
-rw-r--r--    1 root     system      5242946 May 10 00:30 viod_CM.log.2024-05-10-00:30:28
-rw-r--r--    1 root     system      5242890 May 10 00:31 viod_CM.log.2024-05-10-00:31:22
-rw-r--r--    1 root     system      5243042 May 10 00:32 viod_CM.log.2024-05-10-00:32:16
#
# ls -la |wc -l
    1932
This is due to a known issue applicable to VIOS 3.1.4 and 4.1.0.10 (at the time of this writing). 
See respective APAR details for "Local fix"
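Before applying the local fix from the APAR, it can help to know how much space the backup logs actually hold. The following is a rough sketch under stated assumptions: dir_usage is a hypothetical helper name, it is meant for the oem_setup_env (root) shell, and it only reports a file count and total size. It does not delete anything.

```shell
# Report how many regular files under a directory match a name pattern
# and how much space they take. dir_usage is a hypothetical helper name.
dir_usage() {
    # $1 = directory, $2 = optional file name pattern (default: all files)
    find "$1" -type f -name "${2:-*}" -exec ls -l {} \; |
        awk '{ n++; bytes += $5 }        # field 5 of "ls -l" is the size in bytes
             END { printf "%d files, %.1f MB\n", n, bytes / 1048576 }'
}
```

For example: dir_usage /home/ios/logs/viod_bkps 'viod_CM.log.*'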
What to do when /home filesystem keeps getting 100% full by files in /home/ios/logs/ssp_ffdc
First Failure Data Capture (FFDC) in a VIOS is a function which automatically collects data when the VIOS partition is part of a Shared Storage Pool (SSP) cluster setup and an issue is encountered.  Errors similar to the following may be observed in the VIOS error log:
 
Label           Date/Time         Host         C    Type    Resource  Description
CL_FFDC_START   Aug 31 11:06:24   <VIOS_host>  S    TEMP    CL        FFDC initiated.
CL_FFDC_PASS    Aug 31 11:26:27   <VIOS_host>  S    TEMP    CL        FFDC Successful.
The captured data in these temporary files can be useful in analyzing a problem.  FFDC is intended primarily for use by IBM Support. The presence of an FFDC message does not always mean there is a problem.  In cases where normal recovery occurs, no further action is needed. 
To determine the cluster status, run:
$ cluster -status -clustername <mycluster> -verbose
If you are experiencing problems in the VIOS cluster, contact IBM Support for investigation.
The temporary files are normally cleaned up automatically.  However, when an FFDC collection fails because the file system is full, files can be left behind, because the automated cleanup script does not handle failed data collections.  In such cases, the files may need to be removed manually.
If the VIOS cluster is working fine, you can store these files somewhere else, like an NFS server, in the event they are needed later on.  Then, the files can be deleted to free up the file system space.
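One way to keep a copy before deleting is to archive the directory to a location outside /home and confirm the archive is readable. This is a sketch only: archive_dir is a hypothetical helper name, the destination (for example an NFS mount) is an assumption, and the destination must be an absolute path outside the source directory. The sketch deliberately does not remove any files.

```shell
# Copy a directory into a tar archive at a destination outside the full
# file system and confirm the archive is readable before anything is deleted.
# archive_dir and both paths are hypothetical; dest must be an absolute
# path that is not inside src.
archive_dir() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
    # Archive relative paths so the contents can be restored anywhere.
    (cd "$src" && tar -cf "$dest" .) || return 1
    # Listing the archive is a basic readability check.
    tar -tf "$dest" > /dev/null || return 1
    echo "archived $src to $dest"
}
```

For example, with a hypothetical NFS mount at /mnt/nfs: archive_dir /home/ios/logs/ssp_ffdc /mnt/nfs/ssp_ffdc.tar. Remove the original files only after the function reports success.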

/usr

The /usr file system primarily holds the padmin CLI.
This filesystem can be increased in size automatically during a VIOS update process.  If /usr grew in size after a VIOS update, commit all updates to release space in /usr file system by running:
$ updateios -commit
Log files in /usr/local/mpg are filling up /usr
These mpg (Midrange Performance Group) logs are performance related files.
Recommendation
Contact the application vendor for advice.

/var

IMPORTANT
It is best practice to ensure an up-to-date VIOS backup is available before removing any files.
Probable Cause #1
/var file system can grow because it contains subdirectories and data files used by applications.  For example:
     /var/adm      Contains system logging files
     /var/tmp      Contains temporary files

Probable Cause #2
/var/tmp/dpid2.log filling up /var by the second
This log is written to by dpid2 daemon, which is used by Simple Network Management Protocol (SNMP). SNMP is a set of protocols for monitoring systems and devices in complex networks. It provides secure access by a combination of authenticating and encrypting packets over the network.
dpid2 daemon is not enabled on a VIOS by default.

Recommendation

Check whether the dpid2 daemon is running. If you are using SNMP Version 3, that version already has dpid2 built in. Therefore, dpid2 should not be running, and it can be stopped/disabled.

To display the status of the dpid2 daemon:
$ lssrc -a|grep dpid2
To display the SNMP version:
$ ls -l /usr/sbin/snmpd
To stop dpid2:
$ oem_setup_env
# stopsrc -s dpid2

 
Probable Cause #3
/var/adm/cron/log
Ensure you have an up-to-date VIOS backup in the event there is a later need to review the cron log.  Then, clear the log by running:

$ oem_setup_env
# cat /dev/null > /var/adm/cron/log
Probable Cause #4
/var is also used by Partition Mobility.  It is important to examine the file system utilization after performing Partition Mobility operations to verify the file system has adequate space.
Need Further Problem Determination
Determine the type of files filling up the file system, then contact the appropriate application support for recommendation.
If the above common causes are not applicable to your situation, use du and find commands from oem_setup_env shell to determine what is filling up the file system.
Note:
Some files created by an application, such as core files or application-generated logs, tend to grow over time.
If the file(s) filling up the file system relate to a specific product that may be bundled with the VIOS or that was installed later on, contact that product support for recommendation on how to best handle such files.
The du command can be used with -m (MB unit) or  -g (GB unit) to add up all file sizes per directory:
$ oem_setup_env
# du -m /home |sort -n > /home/padmin/du.out 
# vi /home/padmin/du.out
Then, closely examine the size of each directory to identify the one that filled up the file system.
If a specific directory is identified, review the directory contents:
# cd </path_to_directory_that_filled_up_the_filesystem>
# ls -la        To list the directory contents
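The du and sort steps above can also be combined into a single pass that shows only the largest directories. This is a minimal sketch, assuming it is run from the oem_setup_env shell. The helper name largest_dirs and the default of 10 entries are hypothetical choices. It reports sizes in KB (du -k) rather than MB so that small directories are not rounded together.

```shell
# Show the N largest directories (in KB) under a starting path, staying on
# one file system. largest_dirs is a hypothetical helper name.
largest_dirs() {
    # $1 = starting directory, $2 = optional number of entries (default 10)
    # -x keeps du from crossing into other mounted file systems
    du -kx "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example: largest_dirs /home 15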
 
The find command can be used to look for files larger than 1 MB:
# find /<full_filesystem_name> -xdev -size +2048 -ls|sort -rn +6|pg
This command looks for files larger than 1 MB in the specified file system (2048 blocks * 512 bytes = 1048576 bytes, or 1 MB) and sorts them numerically by size, from largest to smallest.  The -xdev flag keeps the search from crossing into other file systems.
Carefully review the list and remove any files that may no longer be needed. 
IMPORTANT
It is best practice to ensure an up-to-date VIOS backup is available before removing any files.
Carefully examine the list to determine what type of files or logs are at the top of the list and the date stamp the file was last touched.  The file path can help ascertain what program created the file, and based on the date the file was last touched we might be able to decide whether the files are relevant or not.  For example, if the file system recently started to get full, but the query returns files that were last touched months or years ago, then they might not be what's now filling up the file system.

To raise the size threshold, use a larger -size value (in 512-byte blocks):
2048 = 1 MB
20480 = 10 MB
204800 = 100 MB
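For thresholds not listed here, the block count is the size in bytes divided by 512. A small sketch of that arithmetic follows. The helper name mb_to_blocks is hypothetical.

```shell
# Convert a size in MB to the 512-byte block count that find -size expects.
# mb_to_blocks is a hypothetical helper name.
mb_to_blocks() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}
```

For example, to list files larger than 50 MB: find /<full_filesystem_name> -xdev -size +$(mb_to_blocks 50) -ls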

To check for files that have been changed in the past 24 hours, run:
# find /<full_filesystem_name> -ctime 1
 


Document Information

Modified date:
12 September 2024

UID

isg3T1024441