IBM Support

How to address "VFS: file-max limit 240000 reached" error.

Troubleshooting


Problem

Netezza service failed over to HA2.

Symptom

The Netezza service failed over to HA2, and the system could not fork any new processes as the nz user on host HA1.

Cause

Too many processes are running on host HA1.

Diagnosing The Problem

1. Check /var/log/messages for errors at the failover timestamp. In this case it contained:
kernel: VFS: file-max limit 240000 reached
lrmd: [22832]: ERROR: perform_ra_op::2875: pipe: Too many open files in system
lrmd: [22832]: ERROR: ra_pipe_op_new::185: fcntl: Bad file descriptor

2. Check the number of open file handles:
[root]# sysctl fs.file-nr
fs.file-nr = 258720 0 240000
The three fields are the number of allocated file handles, the number of allocated-but-unused handles, and the fs.file-max limit. Here 258720 handles are allocated, which exceeds the configured limit of 240000.
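The comparison above can be scripted, for example as part of a monitoring check. A minimal sketch using the sample values from this case (on a live system, read the line from /proc/sys/fs/file-nr instead):

```shell
# Sample fs.file-nr output from this case: allocated, unused, maximum.
# On a live system: line=$(cat /proc/sys/fs/file-nr)
line="258720 0 240000"
allocated=$(echo "$line" | awk '{print $1}')
maximum=$(echo "$line" | awk '{print $3}')
# Warn when allocated handles reach the configured limit
if [ "$allocated" -ge "$maximum" ]; then
  echo "file-max exceeded: $allocated >= $maximum"
fi
```

On a healthy system the allocated count stays well below fs.file-max; reaching it causes exactly the "VFS: file-max limit" message seen in /var/log/messages.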

3. Check for unusual processes with the ps command:
root 81856 81850 0 07:37 ? 00:00:00 myscript
root 81862 81856 0 07:37 ? 00:00:00 myscript
root 81868 81862 0 07:37 ? 00:00:00 myscript
root 81876 81868 0 07:37 ? 00:00:00 myscript
.................

This shows that PID 81850 forked PID 81856, PID 81856 forked PID 81862, and so forth: each process's parent is the previous process in the chain, which means myscript calls itself on every invocation.
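The parent-child chain can be verified programmatically. A minimal sketch using the sample PID/PPID pairs from this case (on a live system, `ps -eo pid,ppid,comm` would supply the real data):

```shell
# Sample "PID PPID" pairs taken from the ps output above.
ps_sample="81856 81850
81862 81856
81868 81862
81876 81868"
# Confirm each process's parent is the previous PID in the list,
# i.e. the processes form a single fork chain.
echo "$ps_sample" | awk '
  NR > 1 && $2 != prev { broken = 1 }
  { prev = $1 }
  END { if (broken) print "chain broken"
        else print "chain of " NR " self-forked processes" }'
```

A long unbroken chain like this is the signature of a script that launches itself, as opposed to a single parent spawning many independent children.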

Resolving The Problem

Inspection of the script confirmed that it calls itself, which caused the runaway process count and exhausted the file-handle limit. Fix the script so that it no longer invokes itself, then reboot HA1 to cleanly terminate the accumulated processes.


Document Information

Modified date:
17 October 2019

UID

swg21967805