Troubleshooting
Problem
The Netezza service failed over to HA2.
Symptom
The Netezza service failed over to HA2, and the system could not fork any new processes as the nz user on host HA1.
Cause
Too many processes were running on host HA1, and together they exhausted the system-wide open file limit (fs.file-max).
Diagnosing The Problem
1. I checked /var/log/messages and found the following errors at the failover timestamp:
kernel: VFS: file-max limit 240000 reached
lrmd: [22832]: ERROR: perform_ra_op::2875: pipe: Too many open files in system
lrmd: [22832]: ERROR: ra_pipe_op_new::185: fcntl: Bad file descriptor
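A search like the one in step 1 could be scripted roughly as follows. This is a minimal sketch: the function name find_fd_exhaustion is hypothetical, and the patterns are taken only from the messages shown above, so other kernel or heartbeat versions may word them differently.

```shell
# Hypothetical helper: print log lines that indicate the system-wide
# file handle limit was hit. Reads the log file given as $1 and
# defaults to /var/log/messages.
find_fd_exhaustion() {
    grep -E 'file-max limit [0-9]+ reached|Too many open files in system' \
        "${1:-/var/log/messages}"
}
```

Run as root, for example `find_fd_exhaustion /var/log/messages`, and compare the timestamps of any matches with the failover time.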
2. I checked the number of open files as follows:
[root]# sysctl fs.file-nr
fs.file-nr = 258720 0 240000
The output shows that 258720 file handles were allocated, which exceeds the file-max setting of 240000.
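The comparison in step 2 can also be done by a small helper. The sketch below assumes the usual three-value layout of fs.file-nr (allocated handles, unused handles, maximum); the function name check_file_nr is hypothetical.

```shell
# Hypothetical helper: reads a line such as
#   fs.file-nr = 258720 0 240000
# on stdin and reports allocated handles against the maximum.
# Returns non-zero when the allocated count has reached the maximum.
check_file_nr() {
    awk '{
        allocated = $(NF - 2)   # first of the three values
        max = $NF               # last of the three values
        printf "allocated=%d max=%d\n", allocated, max
        exit (allocated >= max ? 1 : 0)
    }'
}
```

For example, `sysctl fs.file-nr | check_file_nr || echo "file-max reached"`. Because the fields are counted from the end of the line, the same function should also accept the contents of /proc/sys/fs/file-nr.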
3. I checked for unusual processes with the ps command and found the following:
root 81856 81850 0 07:37 ? 00:00:00 myscript
root 81862 81856 0 07:37 ? 00:00:00 myscript
root 81868 81862 0 07:37 ? 00:00:00 myscript
root 81876 81868 0 07:37 ? 00:00:00 myscript
.................
This means pid 81850 forked pid 81856, pid 81856 forked pid 81862, and so forth.
In other words, myscript invokes itself.
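Spotting such a chain by eye is hard when thousands of processes are listed. As a rough sketch, the helper below counts, per command, the processes whose parent runs the same command. It assumes `ps -ef` style input (UID PID PPID C STIME TTY TIME CMD), and the name count_self_forks is hypothetical.

```shell
# Hypothetical helper: reads `ps -ef` style lines on stdin and prints
# "<count> <command>" for every command that has processes whose
# parent runs the same command, highest count first.
count_self_forks() {
    awk '
        # Skip the header line; remember parent PID and command per PID.
        $2 ~ /^[0-9]+$/ { ppid[$2] = $3; cmd[$2] = $8 }
        END {
            for (pid in cmd)
                if ((ppid[pid] in cmd) && cmd[ppid[pid]] == cmd[pid])
                    count[cmd[pid]]++
            for (c in count)
                print count[c], c
        }' | sort -rn
}
```

For example, `ps -ef | count_self_forks | head`. Some daemons and shells legitimately have children with the same name, so a high count is only a hint to look closer, not proof of a runaway script.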
Resolving The Problem
I checked the script and found that it calls itself, which caused too many processes to run. Fix the script first, then reboot HA1 so that the leftover processes are terminated cleanly.
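The actual fix depends on why the script calls itself, which is not covered here. As one possible safeguard, a script can refuse to run when it detects that it was started by another copy of itself. The sketch below is an assumption for illustration only: the variable MYSCRIPT_DEPTH and the function guard_recursion are hypothetical names and are not part of the original script.

```shell
# Hypothetical guard for the top of a script such as myscript.
# MYSCRIPT_DEPTH is an illustrative variable name; it is exported so
# that any child copy of the script inherits the incremented value.
guard_recursion() {
    MYSCRIPT_DEPTH=$(( ${MYSCRIPT_DEPTH:-0} + 1 ))
    export MYSCRIPT_DEPTH
    if [ "$MYSCRIPT_DEPTH" -gt 1 ]; then
        echo "refusing to run: already inside another copy of this script" >&2
        return 1
    fi
}
```

Calling `guard_recursion || exit 1` as the first statement would make a nested copy exit immediately instead of forking another one.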
Document Information
Modified date:
17 October 2019
UID
swg21967805