Troubleshooting - TasksMax set too low

When the Db2® database runs within a systemd slice (typically because it was started by the Db2 Fault Monitor), it might fail to start or might reject connections with pthread_create OSERR: EAGAIN (11) if the TasksMax limit is set too low.

Environment

Linux system using systemd where Db2 runs within a systemd slice.

Diagnosing the problem

Check the default task limit (processes and threads) that systemd applies to all services on the system:

$ systemctl show --property DefaultTasksMax
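
The system-wide default is only part of the picture, because a unit can carry its own TasksMax value. The following is a minimal sketch that compares a unit's current task count with its limit. It assumes the Fault Monitor unit is named db2fmcd.service, as in the systemctl status sample in this document. The helper names tasks_headroom and unit_tasks_headroom are hypothetical, and the TasksCurrent property might not be populated on every systemd version.

```shell
# Hypothetical helper: print how many more tasks a unit can create.
# $1 = current task count, $2 = TasksMax value as reported by systemctl.
tasks_headroom() {
    # systemd reports an unlimited TasksMax as "infinity" or as 2^64-1.
    case "$2" in
        infinity|18446744073709551615) echo "unlimited"; return 0 ;;
    esac
    # Anything that is not a plain number (for example "[not set]") or an
    # unset counter reported as 2^64-1 cannot be compared.
    for v in "$1" "$2"; do
        case "$v" in
            ''|*[!0-9]*|18446744073709551615) echo "unknown"; return 0 ;;
        esac
    done
    echo $(( $2 - $1 ))
}

# Hypothetical wrapper: query a live unit and report its headroom.
# $1 = unit name, for example db2fmcd.service.
unit_tasks_headroom() {
    unit=$1
    current=$(systemctl show "$unit" --property TasksCurrent | cut -d= -f2)
    max=$(systemctl show "$unit" --property TasksMax | cut -d= -f2)
    tasks_headroom "$current" "$max"
}
```

On a live system, unit_tasks_headroom db2fmcd.service prints the remaining number of tasks, "unlimited", or "unknown". A small or zero result suggests that the unit is at or near its limit.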

Symptom

The db2start command might fail with SQL1225N and log a message to db2diag.log.

A sample log:

2018-12-05-11.00.34.332127+060 I52280E490            LEVEL: Severe
PID     : 11859                TID : 139906236016384 PROC : db2sysc 0
INSTANCE: db2v111              NODE : 000
HOSTNAME: db2host1
EDUID   : 1                    EDUNAME: db2sysc 0
FUNCTION: DB2 UDB, base sys utilities, sqeAgentServices::CreateIdleAgent, probe:110
RETCODE : ZRC=0xFFFFFB37=-1225
          SQL1225N  The request failed because an operating system process,
          thread, or swap space limit was reached.

CALLED  : OS, -, pthread_create                   OSERR: EAGAIN (11)
DATA #1 : Codepath, 8 bytes
5:10:25:35

Depending on the limits enforced, connections beyond a certain threshold are rejected with an SQL1225N error returned to the application. The db2ipccm or db2tcpcm engine dispatchable unit (EDU) also logs another error to db2diag.log.

A sample log:

2018-12-05-11.00.34.326462+060 E50682E470            LEVEL: Error (OS)
PID     : 11859                TID : 139906236016384 PROC : db2sysc 0
INSTANCE: db2v111              NODE : 000
HOSTNAME: db2host1
EDUID   : 1                    EDUNAME: db2ipccm 0
FUNCTION: DB2 UDB, oper system services, sqloSpawnEDU, probe:80
MESSAGE : ZRC=0x8300000B=-2097151989
         
CALLED  : OS, -, pthread_create                   OSERR: EAGAIN (11)
DATA #1 : Codepath, 8 bytes
5:10:25:35
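
Both sample entries share the same pthread_create line, so a quick way to confirm the symptom is to count how often it occurs. The following is a minimal sketch; the helper name count_eagain_thread_failures is hypothetical, and the location of db2diag.log depends on the diagnostic path configured for the instance.

```shell
# Hypothetical helper: count pthread_create failures with EAGAIN in a
# db2diag.log file. $1 = path to the log file.
count_eagain_thread_failures() {
    # In a basic regular expression the parentheses match literally.
    grep -c 'pthread_create.*OSERR: EAGAIN (11)' "$1"
}
```

A nonzero count shows that thread creation was refused by the operating system. It does not by itself identify which limit was reached.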

Cause

The operating system enforces a limit on the number of threads that a given process or user can create, and the Db2 database is subject to that limit. Since Db2 10.5.0.5, RLIMIT_NPROC is overridden automatically for root instances, so it is typically not the root cause of the EAGAIN error.
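
To rule out RLIMIT_NPROC before looking at systemd, the limit that applies to a running process can be read from the Linux /proc file system. The following is a minimal sketch; the helper name max_processes_limit is hypothetical, and it assumes a Linux kernel that exposes /proc/PID/limits.

```shell
# Hypothetical helper: print the soft "Max processes" limit (RLIMIT_NPROC)
# of a running process. $1 = PID.
max_processes_limit() {
    # Line format: "Max processes  <soft>  <hard>  processes"
    awk '/^Max processes/ { print $3 }' "/proc/$1/limits"
}
```

Passing the PID of the db2sysc process prints either a number or "unlimited". A high or unlimited value points away from RLIMIT_NPROC as the cause.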

On Linux distributions that use systemd (Red Hat 7, SUSE 12), an extra layer of resource management called "slices" was introduced, in which a limit on the number of tasks (processes and threads) can be enforced through the TasksMax setting. SUSE 12 SP2 and higher set a default of 512, which affects all services running on the system. For more details, refer to SUSE Linux Enterprise Server 12 SP2 - Release Notes.

When db2start is run manually, Db2 does not run within the service's systemd slice, so it is not affected by the limit. However, the limit is enforced for Db2 if it is started by db2fmcd, because all processes spawned by the Db2 Fault Monitor run within the service's slice. For example:

$ systemctl status db2fmcd
* db2fmcd.service - DB2 v11.1.3.3
   Loaded: loaded (/etc/systemd/system/db2fmcd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2018-11-08 02:07:56 PST; 3 weeks 6 days ago
 Main PID: 939 (db2fmcd)
   CGroup: /system.slice/db2fmcd.service
           *   939 /opt/ibm/db2/V11.1/bin/db2fmcd
           * 14508 db2wdog 0 [db2v111]
           * 14512 db2sysc 0
           * 14520 db2ckpwd 0
           * 14521 db2ckpwd 0
           * 14522 db2ckpwd 0
           * 14524 db2vend (PD Vendor Process - 1) 0
           * 14536 db2acd 0 ,0,0,0,1,0,0,00000000,0,0,0,0000000000000000,0000000000000000,00000000,00000000,00000000,00000000,00000000,00000000,0000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000033408b0...
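
The same check can be made for a single process by reading its cgroup membership. The following is a minimal sketch; the helper name started_by_fault_monitor is hypothetical, and the cgroup path is assumed to match the /system.slice/db2fmcd.service path shown in the sample output.

```shell
# Hypothetical helper: succeed if the cgroup listing on standard input
# places the process in the db2fmcd.service cgroup.
started_by_fault_monitor() {
    grep -q '/system\.slice/db2fmcd\.service'
}
```

For example, feeding it /proc/PID/cgroup for the db2sysc process returns success when the instance was started by the Fault Monitor and is therefore subject to the unit's TasksMax limit.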

Resolving the problem

Increase the value of DefaultTasksMax in /etc/systemd/system.conf, or set a higher TasksMax value in the service file (for example, db2fmcd.service).
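
As an alternative to editing the service file directly, systemd also reads drop-in files from a directory named after the unit. The following is a minimal sketch of writing such a drop-in. The helper name write_tasksmax_dropin and the file name tasksmax.conf are hypothetical, and the appropriate TasksMax value depends on the workload.

```shell
# Hypothetical helper: write a drop-in that overrides TasksMax for a unit.
# $1 = drop-in directory, $2 = new TasksMax value (a number or "infinity").
write_tasksmax_dropin() {
    mkdir -p "$1" || return 1
    printf '[Service]\nTasksMax=%s\n' "$2" > "$1/tasksmax.conf"
}
```

As root, this might be called as write_tasksmax_dropin /etc/systemd/system/db2fmcd.service.d 8192, followed by systemctl daemon-reload. The value 8192 is illustrative only, not a recommendation. Whether the service must be restarted for the new limit to take effect can depend on the systemd version, so verify the result with systemctl show db2fmcd.service --property TasksMax.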