IBM Support

Effects of Linux kernel semaphore settings on DB2 used by APM v8.1.4

Technical Blog Post


Body

Working daily on APM issues, I have often observed unexpected failures of the
main APM services (MIN, APMUI, Server1) caused by problems on the DB2 side.

In such cases, the messages.log file for APMUI contains something like:

 

com.ibm.tivoli.apm.datamanagement.connector.CURIRequest      E {"msgSeverity":"error","msgId":"ATKRST100E","msgText":"ATKRST100E An unexpected error occured.
The error message is as follows: 'com.ibm.usmi.console.navigator.model.NavExceptionKFWITM633E Exception:
KFWITM217E Request error: PREFETCHDB connection is lost\n: nested exception is: \n\tcom.ibm.tivoli.monitoring.provider.navmodel.ITMRuntimeException:
KFWITM217E Request error: PREFETCHDB connection is lost\n'.","stackTrace":"com.ibm.usmi.console.navigator.model.NavException: (ATKRST100E)
ATKRST100E An unexpected error occured. The error message is as follows: 'com.ibm.usmi.console.navigator.model.NavExceptionKFWITM633E Exception:
KFWITM217E Request error: PREFETCHDB connection is lost\n

 

If DB2 is having problems and the APM services fail to establish a stable connection, you can find similar messages in the other messages.log files.
For example, this is an excerpt from the server1 log file:


[8/5/18 14:09:24:121 SGT] 000002f3 .ibm.tivoli.monitoring.provider.utility.DBConnectionsMonitor I Prefetch or datamart connection failed. Retry in 30000 milliseconds.
[8/5/18 14:09:24:754 SGT] 00000318 candle.fw.model.SCMMonitor                                   I Timestamp adjusted from:2018-08-05 14:09:24.712 to:2018-08-05 14:09:19.712
[8/5/18 14:09:24:950 SGT] 00203ab2 com.ibm.tivoli.rest.amq                                      E unexpected exception:

[8/7/18 6:04:09:364 SGT] 000002f3 com.ibm.tivoli.monitoring.provider.utility.ITMUtil           E Prefetch DB connection establishment failed.

 

APM 8.1.x creates and uses three databases, all of them hosted on the DB2 server: DATAMART, SCR32 and WAREHOUS.


If you notice the above errors, it means that these databases are unreachable from the APM components. This can lead to further problems and possible crashes of the APM
services, because the monitoring events cannot be properly processed.

In this case the root cause lies with DB2 itself: most likely it experienced a problem or a temporary performance degradation, and this led to the behavior seen on the APM components, which
work as DB2 clients.

To understand what happened to DB2, you should always refer to db2diag.log and check for additional diagnostic details recorded at the time of the error.
Another useful diagnostic file is db2apm.nfy, which is also available in /home/db2apm/sqllib/db2dump.

Most of the time, the information from those two files is enough to figure out the DB2 problem that caused the error messages observed in the APM logs.

There is one specific case, however, that I would like to highlight here.

If you notice a sequence like the following in the MIN messages.log file:

 

[8/5/18 6:05:44:274 SGT] 00001d5d com.ibm.tivoli.ccm.prefetch.PrefetchBatchInsert E
com.ibm.db2.jcc.am.BatchUpdateException: [jcc][t4][4.13.127] Batch failure. The batch was submitted, but at least one exception occurred on an individual member
of the batch.
Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4229, SQLSTATE=null
at com.ibm.db2.jcc.am.id.a(id.java:405)

 

you have to consider that this error usually occurs when DB2 fails to cope with a large amount of data that must be processed in a batch.

There are several possible causes, but with APM workloads this error has often been associated with a shortage of Linux kernel resources,
specifically semaphore resources.
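To see quickly whether the signatures described so far appear in a given log file, something like the following sketch may help. The patterns are copied from the excerpts quoted in this post; the function name and signature labels are only illustrative and are not part of APM or DB2.

```python
import re
from collections import Counter

# Patterns taken from the log excerpts quoted in this post.
# The dictionary keys are illustrative labels, not official message names.
SIGNATURES = {
    "prefetch_connection_lost": re.compile(r"PREFETCHDB connection is lost"),
    "prefetch_connection_failed": re.compile(
        r"Prefetch (or datamart|DB) connection (failed|establishment failed)"
    ),
    "batch_failure_4229": re.compile(r"ERRORCODE=-4229"),
}


def count_signatures(lines):
    """Count how many lines match each known signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

The function accepts any iterable of lines, so an open file object for a messages.log can be passed directly. A non-zero count for batch_failure_4229 together with the connection signatures is a hint to look at the semaphore settings, not proof of the root cause.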

When DB2 is installed, it automatically sets some kernel parameters for better performance, and the semaphore settings are among them.
It can happen that this automatic setting does not take place, or that someone has changed the original values to something different in the sysctl.conf file.

To check the current semaphore values, run the command:


/sbin/sysctl -a | grep sem

 

 

It returns output like the following, where the four fields are, in order, SEMMSL, SEMMNS, SEMOPM and SEMMNI:

 

kernel.sem = 250        256000  32      4096

 

The minimum requirements are:

 

kernel.sem (SEMMSL)  250
kernel.sem (SEMMNS)  256000
kernel.sem (SEMOPM)  32
kernel.sem (SEMMNI)  256 * <size of RAM in GB>

 

If the command returns values lower than the minimum requirements and you have seen errors like
ERRORCODE=-4229, SQLSTATE=null for batch execution in the MIN messages.log, then there is a good chance that this is the root cause.
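The comparison against the minimum requirements can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the amount of RAM in GB has to be supplied by you.

```python
FIELD_NAMES = ("SEMMSL", "SEMMNS", "SEMOPM", "SEMMNI")


def parse_kernel_sem(sysctl_line):
    """Parse a line like 'kernel.sem = 250 256000 32 4096' into a dict."""
    _, _, values = sysctl_line.partition("=")
    fields = [int(v) for v in values.split()]
    if len(fields) != 4:
        raise ValueError(f"expected 4 values, got {len(fields)}: {sysctl_line!r}")
    return dict(zip(FIELD_NAMES, fields))


def minimum_requirements(ram_gb):
    """Minimum values as listed in this post; SEMMNI scales with RAM."""
    return {"SEMMSL": 250, "SEMMNS": 256000, "SEMOPM": 32, "SEMMNI": 256 * ram_gb}


def find_shortfalls(current, ram_gb):
    """Return {field: (current, required)} for every field below its minimum."""
    required = minimum_requirements(ram_gb)
    return {
        name: (current[name], required[name])
        for name in FIELD_NAMES
        if current[name] < required[name]
    }
```

For instance, the sample output shown earlier has no shortfall on a server with 16 GB of RAM, while on a 32 GB server find_shortfalls reports SEMMNI, because the minimum there is 256 * 32 = 8192.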

In this case, you need to change the semaphore values accordingly, by following the steps from:


https://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.qb.server.doc/doc/t0008238.html

 

Basically, you need to insert a line like the following into the file /etc/sysctl.conf:

kernel.sem = 250        256000  32      4096

 


Of course, the above is only an example; the values may vary depending on your needs and available RAM.

If the line already exists, you just need to update it with the desired values.
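As a rough sketch of that edit, the Python below builds a kernel.sem line from the minimum values listed earlier and either replaces an existing line or appends a new one. It works only on the text of the file; reading and writing /etc/sysctl.conf (which needs root) is left out on purpose. The function names are hypothetical, and using the bare minimums is an assumption, since your system may need higher values.

```python
import re

# Matches an uncommented kernel.sem line; [ \t]* avoids swallowing blank lines.
_SEM_LINE = re.compile(r"^[ \t]*kernel\.sem[ \t]*=.*$", re.MULTILINE)


def kernel_sem_line(ram_gb):
    """Build a kernel.sem line from the minimums in this post (assumption)."""
    return f"kernel.sem = 250 256000 32 {256 * ram_gb}"


def update_sysctl_text(text, ram_gb):
    """Replace an existing kernel.sem line, or append one if none exists."""
    new_line = kernel_sem_line(ram_gb)
    if _SEM_LINE.search(text):
        return _SEM_LINE.sub(new_line, text)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"
```

Commented-out lines such as "# kernel.sem = ..." are deliberately left untouched, so an old value kept for reference is not overwritten.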
 
After you have changed the file, you need to activate the changes by running the command sysctl -p.
As a last step, you can restart APM (apm restart_all).

This should be enough to prevent a recurrence of the problem between the APM services and the DB2 server.

Hope it helps.

 

 

 

 

 


UID

ibm11085247