IY91404: LOAD HANGS ON LINUX/SOLARIS WHEN IPC MESSAGE-QUEUE BYTE LIMIT REACHED

Fixes are available

APAR status

Closed as program error.

Error description

ENV:
Linux / Solaris

During the LOAD initialization phase, transfer buffers are
allocated and pointers to them are put on o/s message queues.
One message is written for every transfer buffer.  At this time
no process is consuming the buffers, so the number of
outstanding queue messages can only grow. Messages are written
in blocking mode.  As a consequence, the load operation will
hang if the maximum number of o/s queue messages is reached.
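
The mechanism described above can be illustrated with a toy model.
This sketch is not DB2 code: it uses Python's in-process
queue.Queue as a stand-in for an o/s message queue, and the
function name and parameters are hypothetical. It writes one
message per transfer buffer with no consumer running and reports
the write at which a blocking put would never return.

```python
import queue
from typing import Optional


def first_blocking_write(queue_limit: int, transfer_buffers: int) -> Optional[int]:
    """Return the index of the first write that would block, or None.

    queue_limit stands in for the o/s limit on outstanding messages;
    transfer_buffers is the number of messages written at initialization.
    """
    q = queue.Queue(maxsize=queue_limit)  # bounded, like an o/s queue
    for i in range(transfer_buffers):
        try:
            # put_nowait() raises instead of waiting; a blocking put()
            # at this point would hang because nothing consumes messages.
            q.put_nowait(i)
        except queue.Full:
            return i
    return None
```

With a limit of 4 and 10 buffers, the fifth write (index 4) is the
one that would hang; with a limit at least as large as the number of
buffers, initialization completes.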

Local fix

To avoid the hang, either apply DB2 UDB Version 9.1 FixPak 2 or
a later FixPak, or tune your operating system message queue
resources.

The latter can be accomplished as follows:

Solaris:
Solaris imposes a configurable limit on the machine-wide maximum
number of outstanding queue messages (msgsys:msginfo_msgtql).
The current setting can be obtained by viewing the contents of
/etc/system. DB2 UDB could potentially also hit the limits
imposed by the maximum number of bytes that can be written to
any single queue (msgsys:msginfo_msgmnb), or the maximum number
of message queues (msgsys:msginfo_msgmni).

The problem scenario can be detected by running the command
ipcs -o -q and adding up all the values in the last column
("QNUM"). If this number is close to the system limit, you are
likely to encounter the hang problem. The workaround is to
increase the system limit, which might require a reboot.
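
Adding up the QNUM column by hand is error-prone when many queues
exist. The following is a minimal sketch of that sum, assuming that
each queue row in the ipcs -o -q output starts with the type letter
"q" and ends with the QNUM value. The function name is hypothetical
and the exact output layout should be checked on the system in
question.

```python
def total_qnum(ipcs_output: str) -> int:
    """Sum the last column (QNUM) over all message-queue rows."""
    total = 0
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Assumption: queue rows begin with "q"; headers and the
        # "Message Queues:" banner are skipped by this check.
        if len(fields) >= 2 and fields[0] == "q" and fields[-1].isdigit():
            total += int(fields[-1])
    return total
```

The text passed in would be the captured output of ipcs -o -q; the
result is then compared against the msgsys:msginfo_msgtql setting.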

Linux:
The following kernel parameters are relevant: kernel.msgmnb
(maximum number of bytes per queue) and kernel.msgmni (maximum
number of queues). Current values can be determined using the
command /sbin/sysctl -a | grep kernel | grep msg and values can
be changed using the command
/sbin/sysctl -w <parameter name>=<new value>.

The problem scenario can be detected by issuing the command
ipcs -q and looking at the values in the last two columns
(used-bytes and messages). If the numbers are close to any of
the system limits, you are likely to encounter the hang problem.
The workaround is to increase the system limit, which might
require a reboot.
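
The check on the last two columns can be sketched as follows. This
is an illustration only: the function name and the 80% threshold
are arbitrary assumptions, and it assumes that each queue row in
the ipcs -q output starts with a hexadecimal key ("0x...") and ends
with the used-bytes and messages columns.

```python
from typing import List, Tuple


def queues_near_limit(ipcs_output: str, msgmnb: int,
                      threshold: float = 0.8) -> List[Tuple[str, int, int]]:
    """Return (msqid, used_bytes, messages) for queues whose used-bytes
    value is at or above threshold * msgmnb."""
    flagged = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Assumption: data rows have at least six columns and begin
        # with the key; banner and header lines are skipped.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        used_bytes, messages = int(fields[-2]), int(fields[-1])
        if used_bytes >= threshold * msgmnb:
            flagged.append((fields[1], used_bytes, messages))
    return flagged
```

The msgmnb argument is the current kernel.msgmnb value as reported
by /sbin/sysctl. Any queue returned is a candidate for the hang
described above.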

Picking a reasonable value for the maximum number of messages is
tricky, as it depends on the system-wide number of concurrent
load operations, the number of database partitions used by all
instances defined on the machine, as well as on the properties
of the LOAD target tables.  A rough estimate of the message
requirements for a single load in a multiple partition
configuration is as follows:

N_message = N_partition * DB / 16
where
* "N_message" is an estimate of the number of messages the load
utility will put on the queues during initialization
* "N_partition" is the number of database partitions defined on
the machine in question
* "DB" is the value of the DATA BUFFER option (as specified in
the LOAD command). If you did not explicitly specify that
option, use 25% of the UTIL_HEAP_SZ database configuration
parameter.

This simplified calculation involves two assumptions, both of
which likely result in an overestimation of the number of
messages:
1. That all memory given to the load utility is indeed used for
transfer buffers, which is certainly not the case where
multidimensional clustering (MDC) tables are concerned.
2. That all buffers are 64K in size, which underestimates the
buffer size unless the target table space is a database-managed
space (DMS) table space with an extent size smaller than 64K.
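
The estimate above can be worked through with a short sketch. The
function and parameter names are hypothetical, and rounding up to a
whole message is a choice made here, not part of the original
formula. The divisor of 16 is presumably the number of 4K pages in
a 64K buffer, which assumes DATA BUFFER and UTIL_HEAP_SZ are both
expressed in 4K pages.

```python
import math
from typing import Optional


def estimate_load_messages(n_partition: int,
                           data_buffer: Optional[int] = None,
                           util_heap_sz: Optional[int] = None) -> int:
    """Estimate N_message = N_partition * DB / 16 for a single load.

    data_buffer is the DATA BUFFER option of the LOAD command; when it
    is not given, 25% of util_heap_sz is used instead.
    """
    if data_buffer is None:
        if util_heap_sz is None:
            raise ValueError("supply data_buffer or util_heap_sz")
        db = 0.25 * util_heap_sz
    else:
        db = data_buffer
    # Round up: a fractional message still occupies a queue slot.
    return math.ceil(n_partition * db / 16)
```

For example, 4 partitions with DATA BUFFER 8000 give an estimate of
2000 messages. Because both assumptions overestimate, the result is
best read as an upper bound for one load, to be multiplied out for
concurrent loads.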

Problem summary

****************************************************************
USERS AFFECTED:
Any users performing a LOAD with DB2 Version 9.1 GA or 9.1
FixPak 1.
****************************************************************
PROBLEM DESCRIPTION:
During the LOAD initialization phase, transfer buffers are
allocated and pointers to them are put on o/s message queues.
One message is written for every transfer buffer.  At this time
no process is consuming the buffers, so the number of
outstanding queue messages can only grow. Messages are written
in blocking mode.  As a consequence, the load operation will
hang if the maximum number of o/s queue messages is reached.

****************************************************************
RECOMMENDATION:
Upgrade to DB2 Version 9.1 fixpack 2 or higher.

****************************************************************

Problem conclusion

First fixed in DB2 LUW version 9.1 Fixpack 2

Temporary fix

Comments

APAR Information

APAR number
IY91404
Reported component name
DB2 UDB ESE SOL
Reported component ID
5765F4102
Reported release
910
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2006-11-06
Closed date
2009-06-12
Last modified date
2009-06-12

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name
DB2 UDB ESE SOL
Fixed component ID
5765F4102

Applicable component levels

R910 PSY
UP
R810 PSN
UP
R820 PSN
UP
R950 PSN
UP

Document Information

Modified date:
07 January 2022
