APAR status
Closed as fixed if next.
Error description
Running a complex query - for example, a long query with lots of UNION statements - may fail with an "Out Of Memory" (OOM) error. The issue is with the size of the section (the compiled statement) and the number of subagents created for the query execution. The parallelism in effect - which determines how many subagents are spawned - is a factor of the DFT_DEGREE setting (SMP parallelism), and the number of subsections (which depends on the query execution plan chosen by the Optimizer). Every subagent holds a copy of the section, which can result in a large memory requirement. You will see error messages in the db2diag.log similar to the following: 2019-08-28-08.43.09.061918-240 I23169E1082 LEVEL: Warning PID : 448991 TID : 140674770921216 PROC : db2sysc 12 INSTANCE: bigsql NODE : 012 DB : BIGSQL APPHDL : 0-14638 APPID: appid AUTHID : bigsql HOSTNAME: bigsqlhost.com EDUID : 27282 EDUNAME: db2agnts (BIGSQL) 12 FUNCTION: DB2 UDB, SQO Memory Management, SqloMemController::requestMemory, probe:50 MESSAGE : ZRC=0x8B0F0000=-1961951232=SQLO_NOMEM "No Memory Available" DIA8300C A memory heap error has occurred. DATA #1 : String, 36 bytes OOM - Instance memory request failed DATA #2 : String, 35 bytes Logging disabled until next success DATA #3 : unsigned integer, 8 bytes 2162688 DATA #4 : unsigned integer, 8 bytes 0 DATA #5 : String, 13 bytes FMP_RESOURCES ... with a stack similar to the following: 0x00007FFFF01D1070 _ZN17SqloMemController9dumpOnOOMEjb + 0x0450 0x00007FFFF02ACB53 sqloMemLogPoolConditions + 0x0243 0x00007FFFF02ABEFD sqloGetMemoryBlockExtended + 0x183d 0x00007FFFEE836CFD _ZN23sqerCommBufferAllocator18allocateCommBufferEP12sqzDataChain I14sqerCommBuffer16sqzChainNodeBaseIS1_EEl + 0x005d 0x00007FFFEE836923 _ZN21sqerFmpCommBufferPool19allocateCommBuffersEm + 0x0083 0x00007FFFEE836852 _ZN21sqerFmpCommBufferPool24adjustCommBufferPoolSizeElRl + 0x0272 0x00007FFFEE8362FF _ZN21sqerFmpCommBufferPool18reserveCommBuffersEllRl + 0x00af 0x00007FFFEE86ECC0 _Z17sqlerConnectToFmpP8sqeAgentP23SQLER_FMP_CONNECT_PARMSPP13sqe rFmpClient + 0x0190 This error triggers an FODC_Memory data collection. In this directory, under the DB2PD directory, db2pd output can be found in a file named <pid>.<eduid>.<nodenumber>.db2pd.inst.alldbs. You can use this output to confirm this issue. In that file, look for output similar to the following (an active or waiting application with a high number of agents): Applications: Address AppHandl [nod-index] NumAgents CoorEDUID Status C-AnchID C-StmtUID L-AnchID L-StmtUID Appid WorkloadID WorkloadOccID CollectActData CollectActPartition CollectSectionActuals 0x0000000206600080 45596 [000-45596] 1014 0 UOW-Waiting 0 0 0 0 xx.xx.xx.xxx.xxxxx.xxxxxxxxxxxx 7 467 n/a n/a n/a Note the high number of agents (NumAgents) - 1014 in this example. Find the active statement for that application handle: Active Statement List: Address AppHandl [nod-index] UOW-ID StmtID AnchID StmtUID EffISO EffLockTOut EffDegree EntryTime StartTime LastRefTime 0x00007FFB9CE4D680 45596 [000-45596] 1 1 462 1 1 120 8 Tue Jul 23 08:58:59 Tue Jul 23 08:58:59 Tue Jul 23 08:59:01 and note the AnchID and StmtUID. Use those values to find the entry in the Dynamic SQL Variations output: Dynamic SQL Variations: Address AnchID StmtUID EnvID VarID NumRef Typ Lockname Val Insert Time Sect Size Num Copies 0x00007FFB3115ED80 462 1 1 1 184 6 01000000010000000100C039D6 Y 2019-07-23-08.58.59.927870 6441752 849 Here you can see the large section size (6,442,752 bytes) and the number of copies of this section that had been made at the time of the memory failure (849); 5 gigabytes of memory. This sudden spike in memory usage triggered the OOM error. You can also use the AnchID and StmtUID to find the statement itself. The solution to prevent the out of memory errors is to disable SMP parallelism for complex queries. The number of subagents created is determined by the number of subsections (the parts of the section that can run in parallel) and the degree of SMP parallelism in effect. Disabling SMP parallelism will reduce the number of subagents that are spawned, thereby reducing the overall memory requirement. Be aware that reducing SMP parallelism can impact the performance of the query. . This fix adds a new DB2_EXTENDED_OPTIMIZATION registry setting option "SMP_MAX_SS". This setting will disable SMP parallelism for plans in which the number of subsections exceeds a specified limit. . For example, if one sets: db2set DB2_EXTENDED_OPTIMIZATION="BI_INFER_CC ON,SMP_MAX_SS 100" . then any optimizer plan with more than 100 subsections will not be SMP-parallelized. The number of subsections to target for throttling can be estimated from the data collected during an Out Of Memory event. . Using the above example, the number of copies of the section at the time of the OOM was 849, as shown in the Dynamic SQL Variations output in the <pid>.<eduid>.<nodenumber>.db2pd.inst.alldbs file, found under the FODC_Memory data collection in the DB2PD directory. . Dynamic SQL Variations: Address AnchID StmtUID EnvID VarID NumRef Typ Lockname Val Insert Time Sect Size Num Copies 0x00007FFB3115ED80 462 1 1 1 184 6 01000000010000000100C039D6 Y 2019-07-23-08.58.59.927870 6441752 849 . A good setting for SMP_MAX_SS is a value lower than the number of section copies at the time of the OOM event divided by the default degree of parallelism that is set in the database configuration. The default is 8, but the actual value can be determined by running db2 get db cfg for bigsql | grep DFT_DEGREE . In the above example 849 divided by the default parallelism of 8 is 106. Setting SMP_MAX_SS to 100 would avoid this OOM.
Local fix
Adding the following guideline to the end of a query will manually throttle parallelism for that query: /*<OPTGUIDELINES><DEGREE VALUE='1'/></OPTGUIDELINES>*/
Problem summary
Please see the problem description.
Problem conclusion
Temporary fix
Comments
APAR Information
APAR number
PH16635
Reported component name
IBM BIG SQL
Reported component ID
5737E7400
Reported release
504
Status
CLOSED FIN
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2019-09-10
Closed date
2020-09-09
Last modified date
2020-12-03
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Applicable component levels
[{"Line of Business":{"code":"LOB10","label":"Data and AI"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSCRJT","label":"IBM Db2 Big SQL"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"504"}]
Document Information
Modified date:
04 December 2020