Technical Blog Post
Abstract
SOLARIS: INCORRECT DB2 STACK TRACE WHEN A STATIC FUNCTION IS CALLED
Body
Db2 stack traces on Solaris can sometimes be misleading: the function
name shown on the stack is not always the name of the function that was
actually called. This can happen when a 'static' function is called,
because 'static' functions do not have 'global' entries in the symbol
table.
For example, if you have a source file containing functions like this:
int function1()
{
...
}
static int function2()
{
...
}
int function3()
{
...
}
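If some function calls 'function2()', the db2 stack trace will show
'function1()' instead, because 'function1()' is the closest preceding
function that has a 'global' entry in the symbol table.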
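The effect can be reproduced with a minimal Python sketch. The offsets below are hypothetical and chosen only for illustration, and the resolver is an assumption about how such a lookup behaves (nearest preceding symbol), not Db2's actual code:

```python
import bisect

# Hypothetical symbol table for the source file above:
# (start offset, binding, name). The offsets are made up; only
# their ordering matters.
SYMBOLS = [
    (0x1000, "GLOB", "function1"),
    (0x1400, "LOCL", "function2"),  # 'static' gives a local binding
    (0x1900, "GLOB", "function3"),
]

def resolve(offset, symbols, globals_only=True):
    """Return (name, displacement) of the nearest preceding symbol."""
    candidates = sorted(
        (start, name)
        for start, binding, name in symbols
        if not globals_only or binding == "GLOB"
    )
    starts = [start for start, _ in candidates]
    idx = bisect.bisect_right(starts, offset) - 1
    if idx < 0:
        raise ValueError("offset precedes every known symbol")
    start, name = candidates[idx]
    return name, offset - start

# An address 0x50 bytes into function2():
addr = 0x1450
print(resolve(addr, SYMBOLS))                      # global symbols only
print(resolve(addr, SYMBOLS, globals_only=False))  # local symbols included
```

With global symbols only, the address is attributed to 'function1' with a large displacement; once local symbols are included it resolves to 'function2'.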
So let's take a quick example with a stack trace from a 'trap' file that
was generated on Solaris. The 'return' addresses are not printed in the
stack trace, but if we add them (from the rawstack) we get this:
...
0xFFFFFFFF73D6E238 SsqldLongDataLength() + 0x278
0xFFFFFFFF73D5E060 sqldBuildCSO() + 0x1f0
0xFFFFFFFF73D5E3CC sqldBuildAndLockCSO() + 0x84
0xFFFFFFFF73D6019C sqldGetXMLDocument() + 0x1adc
0xFFFFFFFF73D686B4 sqldRowFetch() + 0x1b4c
0xFFFFFFFF76BABB78 sqlriFetch() + 0x1e8
0xFFFFFFFF76D035A4 sqlrita() + 0x684
0xFFFFFFFF76D68250 sqlrihsjn() + 0xb78
...
The function that does not fit here is 'sqldGetXMLDocument()'. That function
is never called from 'sqldRowFetch()' and doesn't call 'sqldBuildAndLockCSO()'.
So, let's see what happened. Using the return address shown for the
'sqldGetXMLDocument()' frame (0xFFFFFFFF73D6019C), we locate which library
it belongs to and at what offset it lies within that library.
FFFFFFFF70400000 172032K r-x-- /vbs/engn/lib/libdb2e.so.1
Offset in library = 0xFFFFFFFF73D6019C - 0xFFFFFFFF70400000 = 0x396019C
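The subtraction can be checked with a short Python sketch. The values are taken from the trace and the mapping line above; the helper name 'library_offset' is hypothetical. The sketch assumes, as the text does, that the symbol values reported by nm are relative to the start of this mapping.

```python
# Values taken from the trap file and the mapping line shown above.
return_address = 0xFFFFFFFF73D6019C
library_base = 0xFFFFFFFF70400000  # start of the r-x mapping of libdb2e.so.1
mapping_size = 172032 * 1024       # 172032K, as reported for that mapping

def library_offset(address, base, size):
    """Return the offset of address within the mapping, or raise if outside it."""
    if not base <= address < base + size:
        raise ValueError("address is not inside this mapping")
    return address - base

offset = library_offset(return_address, library_base, mapping_size)
print(hex(offset))  # 0x396019c
```

The range check guards against subtracting the base of the wrong mapping, which would still produce a number but not a meaningful offset.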
Now we check the symbol table for the library:
nm -tx -v ~/sqllib/lib64/libdb2e.so.1 > libdb2e.so.1.nm
Walking only the global (GLOB) entries of the symbol table, we find that our
offset falls between the first and the last entry shown below. Therefore the
name 'sqldGetXMLDocument()' is picked:
[63584] |0x000000000395e6c0|0x0000000000000294|FUNC |GLOB |0x3 |11 |sqldGetXMLDocument()
[5641] |0x000000000395e970|0x000000000000323c|FUNC |LOCL |0x3 |11 |sqldReturnData()
[5642] |0x0000000003961bd8|0x0000000000001410|FUNC |LOCL |0x3 |11 |sqldDirectFetch()
[5643] |0x0000000003963000|0x0000000000000580|FUNC |LOCL |0x3 |11 |sqldSamplingFetch()
[5644] |0x0000000003963598|0x000000000000050c|FUNC |LOCL |0x3 |11 |sqldFromListFetch()
[5645] |0x0000000003963ac0|0x0000000000000d40|FUNC |LOCL |0x3 |11 |sqldRIDlistFetch()
[63799] |0x0000000003964818|0x0000000000001310|FUNC |GLOB |0x3 |11 |sqldDataFetch()
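This is because only global symbols are considered: our offset falls
between 0x000000000395e6c0 and 0x0000000003964818, so it is wrongly
assumed to be part of 'sqldGetXMLDocument()'. In fact it lies inside the
local symbol 'sqldReturnData()', which starts at 0x000000000395e970 and
is 0x323c bytes long.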
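A minimal Python sketch of both lookups, using the nm rows quoted above. The parser and the function names 'parse_nm' and 'resolve' are hypothetical, and the "nearest preceding symbol" rule is an assumption about how the stack walker behaves rather than Db2's actual implementation:

```python
# Rows copied from the nm output above.
NM_OUTPUT = """\
[63584] |0x000000000395e6c0|0x0000000000000294|FUNC |GLOB |0x3 |11 |sqldGetXMLDocument()
[5641] |0x000000000395e970|0x000000000000323c|FUNC |LOCL |0x3 |11 |sqldReturnData()
[5642] |0x0000000003961bd8|0x0000000000001410|FUNC |LOCL |0x3 |11 |sqldDirectFetch()
[5643] |0x0000000003963000|0x0000000000000580|FUNC |LOCL |0x3 |11 |sqldSamplingFetch()
[5644] |0x0000000003963598|0x000000000000050c|FUNC |LOCL |0x3 |11 |sqldFromListFetch()
[5645] |0x0000000003963ac0|0x0000000000000d40|FUNC |LOCL |0x3 |11 |sqldRIDlistFetch()
[63799] |0x0000000003964818|0x0000000000001310|FUNC |GLOB |0x3 |11 |sqldDataFetch()
"""

def parse_nm(text):
    """Parse the rows into (value, size, binding, name), sorted by value."""
    rows = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 8:
            continue  # not a symbol row
        _index, value, size, _type, binding, _other, _shndx, name = fields
        rows.append((int(value, 16), int(size, 16), binding, name))
    return sorted(rows)

def resolve(offset, rows, globals_only):
    """Nearest preceding symbol: (name, displacement, offset within its size)."""
    best = None
    for value, size, binding, name in rows:
        if globals_only and binding != "GLOB":
            continue
        if value <= offset:
            best = (name, offset - value, offset < value + size)
    return best

rows = parse_nm(NM_OUTPUT)
offset = 0x396019C
naive = resolve(offset, rows, globals_only=True)
exact = resolve(offset, rows, globals_only=False)
print(naive[0], hex(naive[1]), "inside symbol:", naive[2])
print(exact[0], hex(exact[1]), "inside symbol:", exact[2])
```

The global-only lookup yields 'sqldGetXMLDocument()' with displacement 0x1adc, matching the '+ 0x1adc' in the stack trace. That displacement is larger than the symbol's size of 0x294, which is itself a hint that the name is wrong. With local symbols included, the same offset resolves to 'sqldReturnData()' + 0x182c and falls within that symbol's size.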
One way to find the real name is to look at the assembly instruction at the
return address recorded for the calling frame, 'sqldRowFetch()', using a
debugger:
(dbx) examine 0xFFFFFFFF73D686B4/i
0xffffffff73d686b4: sqldRowFetch+0x1b4c: call sqldReturnData ! 0xffffffff73d5e970
Here we clearly see the call to 'sqldReturnData()', which is the function that
was actually called.
UID
ibm13286077