IBM Support

SOLARIS: INCORRECT DB2 STACK TRACE WHEN STATIC FUNCTION

Technical Blog Post


Body

Sometimes DB2 stack traces on Solaris can be misleading because the function
name shown for a stack frame is not the name of the function that was actually
called. This can happen when a 'static' function is called: 'static' functions
do not have 'global' entries in the symbol table, and the stack trace resolves
addresses using global symbols only.

For example, if you have a source file containing functions like this:

  int function1()
  {
      ...
  }
  static int function2()
  {
      ...
  }
  int function3()
  {
      ...
  }

If some function calls 'function2()', the DB2 stack trace will show
'function1()' instead, because 'function1()' is the closest preceding function
that has a global symbol.
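
To make the effect concrete, here is a minimal sketch of a "nearest preceding symbol" lookup for the three functions above. The addresses are invented for illustration, and the lookup is only an assumption about how such a resolver might behave, not the actual DB2 code:

```python
import bisect

# Hypothetical symbol table for the three functions above: (address, name, binding).
# The addresses are invented for illustration only.
SYMBOLS = [
    (0x1000, "function1", "GLOB"),
    (0x1200, "function2", "LOCL"),  # 'static', so it has no global entry
    (0x1500, "function3", "GLOB"),
]

def nearest_preceding(addr, symbols):
    """Return (name, offset) for the symbol with the highest start address <= addr.

    'symbols' must be sorted by address. Returns None if addr is below all symbols.
    """
    starts = [s[0] for s in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name, _ = symbols[i]
    return name, addr - start

globals_only = [s for s in SYMBOLS if s[2] == "GLOB"]
addr_in_function2 = 0x1234  # hypothetical address inside function2

print(nearest_preceding(addr_in_function2, globals_only))  # ('function1', 564)
print(nearest_preceding(addr_in_function2, SYMBOLS))       # ('function2', 52)
```

With only global symbols available, the address inside 'function2()' is attributed to 'function1()' with a large offset.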

So let's take a quick example with a stack trace from a 'trap' file that
was generated on Solaris. The return addresses are not printed in the
stack trace, but if we add them (from the rawstack) we get this:

  ...
  0xFFFFFFFF73D6E238 SsqldLongDataLength() + 0x278
  0xFFFFFFFF73D5E060 sqldBuildCSO() + 0x1f0
  0xFFFFFFFF73D5E3CC sqldBuildAndLockCSO() + 0x84
  0xFFFFFFFF73D6019C sqldGetXMLDocument() + 0x1adc 
  0xFFFFFFFF73D686B4 sqldRowFetch() + 0x1b4c
  0xFFFFFFFF76BABB78 sqlriFetch() + 0x1e8
  0xFFFFFFFF76D035A4 sqlrita() + 0x684
  0xFFFFFFFF76D68250 sqlrihsjn() + 0xb78
  ...

The function that does not fit here is 'sqldGetXMLDocument()'. That function
is never called from 'sqldRowFetch()' and doesn't call 'sqldBuildAndLockCSO()'.

So, let's see what happened... Using the return address shown for the
'sqldGetXMLDocument()' frame (0xFFFFFFFF73D6019C), we locate the library it
belongs to and its offset within that library.

  FFFFFFFF70400000     172032K r-x--  /vbs/engn/lib/libdb2e.so.1

  Offset in library = 0xFFFFFFFF73D6019C - 0xFFFFFFFF70400000 = 0x396019C
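
The same arithmetic can be sketched in a few lines of Python. The mapping line is copied from above; its layout (base address, size in kilobytes, permissions, path) is an assumption based on how the line looks, and the helper names are hypothetical:

```python
def parse_mapping(line):
    """Parse one mapping line of the assumed form '<base hex> <size>K <perms> <path>'."""
    base_hex, size_k, perms, path = line.split()
    base = int(base_hex, 16)
    size = int(size_k.rstrip("K")) * 1024  # size is given in kilobytes
    return base, size, perms, path

def library_offset(addr, mapping_line):
    """Return (path, offset) if addr lies inside the mapping, else raise ValueError."""
    base, size, _, path = parse_mapping(mapping_line)
    if not base <= addr < base + size:
        raise ValueError(f"{addr:#x} is not inside {path}")
    return path, addr - base

MAPPING = "FFFFFFFF70400000     172032K r-x--  /vbs/engn/lib/libdb2e.so.1"
path, offset = library_offset(0xFFFFFFFF73D6019C, MAPPING)
print(path, hex(offset))  # /vbs/engn/lib/libdb2e.so.1 0x396019c
```

The range check is a cheap guard against subtracting the base of the wrong library.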

Now we check the symbol table for the library:

  nm -tx -v ~/sqllib/lib64/libdb2e.so.1 > libdb2e.so.1.nm

The stack walker considers only the global (GLOB) symbols. Our offset falls
between the two global entries below (the first and the last line), so the name
'sqldGetXMLDocument()' is picked:

  [63584] |0x000000000395e6c0|0x0000000000000294|FUNC |GLOB |0x3  |11 |sqldGetXMLDocument()
  [5641]  |0x000000000395e970|0x000000000000323c|FUNC |LOCL |0x3  |11 |sqldReturnData()
  [5642]  |0x0000000003961bd8|0x0000000000001410|FUNC |LOCL |0x3  |11 |sqldDirectFetch()
  [5643]  |0x0000000003963000|0x0000000000000580|FUNC |LOCL |0x3  |11 |sqldSamplingFetch()
  [5644]  |0x0000000003963598|0x000000000000050c|FUNC |LOCL |0x3  |11 |sqldFromListFetch()
  [5645]  |0x0000000003963ac0|0x0000000000000d40|FUNC |LOCL |0x3  |11 |sqldRIDlistFetch()
  [63799] |0x0000000003964818|0x0000000000001310|FUNC |GLOB |0x3  |11 |sqldDataFetch()

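Because only global symbols are considered and our offset lies between
0x000000000395e6c0 and 0x0000000003964818, we wrongly assume it is part of
'sqldGetXMLDocument()'. That function is only 0x294 bytes long, so it ends
well before our offset. The local (LOCL) entry 'sqldReturnData()', which starts
at 0x000000000395e970 and is 0x323c bytes long, is the one that actually
contains offset 0x396019C.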
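
The difference between the two lookups can be reproduced with a short sketch over the nm lines shown above. The column meanings (value, size, binding, name) are assumed from the layout of the output, and 'resolve' is a hypothetical helper, not the routine DB2 uses:

```python
NM_LINES = """\
[63584] |0x000000000395e6c0|0x0000000000000294|FUNC |GLOB |0x3  |11 |sqldGetXMLDocument()
[5641]  |0x000000000395e970|0x000000000000323c|FUNC |LOCL |0x3  |11 |sqldReturnData()
[5642]  |0x0000000003961bd8|0x0000000000001410|FUNC |LOCL |0x3  |11 |sqldDirectFetch()
[63799] |0x0000000003964818|0x0000000000001310|FUNC |GLOB |0x3  |11 |sqldDataFetch()
""".splitlines()

def parse_nm_line(line):
    """Split one '|'-separated nm line into the fields this sketch needs."""
    fields = [f.strip() for f in line.split("|")]
    return {
        "value": int(fields[1], 16),  # start offset of the symbol
        "size": int(fields[2], 16),   # length of the symbol in bytes
        "bind": fields[4],            # GLOB or LOCL
        "name": fields[7],
    }

def resolve(offset, symbols, globals_only):
    """Return (name, offset within symbol, offset is inside the symbol's size)."""
    candidates = [
        s for s in symbols
        if s["value"] <= offset and (not globals_only or s["bind"] == "GLOB")
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda s: s["value"])
    inside = offset < best["value"] + best["size"]
    return best["name"], offset - best["value"], inside

symbols = [parse_nm_line(line) for line in NM_LINES]
for globals_only in (True, False):
    name, delta, inside = resolve(0x396019C, symbols, globals_only)
    print(name, hex(delta), inside)
# sqldGetXMLDocument() 0x1adc False
# sqldReturnData() 0x182c True
```

The global-only result matches the '+ 0x1adc' printed in the stack trace, and the size check shows that the offset is outside 'sqldGetXMLDocument()'. Comparing the offset against the symbol size is a quick way to spot a suspicious frame name.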

One way to find the real name is to look at the assembly instruction at the
return address in the caller, 'sqldRowFetch()', using a debugger:

  (dbx) examine 0xFFFFFFFF73D686B4/i
  0xffffffff73d686b4: sqldRowFetch+0x1b4c:  call sqldReturnData ! 0xffffffff73d5e970

Here we clearly see the call to 'sqldReturnData()' which is the correct one.
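
As a cross-check, the call target printed by the debugger can be converted to a library offset and compared with the nm entry for 'sqldReturnData()'. This is a small sketch that uses only values quoted above; the variable names are hypothetical:

```python
LIB_BASE = 0xFFFFFFFF70400000               # load address of libdb2e.so.1 from the mapping
CALL_TARGET = 0xFFFFFFFF73D5E970            # target of the 'call' instruction shown by dbx
SQLD_RETURN_DATA_VALUE = 0x000000000395E970  # nm value for sqldReturnData()

# Convert the absolute call target into an offset within the library.
target_offset = CALL_TARGET - LIB_BASE
print(hex(target_offset), target_offset == SQLD_RETURN_DATA_VALUE)  # 0x395e970 True
```

The call target lands exactly on the start of the local symbol, which agrees with what the debugger reports.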

UID

ibm13286077