IBM Support

AIX: TRACK LOAD/UNLOAD OF DYNAMIC LIBRARIES / OBJECTS

Technical Blog Post


Abstract

AIX: TRACK LOAD/UNLOAD OF DYNAMIC LIBRARIES / OBJECTS

Body

Being able to track the list and frequency of dynamic libraries or objects being loaded and unloaded might be handy sometimes.

 

Loading a library is somewhat 'time consuming' because the dynamic loader will have, at one point or another, to resolve symbols used in the newly loaded object. Also care must be taken when unloading a library that nothing else still references that library. Failure to do so would most often lead to a crash with SIGSEGV or SIGILL.

 

It is not uncommon for database engines or other applications to make use of third parties products or UDFs (User Defined Functions). One of the most common ways of doing so is using dynamic libraries or objects. While the end user does not always have full control over the load/unload sequences for the dynamic objects, being able to see what is happening might be useful and help resolve issues.

 

This small article shows how to track the dynamic libraries or objects activity on AIX. While 'truss' might do the job, the way it is designed makes it way too intrusive for a production system. So instead we'll use 'probevue' to obtain the same result but much faster. We will use 2 examples. The first one to find out why a program is running slowly when calling multiple times a function defined in a dynamic library. The second example will help us find out why a program is crashing. Of course those examples here are simple programs but the approach for the case where things would be opaque like using a UDF or third party package would be the same.


== Example 1 ==

  Here we have a simple program calling a function from a dynamic library
  50 times. That function is simply multiplying an input value by 2 and returning
  the result. The execution time shows this:

 

    [/home3/dalla/tmp]->time dlopen1
    [info] func(0) = 0
    [info] func(1) = 2
    [info] func(2) = 4
    [info] func(3) = 6
    ...
    [info] func(47) = 94
    [info] func(48) = 96
    [info] func(49) = 98

    real    0m0.72s
    user    0m0.01s
    sys     0m0.08s

 

  We notice that it is slow and also it seems to consume a lot of system time.
  We know that 'func()' is actually from a dynamic library. So we use the following
  probevue script to see if anything is wrong with the loading of the dynamic
  library. It might be taking too long somehow to load. The script looks like
  this and we run it as root:

 

    /*
     * dlopen1.pb: Track libraries/modules loaded/unloaded by a program.
     *             The program to trace (dlopen1 by default here) can be changed
     *             by modifying the filters 'when (__pname == "dlopen1")'.
     *
     * Run as user 'root' using the following command line:
     *
     *     probevue -t 10 -e 75 -s 64 -o dlopen1.out dlopen1.pb
     *
     *
     * dalla
     */


    unsigned long long kload(char *);
    unsigned long long kunload64(unsigned long long);


    __thread void                  *libname;
    __thread probev_timestamp_t     ts1;


    @@syscallx:*:kload:entry
    when (__pname == "dlopen1")
    {
        thread:libname = __arg1;
        thread:ts1 = timestamp();
        thread:in_kload = 1;
    }

 

    @@syscallx:*:kload:exit
    when (thread:in_kload == 1)
    {
        __auto String               pathname[256];
        __auto probev_timestamp_t   ts2;

 

        ts2 = timestamp();
        pathname = get_userstring(thread:libname, 255);

        printf("[%s - %d] kload(%s) = 0x%016llx [elapsed = %lld ms]\n",
               __pname, __pid, pathname, (unsigned long long) __rv,
               diff_time(thread:ts1, ts2, MICROSECONDS));

        thread:in_kload = 0;
    }

 

    @@syscallx:*:kunload64:entry
    when (__pname == "dlopen1")
    {
        printf("[%s - %d] kunload64(0x%016llx)\n",
               __pname, __pid, (unsigned long long) __arg1);
    }

 

  So we run this as root before starting the 'dlopen1' test program:

 

    # probevue -t 10 -e 75 -s 64 -o dlopen1.out dlopen1.pb

 

  Once the test program has run we interrupt the probevue script and look
  at the output file, here 'dlopen1.out':

 

    [dlopen1 - 47644876] kload(mylib.so) = 0x08fffffff0000758 [elapsed = 38301 ms]
    [dlopen1 - 47644876] kunload64(0x08fffffff0000758)
    [dlopen1 - 47644876] kload(mylib.so) = 0x08fffffff0000808 [elapsed = 17203 ms]
    [dlopen1 - 47644876] kunload64(0x08fffffff0000808)
    [dlopen1 - 47644876] kload(mylib.so) = 0x08fffffff00008b8 [elapsed = 20170 ms]
    [dlopen1 - 47644876] kunload64(0x08fffffff00008b8)
    ...

 

  We see as many occurrences of 'load/unload' sequence than we have calls
  to function 'func()'. So in this case the next step would be to check if there
  is a parameter (or changing the program) so that once loaded we leave the
  library loaded.


== Example 2 ==

  In this case things are a bit different. Our test program still calls 'func()'
  from the dynamic library. It works fine the first 10 times and then something
  causes the program to crash:

 

    [/home3/dalla/tmp]->dlopen2
    calling func() with value 0
    [info] func(0) = 0
    calling func() with value 1
    [info] func(1) = 2
    ...
    calling func() with value 9
    [info] func(9) = 18
    calling func() with value 10
    Illegal instruction(coredump)

 

  So the program received a SIGILL when trying to call 'func()' even though
  it seemed to work fine the first 10 times. This could happen because the
  function pointer to 'func()' has been corrupted, or, because the library
  has been unloaded. In that case as well the small probevue script would help
  confirm if the library has been unloaded or not. The output file would show
  the unload:

 

    [dlopen2 - 1048862] kload(mylib.so) = 0x08fffffff0000758 [elapsed = 1254 ms]
    [dlopen2 - 1048862] kunload64(0x08fffffff0000758)

 

  If you are running AIX 7.1 TL04 or AIX 7.2 or above you can make the script
  a bit more precise as probevue has a 'timestamp()' function as well. So you
  can modify the probe for the 'unload' to print the time at which the 'unload'
  took place:

 

    @@syscallx:*:kunload64:entry
    when (__pname == "dlopen2")
    {
        printf("[%s - %d] kunload64(0x%016llx)[%A]\n",
               __pname, __pid, (unsigned long long) __arg1, timestamp());
    }

 

  The result would then show like this:

 

    [dlopen2 - 30736748] kunload64(0x08fffffff0000750)[Jun/07/17 22:59:25]

 

  This makes it easier to match times.

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSEPGG","label":"Db2 for Linux, UNIX and Windows"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm13286341