4 replies Latest Post - ‏2006-01-20T11:29:08Z by SystemAdmin
SystemAdmin

child process hangs on futex_wait

‏2006-01-05T13:11:37Z |
Hi,
One of my child processes hangs immediately after being forked from the parent.
strace of the hung child process shows:

futex(0x8056734, FUTEX_WAIT, 2, NULL

while gdb backtrace of the running process shows:

(gdb) bt
#0 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0xb74eb67b in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2 0x000077e5 in ?? ()
#3 0x08056734 in ?? ()
#4 0x08056734 in ?? ()
#5 0xb74e8787 in _L_mutex_lock_28 () from /lib/tls/libpthread.so.0
#6 0x08056734 in ?? ()

I have no means of finding out where my child process hangs. Please suggest a technique for debugging the hung process.
  • ishields

    Re: child process hangs on futex_wait

    ‏2006-01-06T02:19:41Z  in response to SystemAdmin
    It's a bit hard to debug this with so little information. Have you written processes with children before, or is this your first attempt? Which process is supposed to wait on what? Which process is supposed to post something?

    If you're new to child processes, I'd recommend a good book on programming in this environment. One of the classics is UNIX Network Programming by W. Richard Stevens, which has great examples and is very well written.

    The following code snippet is adapted from one of Stevens' examples and shows the general process for forking a child. In this case, both the parent and the child will continue to completion, each emitting one console message. Stevens also has excellent examples of how to set up communication pipes between the parent and child.

    [code]
    #include <cstdlib>    // exit
    #include <iostream>
    #include <unistd.h>   // fork, getpid, getppid
    using namespace std;

    int main(int argc, char** argv) {
        pid_t child_pid;

        if ((child_pid = fork()) == -1) {
            // fork failed: no child was created
            cout << "Fork error" << endl;
            exit(1);
        } else if (child_pid == 0) {
            // child: fork() returned 0
            cout << "In child " << getpid() << " parent " << getppid() << endl;
            exit(0);
        } else {
            // parent: fork() returned the child's pid
            cout << "In parent " << getpid() << " child " << child_pid << endl;
        }
        return 0;
    }
    [/code]
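
The parent/child pipe that Stevens' examples cover can be sketched roughly as follows. This is not taken from the book: `read_from_child` is a hypothetical helper name, and the sketch assumes a plain POSIX system and skips `EINTR` handling. The child writes one message into the pipe and leaves with `_exit()`; the parent reads until end-of-file and then reaps the child with `waitpid()`.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork a child that writes `msg` into a pipe and
// return whatever the parent managed to read from it.
// Returns an empty string if pipe() or fork() fails.
std::string read_from_child(const std::string& msg) {
    int fds[2];                          // fds[0] = read end, fds[1] = write end
    if (pipe(fds) == -1) return "";

    pid_t pid = fork();
    if (pid == -1) {                     // fork failed: clean up both ends
        close(fds[0]);
        close(fds[1]);
        return "";
    }

    if (pid == 0) {                      // child: write the message and leave
        close(fds[0]);                   // child does not read
        ssize_t n = write(fds[1], msg.data(), msg.size());
        close(fds[1]);
        // _exit() skips atexit handlers and stdio flushing copied from the parent
        _exit(n == static_cast<ssize_t>(msg.size()) ? 0 : 1);
    }

    // parent: close the write end first, otherwise read() never sees EOF
    close(fds[1]);
    std::string out;
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        out.append(buf, static_cast<size_t>(n));
    }
    close(fds[0]);

    int status = 0;
    waitpid(pid, &status, 0);            // reap the child so it does not linger as a zombie
    return out;
}
```

Calling `read_from_child("hello from child")` from `main()` should hand the same string back to the parent. Closing the unused pipe ends on both sides is the step that is easiest to forget: a parent that keeps its own write end open will block in `read()` forever.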

    Ian Shields
    • SystemAdmin

      Re: child process hangs on futex_wait

      ‏2006-01-06T05:31:12Z  in response to ishields
      Thanks for replying, Ian.
      I am familiar with Linux programming. I forked the child exactly as in your example, and the parent is supposed to wait for the child. Right after the fork, the child is supposed to print some traces and exit immediately, but instead it hangs straight after forking, possibly on a mutex lock. With so little information in hand, I am finding it hard to get to the root cause. The situation also does not occur frequently. How can I find out more from the hung process about the locks held by that process?
  • SystemAdmin

    Re: child process hangs on futex_wait

    ‏2006-01-20T11:29:08Z  in response to SystemAdmin
    Hi,
    This is a known issue with futexes. A futex ("fast userspace mutex") was implemented within glibc to make sure that a thread, on waking, does not immediately re-acquire the resource it has just released. For example, consider threads t1, t2 and t3.
    Thread t1 has acquired a resource, and t2 and t3 are waiting on it. Once t1 releases the resource, t2 or t3 could get it. But does that mean t1 won't acquire the same resource again? In fact, nothing stops it from doing so. Is that good behaviour? With this in mind, the futex was introduced as a locking mechanism at user-space level. What it inherently does is make thread t1 sleep for some time, giving t2 and t3 the opportunity to acquire the resource.
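
The t1, t2, t3 scenario above can be sketched as three threads competing for one lock. This is only an illustration, with some assumptions: `contended_count` is a hypothetical helper, and it uses C++11 `std::mutex`, which on Linux with glibc sits on top of pthread mutexes, so a thread that loses the race ends up blocked in `futex(..., FUTEX_WAIT, ...)` much like the strace output in the original post. The sketch shows mutual exclusion only; it makes no claim about which waiter gets the lock next.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` workers each take the same mutex
// `iterations` times and bump a shared counter while holding it.
long contended_count(int threads, int iterations) {
    std::mutex resource_lock;            // the "resource" t1, t2 and t3 compete for
    long counter = 0;                    // only touched while the lock is held

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&resource_lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                // Blocks here while another thread holds the lock
                std::lock_guard<std::mutex> guard(resource_lock);
                ++counter;
            }
        });
    }

    for (auto& worker : pool) worker.join();
    return counter;                      // equals threads * iterations if locking works
}
```

With three threads and 10000 iterations each, `contended_count(3, 10000)` should return 30000, because every increment happens under the lock. Build with thread support enabled (for example `-pthread` with GCC).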
    However, there were some flaws in the implementation in the RHEL 3.0 release, and these were corrected in Update 2. If you want to get rid of this issue, I would recommend upgrading RHEL to Update 2 or later.
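
Separately from the glibc fix mentioned above, one well-known way for a child to block in a mutex lock straight after `fork()` is forking while another thread of the parent holds that mutex. Nothing in this thread confirms that this is what happened here; the backtrace only shows that libpthread is loaded. The sketch below is a hedged demonstration of the mechanism, not a diagnosis: `child_inherits_locked_mutex` and `g_lock` are hypothetical names, and calling `pthread_mutex_trylock()` in the child of a multithreaded process goes beyond what POSIX guarantees, although it behaves as shown on Linux with glibc.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <condition_variable>
#include <mutex>
#include <thread>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;  // hypothetical shared lock

// Hypothetical helper: returns true if the forked child found g_lock
// already locked, even though the child itself never locked it.
bool child_inherits_locked_mutex() {
    std::mutex m;
    std::condition_variable cv;
    bool locked = false;
    bool release = false;

    // Second thread in the parent: takes g_lock and holds it across the fork.
    std::thread holder([&] {
        pthread_mutex_lock(&g_lock);
        { std::lock_guard<std::mutex> g(m); locked = true; }
        cv.notify_all();
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return release; });
        lk.unlock();
        pthread_mutex_unlock(&g_lock);
    });

    // Wait until the holder thread really owns g_lock.
    { std::unique_lock<std::mutex> lk(m); cv.wait(lk, [&] { return locked; }); }

    pid_t pid = fork();
    if (pid == 0) {
        // Child: only the forking thread was copied. The memory of g_lock says
        // "locked", but the thread that would unlock it does not exist here.
        // A plain pthread_mutex_lock() would wait forever; trylock just reports it.
        int rc = pthread_mutex_trylock(&g_lock);
        _exit(rc == EBUSY ? 0 : 1);
    }

    // Parent: let the holder thread release the lock and finish.
    { std::lock_guard<std::mutex> g(m); release = true; }
    cv.notify_all();
    holder.join();

    if (pid == -1) return false;         // fork failed, nothing to report
    int status = 0;
    if (waitpid(pid, &status, 0) == -1) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

If the function returns true, the child saw the lock as busy. Replacing the `trylock` with a blocking lock would leave the child waiting in a futex call with no thread left to wake it, which would also explain why such a hang shows up only occasionally: it depends on what the other threads happen to hold at the moment of the fork.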