Topic
  • 24 replies
  • Latest Post - ‏2013-05-13T12:32:44Z by rdumeur
SystemAdmin
SystemAdmin
554 Posts

Pinned topic Mutex Exception in multiple threads execution

‏2013-03-22T09:09:43Z |
Hi all,

I'm using CPLEX and CP (12.5) on Windows x64.
Running a test in Eclipse that simulates multiple execution threads I get the following exception:

ilog.concert.IloException: Not owner of the mutex
at ilog.cp.cppimpl.cp_wrapJNI.new_IloCP__SWIG_1(Native Method)
at ilog.cp.cppimpl.IloCP.<init>(IloCP.java:109)
at ilog.cp.IloCP.<init>(IloCP.java:120)
at ilog.cp.IloCP.<init>(IloCP.java:112)

The application is supposed to run in a sever environment so I would like to know if is it necessary to synchronize the access to the libraries and, in that case, how is it supposed to be done.

Thank you very much in advance.
Regards.
Updated on 2013-03-29T21:17:15Z at 2013-03-29T21:17:15Z by SystemAdmin
  • rdumeur
    rdumeur
    109 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-03-22T09:56:15Z  
    Dear Jorge,

    Is it correct to assume that you are using multiple threads, each of them invoking methods on their own IloCP solver? If so, I would advise to create a dummy IloCP before any thread is launched, and destroy it immediately:
    
    
    // this performs required initializations, if multiple threads using IloCP instances are used. 
    { IloCP cp = 
    
    new IloCP(); cp.end(); 
    }
    

    I hope this helps.

    Cheers,
  • SystemAdmin
    SystemAdmin
    554 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-03-22T12:38:52Z  
    • rdumeur
    • ‏2013-03-22T09:56:15Z
    Dear Jorge,

    Is it correct to assume that you are using multiple threads, each of them invoking methods on their own IloCP solver? If so, I would advise to create a dummy IloCP before any thread is launched, and destroy it immediately:
    <pre class="jive-pre"> // this performs required initializations, if multiple threads using IloCP instances are used. { IloCP cp = new IloCP(); cp.end(); } </pre>
    I hope this helps.

    Cheers,
    Thanks for you answer,

    It is correct that each thread has their own instance of IloCP.
    I followed your suggestion and I am not getting the IloException anymore, but the execution still falls into deadlock when two or more threads call "IloCP cp = new IloCP()".

    When I force some delay between threads, execution ends normally but when I don't, they don't continue after the call to new IloCP().

    Regards.
  • rdumeur
    rdumeur
    109 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-03-22T15:28:50Z  
    Dear Jorge,

    Could you please set the "Workers" param of each IloCP that is ran by each thread to 1?
    This will ensure that IloCP won't fork any other native thread.
    I hope it helps.

    Cheers,
  • SystemAdmin
    SystemAdmin
    554 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-03-25T09:21:08Z  
    • rdumeur
    • ‏2013-03-22T15:28:50Z
    Dear Jorge,

    Could you please set the "Workers" param of each IloCP that is ran by each thread to 1?
    This will ensure that IloCP won't fork any other native thread.
    I hope it helps.

    Cheers,
    Thanks for the answer.

    I get the deadlock in the call to the constructor new IloCP() (I'm using Java API from Concert Tech).
    It does not solve the problem setting the parameter right after creating the instance as it doesn't get there.

    Regards.
  • rdumeur
    rdumeur
    109 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-03-25T10:15:17Z  
    Thanks for the answer.

    I get the deadlock in the call to the constructor new IloCP() (I'm using Java API from Concert Tech).
    It does not solve the problem setting the parameter right after creating the instance as it doesn't get there.

    Regards.
    Dear Jorge,

    I'd like to replicate your problem. Could you please tell me the JVM that you are using?
    Also, could you please try 12.5.0.1 which I think is now available?
    Cheers,
  • rdumeur
    rdumeur
    109 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-03-25T10:33:36Z  
    • rdumeur
    • ‏2013-03-25T10:15:17Z
    Dear Jorge,

    I'd like to replicate your problem. Could you please tell me the JVM that you are using?
    Also, could you please try 12.5.0.1 which I think is now available?
    Cheers,
    Dear Jorge,

    I have a few more questions:
    • What are the machine processors?
    • How many cores are available?
    • How many java threads are created that perform IloCP cp = new IloCP();
    Thank you in advance,
    Cheers,
  • rdumeur
    rdumeur
    109 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-03-25T14:38:39Z  
    • rdumeur
    • ‏2013-03-25T10:33:36Z
    Dear Jorge,

    I have a few more questions:
    • What are the machine processors?
    • How many cores are available?
    • How many java threads are created that perform IloCP cp = new IloCP();
    Thank you in advance,
    Cheers,
    Dear Jorge,

    Meanwhile, I've tried to reproduce your problem using windows 64.
    The attached program creates one IloCP object then destroys it immediately. It then start 4 threads which each loop to create/destroy 10000 IloCP objects. It doesn't hang with 12.5.
    Could you please compile it and run it on your machine to see what happens?

    Cheers,
  • SystemAdmin
    SystemAdmin
    554 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-03-25T17:45:56Z  
    • rdumeur
    • ‏2013-03-25T14:38:39Z
    Dear Jorge,

    Meanwhile, I've tried to reproduce your problem using windows 64.
    The attached program creates one IloCP object then destroys it immediately. It then start 4 threads which each loop to create/destroy 10000 IloCP objects. It doesn't hang with 12.5.
    Could you please compile it and run it on your machine to see what happens?

    Cheers,
    Hello,

    These are the JVM settings i have:

    java.version = 1.7.0_15
    java.vm.info = mixed mode
    java.vm.name = Java HotSpot(TM) 64-Bit Server VM
    java.vm.specification.vendor = Oracle Corporation
    java.vm.specification.version = 1.7
    java.vm.vendor = Oracle Corporation
    java.vm.version = 23.7-b01

    I have an Intel Core i5 CPU M520 @ 2.4 Ghz on 64 bits computer with Windows 7.
    There are 3 Java Threads being createad and having their own IloCP instance.

    I'm sorry I made a mistake, we're not using CPLEX 12.5 but 12.4, and I'm afraid I can't upgrade to 12.5.0.1 as the project is in deployment phase and we cannot make that change.

    Thanks for all.
  • rdumeur
    rdumeur
    109 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-03-25T20:16:39Z  
    Hello,

    These are the JVM settings i have:

    java.version = 1.7.0_15
    java.vm.info = mixed mode
    java.vm.name = Java HotSpot(TM) 64-Bit Server VM
    java.vm.specification.vendor = Oracle Corporation
    java.vm.specification.version = 1.7
    java.vm.vendor = Oracle Corporation
    java.vm.version = 23.7-b01

    I have an Intel Core i5 CPU M520 @ 2.4 Ghz on 64 bits computer with Windows 7.
    There are 3 Java Threads being createad and having their own IloCP instance.

    I'm sorry I made a mistake, we're not using CPLEX 12.5 but 12.4, and I'm afraid I can't upgrade to 12.5.0.1 as the project is in deployment phase and we cannot make that change.

    Thanks for all.
    Thank you Jorge,

    For your informatin, I've tried the test program given in my previous mail with the CPLEX Studio 12.4 install and it doesn't hang at all with either an IBM 6.0 or 7.0 JVM.
    Note that with the first run of the (freshly installed) 7.0 JVM seemed to hang during a few seconds (I guess it JIT compiled freshly installed classes). Further runs ran at normal speed. My current test machine is only a dual core machine, so I am very interested to know what happens when running the test program given in my previous mail on your 4 core machine.
    Note that as stated in http://www-01.ibm.com/software/integration/optimization/cplex/reqs, the 7.0 JDK is not yet supported by CPLEX, when the 6.0 is, so maybe it is worth trying it?

    Cheers,
  • SystemAdmin
    SystemAdmin
    554 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-03-29T21:17:15Z  
    I believe that I am encountering the same issue as Jorge.

    Today I started running my unit tests in parallel with a setting of 4 concurrent test(my test suite takes about 15 minutes otherwise). About 1 out of 3 runs of my test suite ends up stuck. There is no action on the CPU, with a profiler I can see the same stack trace that Jorge posted originally.

    My environment is:
    Windows 7 64 bit
    4 Core Machine with Hyperthreading (looks like 8 cores in Windows)
    Java HotSpot(TM) 64-Bit Server VM; 1.6.0_24; 19.1-b02; mixed mode
    CPLEX 12.4

    My configuration parameter is set to only give 1 Worker thread per solve.

    We also intend to run this in a multi-threaded environment.

    I can test more on Monday with 12.5. We intend to upgrade to 12.5 soon and I will report back with any finding.

    Steve
  • JorgeGarcíaCastillo
    6 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-04-25T11:18:16Z  
    • rdumeur
    • ‏2013-03-25T20:16:39Z
    Thank you Jorge,

    For your informatin, I've tried the test program given in my previous mail with the CPLEX Studio 12.4 install and it doesn't hang at all with either an IBM 6.0 or 7.0 JVM.
    Note that with the first run of the (freshly installed) 7.0 JVM seemed to hang during a few seconds (I guess it JIT compiled freshly installed classes). Further runs ran at normal speed. My current test machine is only a dual core machine, so I am very interested to know what happens when running the test program given in my previous mail on your 4 core machine.
    Note that as stated in http://www-01.ibm.com/software/integration/optimization/cplex/reqs, the 7.0 JDK is not yet supported by CPLEX, when the 6.0 is, so maybe it is worth trying it?

    Cheers,

    Hi,

    Sorry for the late reply. I tried your code and it is true that it doesn't hang. But mine still does, the only different between your code and mine is that runs under JUnit.

    Making furher profiling with jvisualvm I found the following thread stack (where the code hungs):

     

     

    --------------------------------------- START OF THREAD DUMP ---------------------------------------------------------

     

    Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.14-b01 mixed mode):

     
    "RMI TCP Connection(2)-9.172.149.96" daemon prio=6 tid=0x0000000008281000 nid=0x2354 runnable [0x000000000bdbf000]
       java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    - locked <0x00000007d6913600> (a java.io.BufferedInputStream)
    at java.io.FilterInputStream.read(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - <0x00000007d66b2ec8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
     
    "JMX server connection timeout 29" daemon prio=6 tid=0x0000000008280000 nid=0x1e3c in Object.wait() [0x000000000bcbf000]
       java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007d67979a0> (a [I)
    at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(Unknown Source)
    - locked <0x00000007d67979a0> (a [I)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - None
     
    "RMI Scheduler(0)" daemon prio=6 tid=0x000000000827f800 nid=0x2244 waiting on condition [0x000000000bbbf000]
       java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007d6693a88> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
    at java.util.concurrent.DelayQueue.take(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - None
     
    "RMI TCP Connection(1)-9.172.149.96" daemon prio=6 tid=0x000000000827e800 nid=0xd44 runnable [0x000000000babf000]
       java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    - locked <0x00000007d671c530> (a java.io.BufferedInputStream)
    at java.io.FilterInputStream.read(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - <0x00000007d66ae7f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
     
    "RMI TCP Accept-0" daemon prio=6 tid=0x000000000827e000 nid=0x20dc runnable [0x000000000b9bf000]
       java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.PlainSocketImpl.accept(Unknown Source)
    - locked <0x00000007d669a060> (a java.net.SocksSocketImpl)
    at java.net.ServerSocket.implAccept(Unknown Source)
    at java.net.ServerSocket.accept(Unknown Source)
    at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - None
     
     
    "Thread-6" prio=6 tid=0x00000000079dd000 nid=0x1d38 runnable [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "Thread-5" prio=6 tid=0x00000000079d6800 nid=0x1950 runnable [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "Thread-4" prio=6 tid=0x0000000008657800 nid=0x1744 runnable [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "Thread-3" prio=6 tid=0x000000000895e800 nid=0x2038 runnable [0x00000000092bf000]
       java.lang.Thread.State: RUNNABLE
    at ilog.cp.cppimpl.cp_wrapJNI.IloCP_next(Native Method)
    at ilog.cp.cppimpl.IloCP.next(IloCP.java:165)
    at ilog.cp.IloCP.next(IloCP.java:2347)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.cpoGeneratePatternsFromWildcard(RAPatternGenerator.java:256)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.generatePatternsFromWildcards(RAPatternGenerator.java:138)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.generate(RAPatternGenerator.java:76)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAEmployee.createPatterns(RAEmployee.java:208)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAEmployee.createOrReturnPatterns(RAEmployee.java:220)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAModel.getMaxPatternIndex(RAModel.java:757)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.createPatternVariables(RASolver.java:317)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.createProblem(RASolver.java:239)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.solve(RASolver.java:163)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RostersAssigner.calculate(RostersAssigner.java:28)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.junit.RostersAssignerTester.obtainAssignationResults(RostersAssignerTester.java:50)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.junit.ConcurrentDateManagementTest$RostersAssignerThread.run(ConcurrentDateManagementTest.java:52)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - None
     
    "Thread-2" prio=6 tid=0x000000000895e000 nid=0x1c24 runnable [0x00000000091be000]
       java.lang.Thread.State: RUNNABLE
    at ilog.cp.cppimpl.cp_wrapJNI.IloCP_next(Native Method)
    at ilog.cp.cppimpl.IloCP.next(IloCP.java:165)
    at ilog.cp.IloCP.next(IloCP.java:2347)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.cpoGeneratePatternsFromWildcard(RAPatternGenerator.java:256)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.generatePatternsFromWildcards(RAPatternGenerator.java:138)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.generate(RAPatternGenerator.java:76)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAEmployee.createPatterns(RAEmployee.java:208)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAEmployee.createOrReturnPatterns(RAEmployee.java:220)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAModel.getMaxPatternIndex(RAModel.java:757)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.createPatternVariables(RASolver.java:317)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.createProblem(RASolver.java:239)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.solve(RASolver.java:163)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RostersAssigner.calculate(RostersAssigner.java:28)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.junit.RostersAssignerTester.obtainAssignationResults(RostersAssignerTester.java:50)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.junit.ConcurrentDateManagementTest$RostersAssignerThread.run(ConcurrentDateManagementTest.java:52)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - None
     
    "Thread-1" prio=6 tid=0x00000000080ef000 nid=0x18e4 runnable [0x00000000077ee000]
       java.lang.Thread.State: RUNNABLE
    at ilog.cp.cppimpl.cp_wrapJNI.IloCP_next(Native Method)
    at ilog.cp.cppimpl.IloCP.next(IloCP.java:165)
    at ilog.cp.IloCP.next(IloCP.java:2347)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.cpoGeneratePatternsFromWildcard(RAPatternGenerator.java:256)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.generatePatternsFromWildcards(RAPatternGenerator.java:138)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.generate(RAPatternGenerator.java:76)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAEmployee.createPatterns(RAEmployee.java:208)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAEmployee.createOrReturnPatterns(RAEmployee.java:220)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAModel.getMaxPatternIndex(RAModel.java:757)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.createPatternVariables(RASolver.java:317)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.createProblem(RASolver.java:239)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.solve(RASolver.java:163)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RostersAssigner.calculate(RostersAssigner.java:28)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.junit.RostersAssignerTester.obtainAssignationResults(RostersAssignerTester.java:50)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.junit.ConcurrentDateManagementTest$RostersAssignerThread.run(ConcurrentDateManagementTest.java:52)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - None
     
    "ReaderThread" prio=6 tid=0x0000000006968000 nid=0x2028 runnable [0x00000000075af000]
       java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(Unknown Source)
    at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
    at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
    at sun.nio.cs.StreamDecoder.read(Unknown Source)
    - locked <0x00000007d6e04638> (a java.io.InputStreamReader)
    at java.io.InputStreamReader.read(Unknown Source)
    at java.io.BufferedReader.fill(Unknown Source)
    at java.io.BufferedReader.readLine(Unknown Source)
    - locked <0x00000007d6e04638> (a java.io.InputStreamReader)
    at java.io.BufferedReader.readLine(Unknown Source)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner$ReaderThread.run(RemoteTestRunner.java:140)
     
       Locked ownable synchronizers:
    - None
     
    "Low Memory Detector" daemon prio=6 tid=0x000000000681e000 nid=0x2160 runnable [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "C2 CompilerThread1" daemon prio=10 tid=0x0000000006819800 nid=0x1ec4 waiting on condition [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "C2 CompilerThread0" daemon prio=10 tid=0x000000000680d800 nid=0x22b8 waiting on condition [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "Attach Listener" daemon prio=10 tid=0x000000000680a800 nid=0x22fc waiting on condition [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "Signal Dispatcher" daemon prio=10 tid=0x0000000006805000 nid=0x1c5c runnable [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "Finalizer" daemon prio=8 tid=0x00000000067b5800 nid=0x47c in Object.wait() [0x0000000006aaf000]
       java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007d6cd7660> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    - locked <0x00000007d6cd7660> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - None
     
    "Reference Handler" daemon prio=10 tid=0x0000000000527800 nid=0x22a8 in Object.wait() [0x00000000067af000]
       java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007d6cd75f8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
    - locked <0x00000007d6cd75f8> (a java.lang.ref.Reference$Lock)
     
       Locked ownable synchronizers:
    - None
     
    "main" prio=6 tid=0x00000000003ad000 nid=0x1ab4 in Object.wait() [0x00000000024ce000]
       java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007d6db02a8> (a java.lang.Thread)
    at java.lang.Thread.join(Unknown Source)
    - locked <0x00000007d6db02a8> (a java.lang.Thread)
    at java.lang.Thread.join(Unknown Source)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.junit.ConcurrentDateManagementTest.testParallelPlanningWithDifferentCalendars(ConcurrentDateManagementTest.java:77)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    [...]

    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
     
       Locked ownable synchronizers:
    - None
     
    "VM Thread" prio=10 tid=0x000000000051f800 nid=0x1c54 runnable 
     
    "GC task thread#0 (ParallelGC)" prio=6 tid=0x0000000000478000 nid=0x236c runnable 
     
    "GC task thread#1 (ParallelGC)" prio=6 tid=0x000000000047b000 nid=0x1c7c runnable 
     
    "GC task thread#2 (ParallelGC)" prio=6 tid=0x000000000047d000 nid=0x19f8 runnable 
     
    "GC task thread#3 (ParallelGC)" prio=6 tid=0x000000000047e800 nid=0x1d60 runnable 
     
    "VM Periodic Task Thread" prio=10 tid=0x0000000006827000 nid=0x1a94 waiting on condition 
     

    --------------------------------------- END OF THREAD DUMP ---------------------------------------------------------

     

     

    First of all sorry for such a long dump, but it has most of the info needed.

    Most of the threads are JVM daemon threads and GC threads but "Thread-1", "Thread-2" and "Thread-3" are the application threads that accessed the CP Optimizare libraries and never got back to executing. Those 3 threads don't own any lock and they are in RUNNABLE state, but still, they don't get back to execute. Thread "main" is the JUnit thread waiting for the join of the other 3 threads.

     

    Thanks in advance.

    Updated on 2013-04-25T11:23:56Z at 2013-04-25T11:23:56Z by JorgeGarcíaCastillo
  • rdumeur
    rdumeur
    109 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-04-25T14:41:12Z  

    Hi,

    Sorry for the late reply. I tried your code and it is true that it doesn't hang. But mine still does, the only different between your code and mine is that runs under JUnit.

    Making furher profiling with jvisualvm I found the following thread stack (where the code hungs):

     

     

    --------------------------------------- START OF THREAD DUMP ---------------------------------------------------------

     

    Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.14-b01 mixed mode):

     
    "RMI TCP Connection(2)-9.172.149.96" daemon prio=6 tid=0x0000000008281000 nid=0x2354 runnable [0x000000000bdbf000]
       java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    - locked <0x00000007d6913600> (a java.io.BufferedInputStream)
    at java.io.FilterInputStream.read(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - <0x00000007d66b2ec8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
     
    "JMX server connection timeout 29" daemon prio=6 tid=0x0000000008280000 nid=0x1e3c in Object.wait() [0x000000000bcbf000]
       java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007d67979a0> (a [I)
    at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(Unknown Source)
    - locked <0x00000007d67979a0> (a [I)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - None
     
    "RMI Scheduler(0)" daemon prio=6 tid=0x000000000827f800 nid=0x2244 waiting on condition [0x000000000bbbf000]
       java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007d6693a88> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
    at java.util.concurrent.DelayQueue.take(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - None
     
    "RMI TCP Connection(1)-9.172.149.96" daemon prio=6 tid=0x000000000827e800 nid=0xd44 runnable [0x000000000babf000]
       java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    - locked <0x00000007d671c530> (a java.io.BufferedInputStream)
    at java.io.FilterInputStream.read(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - <0x00000007d66ae7f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
     
    "RMI TCP Accept-0" daemon prio=6 tid=0x000000000827e000 nid=0x20dc runnable [0x000000000b9bf000]
       java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.PlainSocketImpl.accept(Unknown Source)
    - locked <0x00000007d669a060> (a java.net.SocksSocketImpl)
    at java.net.ServerSocket.implAccept(Unknown Source)
    at java.net.ServerSocket.accept(Unknown Source)
    at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - None
     
     
    "Thread-6" prio=6 tid=0x00000000079dd000 nid=0x1d38 runnable [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "Thread-5" prio=6 tid=0x00000000079d6800 nid=0x1950 runnable [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "Thread-4" prio=6 tid=0x0000000008657800 nid=0x1744 runnable [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "Thread-3" prio=6 tid=0x000000000895e800 nid=0x2038 runnable [0x00000000092bf000]
       java.lang.Thread.State: RUNNABLE
    at ilog.cp.cppimpl.cp_wrapJNI.IloCP_next(Native Method)
    at ilog.cp.cppimpl.IloCP.next(IloCP.java:165)
    at ilog.cp.IloCP.next(IloCP.java:2347)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.cpoGeneratePatternsFromWildcard(RAPatternGenerator.java:256)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.generatePatternsFromWildcards(RAPatternGenerator.java:138)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.generate(RAPatternGenerator.java:76)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAEmployee.createPatterns(RAEmployee.java:208)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAEmployee.createOrReturnPatterns(RAEmployee.java:220)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAModel.getMaxPatternIndex(RAModel.java:757)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.createPatternVariables(RASolver.java:317)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.createProblem(RASolver.java:239)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.solve(RASolver.java:163)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RostersAssigner.calculate(RostersAssigner.java:28)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.junit.RostersAssignerTester.obtainAssignationResults(RostersAssignerTester.java:50)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.junit.ConcurrentDateManagementTest$RostersAssignerThread.run(ConcurrentDateManagementTest.java:52)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - None
     
    "Thread-2" prio=6 tid=0x000000000895e000 nid=0x1c24 runnable [0x00000000091be000]
       java.lang.Thread.State: RUNNABLE
    at ilog.cp.cppimpl.cp_wrapJNI.IloCP_next(Native Method)
    at ilog.cp.cppimpl.IloCP.next(IloCP.java:165)
    at ilog.cp.IloCP.next(IloCP.java:2347)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.cpoGeneratePatternsFromWildcard(RAPatternGenerator.java:256)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.generatePatternsFromWildcards(RAPatternGenerator.java:138)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.generate(RAPatternGenerator.java:76)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAEmployee.createPatterns(RAEmployee.java:208)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAEmployee.createOrReturnPatterns(RAEmployee.java:220)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAModel.getMaxPatternIndex(RAModel.java:757)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.createPatternVariables(RASolver.java:317)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.createProblem(RASolver.java:239)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.solve(RASolver.java:163)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RostersAssigner.calculate(RostersAssigner.java:28)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.junit.RostersAssignerTester.obtainAssignationResults(RostersAssignerTester.java:50)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.junit.ConcurrentDateManagementTest$RostersAssignerThread.run(ConcurrentDateManagementTest.java:52)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - None
     
    "Thread-1" prio=6 tid=0x00000000080ef000 nid=0x18e4 runnable [0x00000000077ee000]
       java.lang.Thread.State: RUNNABLE
    at ilog.cp.cppimpl.cp_wrapJNI.IloCP_next(Native Method)
    at ilog.cp.cppimpl.IloCP.next(IloCP.java:165)
    at ilog.cp.IloCP.next(IloCP.java:2347)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.cpoGeneratePatternsFromWildcard(RAPatternGenerator.java:256)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.generatePatternsFromWildcards(RAPatternGenerator.java:138)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.patterns.RAPatternGenerator.generate(RAPatternGenerator.java:76)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAEmployee.createPatterns(RAEmployee.java:208)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAEmployee.createOrReturnPatterns(RAEmployee.java:220)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RAModel.getMaxPatternIndex(RAModel.java:757)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.createPatternVariables(RASolver.java:317)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.createProblem(RASolver.java:239)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RASolver.solve(RASolver.java:163)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.RostersAssigner.calculate(RostersAssigner.java:28)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.junit.RostersAssignerTester.obtainAssignationResults(RostersAssignerTester.java:50)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.junit.ConcurrentDateManagementTest$RostersAssignerThread.run(ConcurrentDateManagementTest.java:52)
    at java.lang.Thread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - None
     
    "ReaderThread" prio=6 tid=0x0000000006968000 nid=0x2028 runnable [0x00000000075af000]
       java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(Unknown Source)
    at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
    at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
    at sun.nio.cs.StreamDecoder.read(Unknown Source)
    - locked <0x00000007d6e04638> (a java.io.InputStreamReader)
    at java.io.InputStreamReader.read(Unknown Source)
    at java.io.BufferedReader.fill(Unknown Source)
    at java.io.BufferedReader.readLine(Unknown Source)
    - locked <0x00000007d6e04638> (a java.io.InputStreamReader)
    at java.io.BufferedReader.readLine(Unknown Source)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner$ReaderThread.run(RemoteTestRunner.java:140)
     
       Locked ownable synchronizers:
    - None
     
    "Low Memory Detector" daemon prio=6 tid=0x000000000681e000 nid=0x2160 runnable [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "C2 CompilerThread1" daemon prio=10 tid=0x0000000006819800 nid=0x1ec4 waiting on condition [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "C2 CompilerThread0" daemon prio=10 tid=0x000000000680d800 nid=0x22b8 waiting on condition [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "Attach Listener" daemon prio=10 tid=0x000000000680a800 nid=0x22fc waiting on condition [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "Signal Dispatcher" daemon prio=10 tid=0x0000000006805000 nid=0x1c5c runnable [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
     
       Locked ownable synchronizers:
    - None
     
    "Finalizer" daemon prio=8 tid=0x00000000067b5800 nid=0x47c in Object.wait() [0x0000000006aaf000]
       java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007d6cd7660> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    - locked <0x00000007d6cd7660> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
     
       Locked ownable synchronizers:
    - None
     
    "Reference Handler" daemon prio=10 tid=0x0000000000527800 nid=0x22a8 in Object.wait() [0x00000000067af000]
       java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007d6cd75f8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
    - locked <0x00000007d6cd75f8> (a java.lang.ref.Reference$Lock)
     
       Locked ownable synchronizers:
    - None
     
    "main" prio=6 tid=0x00000000003ad000 nid=0x1ab4 in Object.wait() [0x00000000024ce000]
       java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007d6db02a8> (a java.lang.Thread)
    at java.lang.Thread.join(Unknown Source)
    - locked <0x00000007d6db02a8> (a java.lang.Thread)
    at java.lang.Thread.join(Unknown Source)
    at com.ibm.es.gpt.optim.rostering.assignment.engine.rostersassigner.junit.ConcurrentDateManagementTest.testParallelPlanningWithDifferentCalendars(ConcurrentDateManagementTest.java:77)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    [...]

    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
     
       Locked ownable synchronizers:
    - None
     
    "VM Thread" prio=10 tid=0x000000000051f800 nid=0x1c54 runnable 
     
    "GC task thread#0 (ParallelGC)" prio=6 tid=0x0000000000478000 nid=0x236c runnable 
     
    "GC task thread#1 (ParallelGC)" prio=6 tid=0x000000000047b000 nid=0x1c7c runnable 
     
    "GC task thread#2 (ParallelGC)" prio=6 tid=0x000000000047d000 nid=0x19f8 runnable 
     
    "GC task thread#3 (ParallelGC)" prio=6 tid=0x000000000047e800 nid=0x1d60 runnable 
     
    "VM Periodic Task Thread" prio=10 tid=0x0000000006827000 nid=0x1a94 waiting on condition 
     

    --------------------------------------- END OF THREAD DUMP ---------------------------------------------------------

     

     

    First of all sorry for such a long dump, but it has most of the info needed.

    Most of the threads are JVM daemon threads and GC threads but "Thread-1", "Thread-2" and "Thread-3" are the application threads that accessed the CP Optimizare libraries and never got back to executing. Those 3 threads don't own any lock and they are in RUNNABLE state, but still, they don't get back to execute. Thread "main" is the JUnit thread waiting for the join of the other 3 threads.

     

    Thanks in advance.

    Dear JorgeGarciaCastillo

    A first reason why these thread may not terminate is that they do not have any search limit.  Did you specify a search limit for these threads? If so, having a model to try reproducing your problem would help us very much. If not, please try specify a search limit and see whether the .next() methods returns or not.

    I hope this helps,

    Cheers,

     

     

  • JorgeGarcíaCastillo
    6 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-04-25T16:01:14Z  
    • rdumeur
    • ‏2013-04-25T14:41:12Z

    Dear JorgeGarciaCastillo

    A first reason why these thread may not terminate is that they do not have any search limit.  Did you specify a search limit for these threads? If so, having a model to try reproducing your problem would help us very much. If not, please try specify a search limit and see whether the .next() methods returns or not.

    I hope this helps,

    Cheers,

     

     

    Thanks for the answer,

    I'm afraid that the size of the search could not be the problem in this case. The evidences that we have is tha CPU usage is really low and the threads are in RUNNABLE state waiting to get processor time, but not searching for the solution as the model is really small and simple.

    This problem does not arises introducing certaing delay between threads. Still, this deadlock in a server environment can cause much problem.

    Kind regards.

    Jorge

     

  • rdumeur
    rdumeur
    109 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-04-25T16:43:08Z  

    Thanks for the answer,

    I'm afraid that the size of the search could not be the problem in this case. The evidences that we have is tha CPU usage is really low and the threads are in RUNNABLE state waiting to get processor time, but not searching for the solution as the model is really small and simple.

    This problem does not arises introducing certaing delay between threads. Still, this deadlock in a server environment can cause much problem.

    Kind regards.

    Jorge

     

    Dear Jorge,

    Does your code use custom, java-written variable or value choosers used in search phases?

    Cheers,

  • rdumeur
    rdumeur
    109 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-04-26T12:59:42Z  
    • rdumeur
    • ‏2013-04-25T16:43:08Z

    Dear Jorge,

    Does your code use custom, java-written variable or value choosers used in search phases?

    Cheers,

    Dear Jorge,

    Also, does your application follows the same pattern as the small example I've previously sent? That is, one thread creates an IloCP, loads a model, calls solves then calls end() on the solver before exiting. Could you please confirm that this is the case?

    Cheers,

  • JorgeGarcíaCastillo
    6 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-04-29T16:50:33Z  
    • rdumeur
    • ‏2013-04-26T12:59:42Z

    Dear Jorge,

    Also, does your application follows the same pattern as the small example I've previously sent? That is, one thread creates an IloCP, loads a model, calls solves then calls end() on the solver before exiting. Could you please confirm that this is the case?

    Cheers,

    Hello,

     

    Answering both questions:

    1. Custom variable/value chooser in the model?

    Yes, we do have custom variable and value chooser to solve the main CP problem. We also build a smaller CP problem to get domain of the vars in the main problem, and this smaller model does not use custom value/variable choosers.

    We are getting the deadlock in this smaller problem to "populate" the main one so at this point custom choosers are not being used yet.

     

    2. Same pattern as your example?

    Yes, each thread creates their own IloCP, solves the model and calles end() after getting the solution. The following function is the one where the deadlock happens:

     

    private ArrayList<RAPattern> cpoGeneratePatternsFromWildcard(WildcardPattern positive, WildcardPattern[] negatives, Set<ShiftType> shifts, RAEmployee employee) throws IloException {
       
    // Find the valid patterns
    ArrayList<RAPattern> patterns = new ArrayList<RAPattern>();
    IloCP cpo = new IloCP();
    if (logger.isEnabledFor(Level.TRACE)){
        cpo.setOut(System.out);
    } else{
        cpo.setOut(null);
    }
     

     

    IloIntVar[] x = new IloIntVar[positive.getLength()];
    for (int d = 0; d < x.length; d++) {
           x[d] = cpo.intVar(values, new String("X" + Integer.toString(d)));
    }
    // adds IloIntVars and IloConstraints to the model
    addPositivePatternsVariables(cpo,x,positive,)
     
    //adds IloOr and IloConstraint to the model
    addNegativePatternsVariables(cpo,x,negative)
         
           
     cpo.setParameter(IloCP.IntParam.SearchType, IloCP.ParameterValues.DepthFirst);
     cpo.setParameter(IloCP.IntParam.LogVerbosity, IloCP.ParameterValues.Quiet);
     
     IloSearchPhase sp = cpo.searchPhase(x);
     cpo.startNewSearch(sp);
     int nbSol = 0;
     while ( cpo.next() ){     <-----------     IN THIS CALL, THREE THREADS GET STUCK
           nbSol++;
           ShiftType[] shiftTypes = new ShiftType[positiveLength];
           for(int j = 0; j < positiveLength; j++){
                int value = cpo.getIntValue(x[j]);
                List<ShiftType> dayShifts = positive.getShiftTypes(j);
                for (Iterator<ShiftType> iterEntries = dayShifts.iterator(); iterEntries.hasNext();) {
                        ShiftType entry = iterEntries.next();
                        if(entry.getIdInteger()==value){
                            shiftTypes[j] = entry;
                            break;
                        }
                    }
                }//j
     
         
            RAPattern raPattern = new RAPattern(new Pattern(shiftTypes, positive.getStartPosition(), positive.getWeight()),
            copyFeature(positivesNights.get(positive)),isFreeWKPAttern,isWorkWKPAttern,positive ); 
     
            patterns.add(raPattern);
           
        }
     
        cpo.end();
        return patterns
    }

     

    The line that I marked is the one where the threads from the thread dump are stopped.

     

    Thank you very much for the interest. Hope we find a solution.

    Regards.

    Updated on 2013-04-29T16:55:31Z at 2013-04-29T16:55:31Z by JorgeGarcíaCastillo
  • rdumeur
    rdumeur
    109 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-04-30T09:23:38Z  

    Hello,

     

    Answering both questions:

    1. Custom variable/value chooser in the model?

    Yes, we do have custom variable and value chooser to solve the main CP problem. We also build a smaller CP problem to get domain of the vars in the main problem, and this smaller model does not use custom value/variable choosers.

    We are getting the deadlock in this smaller problem to "populate" the main one so at this point custom choosers are not being used yet.

     

    2. Same pattern as your example?

    Yes, each thread creates their own IloCP, solves the model and calles end() after getting the solution. The following function is the one where the deadlock happens:

     

    private ArrayList<RAPattern> cpoGeneratePatternsFromWildcard(WildcardPattern positive, WildcardPattern[] negatives, Set<ShiftType> shifts, RAEmployee employee) throws IloException {
       
    // Find the valid patterns
    ArrayList<RAPattern> patterns = new ArrayList<RAPattern>();
    IloCP cpo = new IloCP();
    if (logger.isEnabledFor(Level.TRACE)){
        cpo.setOut(System.out);
    } else{
        cpo.setOut(null);
    }
     

     

    IloIntVar[] x = new IloIntVar[positive.getLength()];
    for (int d = 0; d < x.length; d++) {
           x[d] = cpo.intVar(values, new String("X" + Integer.toString(d)));
    }
    // adds IloIntVars and IloConstraints to the model
    addPositivePatternsVariables(cpo,x,positive,)
     
    //adds IloOr and IloConstraint to the model
    addNegativePatternsVariables(cpo,x,negative)
         
           
     cpo.setParameter(IloCP.IntParam.SearchType, IloCP.ParameterValues.DepthFirst);
     cpo.setParameter(IloCP.IntParam.LogVerbosity, IloCP.ParameterValues.Quiet);
     
     IloSearchPhase sp = cpo.searchPhase(x);
     cpo.startNewSearch(sp);
     int nbSol = 0;
     while ( cpo.next() ){     <-----------     IN THIS CALL, THREE THREADS GET STUCK
           nbSol++;
           ShiftType[] shiftTypes = new ShiftType[positiveLength];
           for(int j = 0; j < positiveLength; j++){
                int value = cpo.getIntValue(x[j]);
                List<ShiftType> dayShifts = positive.getShiftTypes(j);
                for (Iterator<ShiftType> iterEntries = dayShifts.iterator(); iterEntries.hasNext();) {
                        ShiftType entry = iterEntries.next();
                        if(entry.getIdInteger()==value){
                            shiftTypes[j] = entry;
                            break;
                        }
                    }
                }//j
     
         
            RAPattern raPattern = new RAPattern(new Pattern(shiftTypes, positive.getStartPosition(), positive.getWeight()),
            copyFeature(positivesNights.get(positive)),isFreeWKPAttern,isWorkWKPAttern,positive ); 
     
            patterns.add(raPattern);
           
        }
     
        cpo.end();
        return patterns
    }

     

    The line that I marked is the one where the threads from the thread dump are stopped.

     

    Thank you very much for the interest. Hope we find a solution.

    Regards.

    Dear Jorge,

    I am still trying to replicate your problem.

    Initially you thought that the deadlock occurred in the IloCP creation and didn't set the "Workers" parameters to 1.

    Now that we see the deadlock occurs in the .next() call, could you please try this setting? I'd like to have an idea of where the problem lies exactly.

    Cheers,

  • rdumeur
    rdumeur
    109 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-04-30T10:28:29Z  
    • rdumeur
    • ‏2013-04-30T09:23:38Z

    Dear Jorge,

    I am still trying to replicate your problem.

    Initially you thought that the deadlock occurred in the IloCP creation and didn't set the "Workers" parameters to 1.

    Now that we see the deadlock occurs in the .next() call, could you please try this setting? I'd like to have an idea of where the problem lies exactly.

    Cheers,

    Hi Jorge,

     

    I think I can reproduce your trouble:

    - in cos 12.5, running multiple threads each solving with Workers > 1 reaches a dead lock. Workers = 1 won't block.

    - in cos 12.5.0.1, this bug has been fixed and the deadlock shouldn't happen.

    Since 12.5.0.1 is a fix release  that contains other bug fixes, I strongly encourage you to try it!

    Please let me know what you get when you run solvers with Workers=1 using cos12.5.

    Cheers!

     

  • rdumeur
    rdumeur
    109 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-04-30T10:51:00Z  
    I believe that I am encountering the same issue as Jorge.

    Today I started running my unit tests in parallel with a setting of 4 concurrent test(my test suite takes about 15 minutes otherwise). About 1 out of 3 runs of my test suite ends up stuck. There is no action on the CPU, with a profiler I can see the same stack trace that Jorge posted originally.

    My environment is:
    Windows 7 64 bit
    4 Core Machine with Hyperthreading (looks like 8 cores in Windows)
    Java HotSpot(TM) 64-Bit Server VM; 1.6.0_24; 19.1-b02; mixed mode
    CPLEX 12.4

    My configuration parameter is set to only give 1 Worker thread per solve.

    We also intend to run this in a multi-threaded environment.

    I can test more on Monday with 12.5. We intend to upgrade to 12.5 soon and I will report back with any finding.

    Steve

    Dear Steve,

    I could eventually reproduce Jorge's problem on 12.5 and provided possible solutions. Could you please tell me what result you obtained with 12.5?

    Thank you in advance,

    Cheers,

  • Steve_Ronderos
    Steve_Ronderos
    2 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-04-30T13:49:40Z  
    • rdumeur
    • ‏2013-04-30T10:51:00Z

    Dear Steve,

    I could eventually reproduce Jorge's problem on 12.5 and provided possible solutions. Could you please tell me what result you obtained with 12.5?

    Thank you in advance,

    Cheers,

    Hi Renaud,

    I am still running into the issue in 12.5.  We are using CPLEX in a Web Application that is using Message Driven Beans (I believe that is all irrelevant to the problem, but just to provide more context).  The deadlock is occurring fairly consistently when many (up to 4) messages (which create a new IloCP object each) are sent simultaneously.  

    I'll try out the 12.5.0.1 fix pack.  Could you please direct me to the actual APAR that fixes this issue?

    Thanks,

    Steve

  • JorgeGarcíaCastillo
    6 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-04-30T15:31:28Z  
    • rdumeur
    • ‏2013-04-30T10:28:29Z

    Hi Jorge,

     

    I think I can reproduce your trouble:

    - in cos 12.5, running multiple threads each solving with Workers > 1 reaches a dead lock. Workers = 1 won't block.

    - in cos 12.5.0.1, this bug has been fixed and the deadlock shouldn't happen.

    Since 12.5.0.1 is a fix release  that contains other bug fixes, I strongly encourage you to try it!

    Please let me know what you get when you run solvers with Workers=1 using cos12.5.

    Cheers!

     

    Hi rdumeur,

    I solved it! The solution was setting the Workers parameter to 1 not only in the main IloCP object but also in the one that populates that one.

    It took me this long to solved it because I thought that deacklock was happening in the main IloCP problem and I only set the threads number to 1 in the main problem. Setting the Workers paramter for the smaller problem solved the deadlock.

    Thanks a lot for the help!

     

    Regards

  • Steve_Ronderos
    Steve_Ronderos
    2 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-04-30T17:55:06Z  

    Hi Renaud,

    I am still running into the issue in 12.5.  We are using CPLEX in a Web Application that is using Message Driven Beans (I believe that is all irrelevant to the problem, but just to provide more context).  The deadlock is occurring fairly consistently when many (up to 4) messages (which create a new IloCP object each) are sent simultaneously.  

    I'll try out the 12.5.0.1 fix pack.  Could you please direct me to the actual APAR that fixes this issue?

    Thanks,

    Steve

    Fix Pack 12.5.0.1 appears to have fixed my issue.

    I would still like to know the exact APAR that addressed the issue if you could direct me to it.

    Thanks,

    Steve

  • rdumeur
    rdumeur
    109 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-05-02T08:03:20Z  

    Fix Pack 12.5.0.1 appears to have fixed my issue.

    I would still like to know the exact APAR that addressed the issue if you could direct me to it.

    Thanks,

    Steve

    Dear Steve,

     

    This is good news. I've asked for this APAR information and will get back to you as soon as I get it.

    Cheers,

  • rdumeur
    rdumeur
    109 Posts

    Re: Mutex Exception in multiple threads execution

    ‏2013-05-13T12:32:44Z  
    • rdumeur
    • ‏2013-05-02T08:03:20Z

    Dear Steve,

     

    This is good news. I've asked for this APAR information and will get back to you as soon as I get it.

    Cheers,

    Dear Steve,

     

    I've been told there is no APAR description available for this particular fix.

    Cheers,