4 replies Latest Post - ‏2008-10-06T00:06:48Z by SystemAdmin
SystemAdmin

Pinned topic MQ workflow - messages in the hold queue

2006-10-19T04:03:37Z
Hi All,
At the end of last month a large number of deadlocks occurred on our production server, and as a result the hold queue filled up with messages. The next day I found 74 messages in the hold queue and replayed all of them; only 40 were sent successfully, and the remaining 34 are still in the hold queue. Yesterday I tried to replay only the first message. This caused exceptions and deadlocks and also restarted the execution server instances. I have pasted the exception and deadlock details below for more information.

Error message from fmcerr.log:


MQSeries Workflow 3.3 Error Report

Report creation = 10/18/06 15:46:08
Related message = FMC31050E An error has occurred which has terminated processing.
Error location = File=/projects/fmc/drvp/lbld/v332/src/fmcdtprc.cxx, Line=2887, Function=FmcTOMForBlocks::MaterializeForUpdateImpl2(const MaterializeInputParm &, const FmcBIID &, TomForAbstractBlocksDict::iterator &)
Error data = FmcTOMNotFoundException, KeyValues= BIID=OID(00000001153f00030000000000000000),OID(00000001415647490000000000000000)

Exceptions from fmcsys.log:


10/18/06 15:46:07 FMC12170I The processing of hold queue messages started.
10/18/06 15:46:07 FMC12150I 1 messages have been moved to the execution server input queue.
10/18/06 15:46:08 FMC31050E An error has occurred which has terminated processing.
10/18/06 15:46:08 FmcTOMNotFoundException, KeyValues= BIID=OID(00000001153f00030000000000000000),OID(00000001415647490000000000000000)
10/18/06 15:46:08 FMC31050E An error has occurred which has terminated processing.
10/18/06 15:46:08 FmcTOMNotFoundException, KeyValues= BIID=OID(00000001153f00030000000000000000),OID(00000001415647490000000000000000)
10/18/06 15:46:09 FMC12240E Execution server instance(s) stopped with an error.
10/18/06 15:46:09 FMC12240E Execution server instance(s) stopped with an error.
10/18/06 15:46:09 FMC31050E An error has occurred which has terminated processing.
10/18/06 15:46:09 FmcTOMNotFoundException, KeyValues= BIID=OID(00000001153f00030000000000000000),OID(00000001415647490000000000000000)
10/18/06 15:46:09 FMC12240E Execution server instance(s) stopped with an error.
10/18/06 15:46:10 FMC10500I Execution server instance started.
10/18/06 15:46:10 FMC31050E An error has occurred which has terminated processing.
10/18/06 15:46:10 FmcTOMNotFoundException, KeyValues= BIID=OID(00000001153f00030000000000000000),OID(00000001415647490000000000000000)
10/18/06 15:46:10 FMC12240E Execution server instance(s) stopped with an error.
10/18/06 15:46:10 FMC10500I Execution server instance started.
10/18/06 15:46:10 FMC31050E An error has occurred which has terminated processing.
10/18/06 15:46:10 FmcTOMNotFoundException, KeyValues= BIID=OID(00000001153f00030000000000000000),OID(00000001415647490000000000000000)
10/18/06 15:46:10 FMC12240E Execution server instance(s) stopped with an error.
10/18/06 15:46:11 FMC10500I Execution server instance started.
10/18/06 15:46:11 FMC10500I Execution server instance started.
10/18/06 15:46:11 FMC10500I Execution server instance started.

Deadlocks from fmcsys.log:


10/18/06 16:40:40 FMC31100W The message ResumeProcInst could not be processed because of a database deadlock or timeout. The message will be retried.
10/18/06 16:40:42 FMC31100W The message ResumeProcInst could not be processed because of a database deadlock or timeout. The message will be retried.
10/18/06 16:40:44 FMC31100W The message ResumeProcInst could not be processed because of a database deadlock or timeout. The message will be retried.
10/18/06 16:40:46 FMC31100W The message ResumeProcInst could not be processed because of a database deadlock or timeout. The message will be retried.
10/18/06 16:40:58 FMC31100W The message ResumeProcInst could not be processed because of a database deadlock or timeout. The message will be retried.
10/18/06 16:41:05 FMC31100W The message ResumeProcInst could not be processed because of a database deadlock or timeout. The message will be retried.
10/18/06 16:41:07 FMC31100W The message ResumeProcInst could not be processed because of a database deadlock or timeout. The message will be retried.
10/18/06 16:41:09 FMC31100W The message ResumeProcInst could not be processed because of a database deadlock or timeout. The message will be retried.
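
For anyone triaging a similar log, here is a minimal Python sketch that tallies message codes per entry. It assumes only what the excerpts above show: every entry starts with an `MM/DD/YY HH:MM:SS` timestamp, and codes look like `FMC` plus five digits and a severity letter. The function names `split_entries` and `tally_codes` are made up for this example and are not part of MQ Workflow. It also copes with entries that have run together on one line.

```python
import re
from collections import Counter

# Assumption: each entry begins with a timestamp such as "10/18/06 15:46:08 ".
# The lookahead lets us split even when two entries share one physical line.
ENTRY_START = re.compile(r"(?=\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} )")

# Assumption: message codes look like FMC31050E (FMC + 5 digits + I/W/E).
CODE = re.compile(r"\bFMC\d{5}[IWE]\b")


def split_entries(text):
    """Return one string per timestamped log entry."""
    return [part.strip() for part in ENTRY_START.split(text) if part.strip()]


def tally_codes(text):
    """Count how often each FMC message code occurs, one count per entry."""
    counts = Counter()
    for entry in split_entries(text):
        match = CODE.search(entry)
        if match:
            counts[match.group(0)] += 1
    return counts


if __name__ == "__main__":
    # Path is a placeholder; point it at your own fmcsys.log.
    with open("fmcsys.log", encoding="utf-8", errors="replace") as handle:
        for code, count in tally_codes(handle.read()).most_common():
            print(code, count)
```

Counting FMC31100W against FMC31050E and FMC12240E shows quickly whether a replay produced only deadlock retries or actual server terminations.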

Can anyone tell me why the execution server instances restarted?
How can I clear the messages from the hold queue?
What would happen if I deleted a message from the hold queue using fmcautil?

Production server environment details:

AIX-4.3
MQ Workflow-3.3.2.0
MQ Series Server-5.2.0.0
DB2-06.01.0000

Thanks & Regards,
BMCA.

Updated on 2008-10-06T00:06:48Z by SystemAdmin
  • SystemAdmin

    Re: MQ workflow - messages in the hold queue

    ‏2006-10-19T07:07:01Z  in response to SystemAdmin
    Oh dear, you are far away from a supported environment!
    You should upgrade your system to MQWF 3.5 or 3.6 as soon as possible.
    In 3.3.2 there were several errors with respect to deadlocks that are
    fixed in later releases. To get out of this situation, you should try to
    replay the hold queue messages one at a time. Any message that refers to
    ResumeProcInst and fails with TOMNotFound can safely be deleted. It is an
    attempt to resume a process instance that no longer exists in the runtime
    DB (manual intervention?).

    Any other MessageType / Exception combination has to be analyzed in depth.
    Most probably you can figure out the pending request, fix the problem
    manually (e.g. ForceRestart()), and delete the corresponding
    HoldQueueMessage. You can also check your process instances / activities
    via a monitor or API program and fix any unusual states (Running forever,
    InError, etc.).
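
The triage rule above can be written down as a small Python sketch. This is only an illustration of the decision, not an MQ Workflow API: the function name `triage` and the labels `"delete"` and `"analyze"` are made up for this example, and the message type and exception text are assumed to have been read off the logs by hand.

```python
def triage(message_type, exception_text):
    """Classify one failed hold queue message according to the rule above.

    Returns "delete" only for the single combination named as safe:
    a ResumeProcInst message that fails with a TOMNotFound exception.
    Everything else is returned as "analyze" and needs a closer look.
    """
    if message_type == "ResumeProcInst" and "TOMNotFound" in exception_text:
        return "delete"
    return "analyze"


# Example: the combination seen in the logs of this thread.
print(triage("ResumeProcInst", "FmcTOMNotFoundException"))
```

The sketch deliberately defaults to "analyze", so a combination that was never examined is not deleted by accident.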

    After your system is in good shape, you should definitely plan for an
    MQWF upgrade!

    GOOD LUCK!

    Volker Hoss
    IBM WPS Development
  • SystemAdmin

    Re: MQ workflow - messages in the hold queue

    ‏2006-10-19T08:27:05Z  in response to SystemAdmin
    Thank you for your reply. Could you tell me the reason behind the execution server instance restart? I have decided to delete the message in the hold queue that relates to the ResumeProcInst message.

    Thanks & Regards,
    BMCA
    • SystemAdmin

      Re: MQ workflow - messages in the hold queue

      ‏2006-10-20T07:12:07Z  in response to SystemAdmin
      Well...

      A TOMNotFoundException is raised when a DB access does not find the
      requested object in the DB. This is considered a severe error because
      the server does not know how to continue navigation. In your case it
      most probably results from a program error that was fixed either in a
      later version or in a ServicePack. I cannot tell you more without
      support involvement, and that's the reason why I urgently recommend
      moving to a supported level.

      One last recommendation: never manipulate the MQWF DB manually to 'fix'
      a problem. Always use the provided API / programs. Otherwise you risk
      an inconsistent / corrupted DB.

      Volker Hoss
      IBM WPS Development
      • SystemAdmin

        Re: MQ workflow - messages in the hold queue

        ‏2008-10-06T00:06:48Z  in response to SystemAdmin
        I have the same error, and I made two changes before the error occurred:
        1. Moved the tablespace of the CONTAINER table to another tablespace with a different name on a different disk
        2. Exported the DB2 data and imported it into the new CONTAINER
        Can someone comment on these changes?