IBM Support

IT38664: In IBM MQ RDQM, if a resource fails to start during boot then attempts to start the remaining DRBD resources are skipped

APAR status

  • Closed as program error.

Error description

  • When an RDQM node boots, systemd brings up the DRBD resources.
    If any DRBD resource fails to come up, no attempt is made to
    bring up the remaining resources; they are skipped.
    
    In the primary node of a DR RDQM environment, for example, the
    DRBD resources are started at boot by systemd by running the
    following commands in the following order:
    
        drbdadm up qm1
        drbdadm primary qm1
        drbdadm up qm2
        drbdadm primary qm2
        drbdadm up qm3
        drbdadm primary qm3
    ...
    
    /var/log/messages will show entries similar to the following:
    
    Oct 11 08:21:32 node01 kernel: drbd qm2: State change failed: Need access to UpToDate data
    Oct 11 08:21:32 node01 kernel: drbd qm2: Failed: role( Secondary -> Primary )
    Oct 11 08:21:32 node01 rdqmd: AMQ3817E: Replicated data subsystem call '/usr/sbin/drbdadm primary qm2'
    Oct 11 08:21:32 node01 rdqmd: failed with return code '17'.
    Oct 11 08:21:32 node01 rdqmd: qm2: State change failed: (-2) Need access to UpToDate data
    Oct 11 08:21:32 node01 rdqmd: Command 'drbdsetup primary qm2' terminated with exit code 17
    Oct 11 08:21:32 node01 rdqmd: AMQ3747E: Replicated data systemd initialization failed.
    Oct 11 08:21:32 node01 systemd: rdqm.service: main process exited, code=exited, status=71
    
    As a result, QM1 is left running but QM2 and QM3 are not: the
    DRBD resource for QM2 failed to come up, so no attempt was made
    to bring up QM3.
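
To find which resource stopped the boot sequence, the log can be searched for the failing "drbdadm primary" call. The following is a minimal sketch, not an IBM-supplied tool. It assumes that each AMQ3817E entry occupies a single line in the actual log file and keeps the form shown in the excerpt above. The function name failed_primary_resources is hypothetical.

```shell
#!/bin/sh
# Print the name of each DRBD resource for which rdqmd logged a failed
# 'drbdadm primary' call (message AMQ3817E), one name per line, without
# duplicates. Reads the files given as arguments, or stdin if none.
failed_primary_resources() {
    sed -n 's/.*AMQ3817E.*drbdadm primary \([A-Za-z0-9_.-]*\).*/\1/p' "$@" |
        sort -u
}

# Example invocation (needs read access to the log file):
#   failed_primary_resources /var/log/messages
```

In the scenario described here the function would report qm2; resources that come after it in the start order (qm3 in this example) were never attempted and therefore do not appear in the log at all.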
    

Local fix

  • Manually bring up the remaining resources that were not brought
    up by issuing the "drbdadm up" command for each of them.
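
When several resources were skipped, the local fix can be applied to all of them in one pass. The following is a sketch under stated assumptions: the operator supplies the resource names as arguments, and the commands run with sufficient privileges. The function name bring_up_resources and the DRBDADM override variable are hypothetical helpers, not part of IBM MQ or DRBD.

```shell
#!/bin/sh
# DRBDADM is a hypothetical override that allows a dry run with a stub;
# by default the real drbdadm found on the PATH is used.
DRBDADM=${DRBDADM:-drbdadm}

# Run 'drbdadm up' for every resource named in the arguments.
# A failure for one resource is reported on stderr but does not stop
# the loop. Returns 0 only if every resource came up.
bring_up_resources() {
    rc=0
    for res in "$@"; do
        if "$DRBDADM" up "$res"; then
            echo "up: $res"
        else
            echo "FAILED: drbdadm up $res" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example, using the resource names from the scenario above:
#   bring_up_resources qm2 qm3
```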
    

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    Users running IBM MQ queue managers in an RDQM high availability
    setup.
    
    
    Platforms affected:
    Linux on x86-64
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    After a reboot, the RDQM systemd service can abandon bringing up
    the remaining resources if it encounters a failure while running
    "drbdadm primary" for an earlier resource.
    

Problem conclusion

  • RDQM code has been modified so that even after an initial
    failure it will still attempt to bring up other resources.
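
The difference between the old and the new behaviour can be modelled with a small loop. This is an illustrative sketch only and is not the RDQM implementation. The function name start_all, its mode argument, and the DRBDADM override variable are all hypothetical. The per-resource commands follow the DR primary-node example in the error description.

```shell
#!/bin/sh
# Illustrative model only; this is not the RDQM code.
# DRBDADM is a hypothetical override so the loop can be tried with a stub.
DRBDADM=${DRBDADM:-drbdadm}

# start_all MODE RESOURCE...
#   MODE=abort    : stop at the first failure (behaviour reported here)
#   MODE=continue : note the failure and carry on (behaviour after the fix)
# Returns 0 only if every resource was brought up and made primary.
start_all() {
    mode=$1
    shift
    rc=0
    for res in "$@"; do
        if "$DRBDADM" up "$res" && "$DRBDADM" primary "$res"; then
            echo "started: $res"
        else
            echo "failed: $res" >&2
            rc=1
            if [ "$mode" = abort ]; then
                break
            fi
        fi
    done
    return "$rc"
}

# Example:
#   start_all continue qm1 qm2 qm3
```

With a failing "drbdadm primary qm2", the abort mode starts only qm1, whereas the continue mode starts qm1 and qm3 and still returns a non-zero status so that the failure for qm2 is not hidden.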
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.1 LTS   9.1.0.12
    v9.2 LTS   9.2.0.6
    v9.x CD    9.2.5
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
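
An installed version can be compared against the maintenance levels in the table above. The following is a minimal sketch, assuming a four-part V.R.M.F version string, that LTS levels have the form 9.1.0.x or 9.2.0.x, and that any release later than 9.2.5 also contains the fix. That last point is an assumption, not a statement from this APAR. The function names version_ge and has_it38664_fix are hypothetical.

```shell
#!/bin/sh
# version_ge A B: succeed if dotted version A is greater than or equal
# to B, comparing up to four fields numerically. B is expected to be
# written with four fields.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" |
        sort -t. -k1,1n -k2,2n -k3,3n -k4,4n | head -n 1)
    [ "$lowest" = "$2" ]
}

# has_it38664_fix VERSION: succeed if VERSION is at or above the level
# listed for its release stream in the table above.
has_it38664_fix() {
    case $1 in
        9.1.0.*) version_ge "$1" 9.1.0.12 ;;  # v9.1 LTS
        9.2.0.*) version_ge "$1" 9.2.0.6 ;;   # v9.2 LTS
        *)       version_ge "$1" 9.2.5.0 ;;   # CD; later releases assumed fixed
    esac
}

# Example, assuming 'dspmqver -b -f 2' prints only the V.R.M.F string:
#   has_it38664_fix "$(dspmqver -b -f 2)" && echo "fix level reached"
```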
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT38664

  • Reported component name

    IBM MQ BASE MP

  • Reported component ID

    5724H7271

  • Reported release

    910

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-10-11

  • Closed date

    2022-04-28

  • Last modified date

    2022-06-21

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    IBM MQ BASE MP

  • Fixed component ID

    5724H7271

Applicable component levels

[{"Line of Business":{"code":"LOB45","label":"Automation"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSYHRD","label":"IBM MQ"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"910"}]

Document Information

Modified date:
22 June 2022