IBM Support

P101035: LIM RESTART OLD LIM DOESN'T EXIT, NEW LIM CANNOT START

APAR status

  • Closed as program error.

Error description

  • 1) At about Mar 19 04:06:31, or shortly before, "lsadmin lim
    restart" was run on the master host houcy1-n-sp099a02.
    2) Because the master's load was high, the original LIM had not
    yet exited and had not released its port. Meanwhile, the new LIM
    failed to start because the port was unavailable. The LIM log
    shows the following messages:
    
    Mar 19 04:06:31 2015 46069 3 1.2.7 initSock(): chanServSocketExt_(). A socket operation has failed on the configured UDP port <7869> on host <houcy1-n-sp099a02>. Reason: <Address already in use>. Fatal error. Either change the port number in lsf.conf (LSF_LIM_PORT) or terminate the other process that is bound to the port.
    Mar 19 04:06:31 2015 46069 3 1.2.7 initSock: LIM has exited due to a fatal error.
    
    3) Because the LIM on the master was in an abnormal state, a
    failover occurred. A job was submitted during the failover, and
    the submission failed with the following messages:
    LSF is down. Please wait ...
    
    Connection refused by server. Job not submitted.
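    The failure in step 2 is ordinary socket behavior: while one process still holds a UDP port, a second bind to the same port fails with EADDRINUSE, which is the "Address already in use" reason in the log. The minimal Python sketch below reproduces that error on a Linux host. It assumes the loopback address and an OS-assigned port rather than LSF_LIM_PORT (7869), and it only illustrates the socket error, not how LIM itself binds its port.

```python
import errno
import socket


def bind_udp(port, host="127.0.0.1"):
    """Bind a UDP socket to host:port; close it again if the bind fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError:
        sock.close()
        raise
    return sock


# The "old" daemon: still running, still holding its port.
old = bind_udp(0)  # port 0 lets the OS pick a free port
port = old.getsockname()[1]

# The "new" daemon tries to bind the same port before the old one exits.
try:
    bind_udp(port).close()
    result = "bound"
except OSError as exc:
    result = errno.errorcode.get(exc.errno, str(exc.errno))
finally:
    old.close()

print(result)  # EADDRINUSE
```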
    

Local fix

  • n/a
    

Problem summary

  • When the system is too busy to release a port promptly, a LIM or
    RES restart fails because the new daemon cannot initialize its
    socket.
    
    This fix introduces the following parameters in lsf.conf to
    control the retry behavior:
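    The parameter names this fix adds are not listed in this record. As an illustration only, the retry behavior they control can be sketched as a bounded bind-retry loop: on EADDRINUSE the new daemon waits and tries again instead of exiting on the first failure. MAX_BIND_RETRIES, BIND_RETRY_INTERVAL_SEC, and bind_with_retry are hypothetical names, not the actual lsf.conf parameters or LSF code, and the sketch assumes POSIX errno values (Linux).

```python
import errno
import socket
import threading
import time

# Hypothetical defaults: this APAR does not name the real lsf.conf parameters.
MAX_BIND_RETRIES = 5
BIND_RETRY_INTERVAL_SEC = 0.2


def bind_with_retry(port, max_retries=MAX_BIND_RETRIES,
                    interval=BIND_RETRY_INTERVAL_SEC, host="127.0.0.1"):
    """Bind a UDP socket, retrying while the port is still held by another process."""
    for attempt in range(max_retries + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError as exc:
            sock.close()
            # Only "address in use" is worth retrying; give up after the last attempt.
            if exc.errno != errno.EADDRINUSE or attempt == max_retries:
                raise
            time.sleep(interval)


# Simulate a slow shutdown: the old daemon releases its port after 0.5 s.
old = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
old.bind(("127.0.0.1", 0))
port = old.getsockname()[1]
threading.Timer(0.5, old.close).start()

new = bind_with_retry(port, max_retries=10, interval=0.2)
recovered = new.getsockname()[1] == port
new.close()
print(recovered)  # True
```

    Bounding the retries keeps a genuine port conflict (another program permanently bound to the port) fatal, as the original log message intends, while a daemon that is merely slow to exit gets time to release the port.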
    

Problem conclusion

  • Fixed.
    

Temporary fix

Comments

APAR Information

  • APAR number

    P101035

  • Reported component name

    LSF STD LEGACY

  • Reported component ID

    5725G8206

  • Reported release

    911

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2015-04-01

  • Closed date

    2015-06-17

  • Last modified date

    2015-06-17

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    LSF STD LEGACY

  • Fixed component ID

    5725G8206

Applicable component levels

  • R911 PSY UP


Document Information

Modified date:
17 June 2015