Topic
  • 6 replies
  • Latest Post - ‏2013-02-14T14:05:41Z by SystemAdmin
SystemAdmin
194 Posts

Pinned topic TWS for z/OS dynamic agent goes offline in TWS Master

2013-02-12T16:02:42Z
Hello

We are having some problems with the TWS for z/OS dynamic agent workstation in our TWS Master.
I'm posting the message here - maybe someone out there can help.

The problem is that the TWS for z/OS dynamic agent workstation goes offline (agent stopped) in the TWS Master after some hours.
In the message log on the mainframe we see the following messages:

02/06 21.30.28 EELHT15E THE HTTP CLIENT FAILED TO PROCESS A REQUEST FOR BROKER
02/06 21.30.28 EELHT43I HTTP RESPONSE MESSAGE WITH CODE RDBMS_COLUMN_BOUNDS_VIOLATED
02/06 21.31.28 EELHT43I FAILED TO SWITCH BROKER TO HTTPS atp-01ttws01.prod.atp.local:31116
02/06 21.31.28 EELHT43I FAILED TO SWITCH BROKER TO HTTPS ATP-01TTWS02.prod.atp.locla:31116
02/06 21.31.28 EELHT43I FAILED TO SWITCH BROKER TO HTTPS atp-01ttws02.prod.atp.local:31116
02/06 21.35.24 EELW076W NO SERVER PULSE. EVENT WRITER WILL STOP PULSE DETECTION

What does the message "CODE RDBMS_COLUMN_BOUNDS_VIOLATED" mean?

After a restart of the TWS for z/OS dynamic agent on the mainframe, it works fine again for 8-10 hours.

Any ideas before we create a PMR?

Best regards
Finn Bastrup
  • umberto.caselli
    9 Posts

    Re: TWS for z/OS dynamic agent goes offline in TWS Master

    ‏2013-02-12T16:24:09Z  
    Hello Finn,
    can you check the SystemOut.log file on the master? It should have a clearer error message.
    Thanks, Umberto
  • mymaestro
    215 Posts

    Re: TWS for z/OS dynamic agent goes offline in TWS Master

    ‏2013-02-12T16:50:23Z  
    Looks like a typo in the configurations...
    HTTPS ATP-01TTWS02.prod.atp.locla:31116
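
    A quick way to catch this kind of slip is to pull the broker host names out of the EELHT43I lines and compare their domain suffixes. The following is only a minimal sketch: the log lines are the ones posted above, while the regular expression and the helper names (broker_hosts, odd_domains) are made up for illustration and are not part of TWS.

```python
import re

# Log lines copied from the first post in this thread.
LOG_LINES = [
    "02/06 21.31.28 EELHT43I FAILED TO SWITCH BROKER TO HTTPS atp-01ttws01.prod.atp.local:31116",
    "02/06 21.31.28 EELHT43I FAILED TO SWITCH BROKER TO HTTPS ATP-01TTWS02.prod.atp.locla:31116",
    "02/06 21.31.28 EELHT43I FAILED TO SWITCH BROKER TO HTTPS atp-01ttws02.prod.atp.local:31116",
]

# Hypothetical pattern: "HTTPS <host>:<port>" as it appears in the lines above.
HOST_PATTERN = re.compile(r"HTTPS\s+([A-Za-z0-9.-]+):(\d+)")


def broker_hosts(lines):
    """Return the lowercased broker host names found in the log lines, in order."""
    hosts = []
    for line in lines:
        match = HOST_PATTERN.search(line)
        if match:
            hosts.append(match.group(1).lower())
    return hosts


def odd_domains(hosts):
    """Return the hosts whose domain suffix differs from the most common suffix."""
    domains = [h.split(".", 1)[1] if "." in h else "" for h in hosts]
    most_common = max(set(domains), key=domains.count)
    return [h for h, d in zip(hosts, domains) if d != most_common]


print(odd_domains(broker_hosts(LOG_LINES)))
# prints ['atp-01ttws02.prod.atp.locla']
```

    Host names are lowercased before comparison because DNS names are case-insensitive, so only the misspelled suffix is reported.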
  • SystemAdmin
    194 Posts

    Re: TWS for z/OS dynamic agent goes offline in TWS Master

    ‏2013-02-14T11:22:58Z  
    Hello Umberto

    Here's the output from the eWAS SystemOut.log file on the TWS Master:
    [13-02-13 22:33:44:312 CET] 0000001b resourceadvis E AWKRAE100E The resource "http://PRODLPAR:31114/ita/JobManager" missed "2" heartbeat counts. Setting the resource as inactive.

    The messages on the z/OS agent on the mainframe look like this:
    22.30.32 EELHT15E THE HTTP CLIENT FAILED TO PROCESS A REQUEST FOR BROKER
    22.30.32 EELHT43I HTTP RESPONSE MESSAGE WITH CODE RDBMS_COLUMN_BOUNDS_VIOLATED
    22.31.32 EELHT43I FAILED TO SWITCH BROKER TO HTTPS atp-01ttws01.prod.atp.local:31116
    22.31.32 EELHT43I FAILED TO SWITCH BROKER TO HTTPS ATP-01TTWS02.prod.atp.local:31116
    22.31.32 EELHT43I FAILED TO SWITCH BROKER TO HTTPS atp-01ttws02.prod.atp.local:31116
    22.35.00 EELW076W NO SERVER PULSE. EVENT WRITER WILL STOP PULSE DETECTION

    HTTPOPTS parameter in the z/OS agent:
    HTTPOPTS TDWBHOSTNAME('atp-01ttws01.prod.atp.local')
    TDWBPORT(31115)
    TDWBSSL(NO)
    SSL(NO)
    CONNTIMEOUT(300)
    TCPIPTIMEOUT(900)

    In the broker on the TWS Master, RaaHeartBeatInterval=200 seconds (default) and MissedHeartBeatCount=2 (default).
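
    As a rough check of what these defaults imply, the sketch below multiplies the two settings and compares the result with the gap between the first agent error (22.30.32) and the AWKRAE100E message on the master (22:33:44). It assumes that the broker sets a resource inactive after roughly interval x count seconds without a heartbeat, and that the two clocks are in sync; neither is confirmed here. The function names are made up for illustration.

```python
from datetime import datetime

RAA_HEARTBEAT_INTERVAL_S = 200  # RaaHeartBeatInterval from the broker settings above
MISSED_HEARTBEAT_COUNT = 2      # MissedHeartBeatCount from the broker settings above


def inactivity_window_seconds(interval_s, missed_count):
    """Assumed silence (in seconds) tolerated before a resource is set inactive."""
    return interval_s * missed_count


def seconds_between(start, end, fmt="%H:%M:%S"):
    """Seconds elapsed between two same-day timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


window = inactivity_window_seconds(RAA_HEARTBEAT_INTERVAL_S, MISSED_HEARTBEAT_COUNT)
gap = seconds_between("22:30:32", "22:33:44")

print(window)  # 400
print(gap)     # 192
```

    Under those assumptions the gap between the two messages is shorter than the full window, so the timing alone does not tell us which side stopped talking first.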

    So why do we get this "time-out"?
    Is it due to a problem in the TCP/IP layer? If so, do you have any good suggestions for IP trace settings?

    Note that we have many distributed agents in this TWS (TDWB) master.

    Best regards
    Finn Bastrup
  • SystemAdmin
    194 Posts

    Re: TWS for z/OS dynamic agent goes offline in TWS Master

    ‏2013-02-14T11:24:30Z  
    Well spotted, Warren.
    It was a copy/paste mistake.
  • mymaestro
    215 Posts

    Re: TWS for z/OS dynamic agent goes offline in TWS Master

    ‏2013-02-14T13:55:44Z  
    There may be DNS problems. The master is trying to contact "PRODLPAR" on port 31114; make sure this DNS name is consistent and unambiguous.
    Meanwhile the agent on z/OS tries to reach "atp-01ttws01.prod.atp.local" to send the heartbeat information.
    Both of those DNS names seem strange to me (i.e., not like "real" names).
    Is there a backup master? If so, its name must be consistent as well.
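
    One way to check the names is to resolve each of them on every machine involved (master, backup master, and the z/OS side if Python is available there) and compare the results. This is only a sketch using Python's standard socket.getaddrinfo; the helper name resolve_all and its resolver parameter are made up for illustration.

```python
import socket


def resolve_all(names, resolver=socket.getaddrinfo):
    """Map each name to the sorted list of addresses it resolves to.

    A name that cannot be resolved maps to an empty list. The resolver
    argument exists only so the lookup can be replaced in a test.
    """
    results = {}
    for name in names:
        try:
            infos = resolver(name, None)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address string.
            results[name] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            results[name] = []
    return results
```

    For example, resolve_all(["PRODLPAR", "atp-01ttws01.prod.atp.local", "atp-01ttws02.prod.atp.local"]) should return the same addresses on every machine; an empty list or a differing address for any name points to the kind of inconsistency described above.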
  • SystemAdmin
    194 Posts

    Re: TWS for z/OS dynamic agent goes offline in TWS Master

    ‏2013-02-14T14:05:41Z  
    Thank you very much for the feedback, Warren.
    I know that the DNS names used at the customer site seem very strange, but they work, and nslookup etc. shows that they resolve as expected.
    There is a backup Master on "atp-01ttws02.prod.atp.local", and this is the address the agent tries to contact when the connection to the master on "atp-01ttws01.prod.atp.local" is lost.
    I think it works as it should, because the connection is established when the agent is restarted on the mainframe, and the agent stays connected until the evening.
    Best regards
    Finn