IBM Support

OMEGAMON TEMS Startup task fails to initialize

Troubleshooting


Problem

The OMEGAMON Tivoli Enterprise Monitoring Server (TEMS) startup task (OMEGDSST) does not start completely.

Symptom

The default RKLVLOG shows the following trace messages:

2012.293 10:57:35.80 (000A-E1D9DD73:kdebpbi.c,46,"KDEBP_Bind") Status 1DE00042=KDE1_STC_INVALIDENDPOINT
2012.293 10:57:35.80 (000B-E1D9DD73:kdcsuse.c,120,"KDCS_UseFamily") status=1c010005, "cant bind socket", ncs/KDC1_STC_CANT_BIND_SOCK
2012.293 10:57:35.80 (000C-E1D9DD73:kdepnpc.c,138,"KDEP_NewPCB") 172.25.8.1: 1070079A, KDEP_pcb_t @ 1C3532B0 created
2012.293 10:57:35.81 (000D-E1D9DD73:kdepdpc.c,62,"KDEP_DeletePCB") 1070079A: KDEP_pcb_t deleted
2012.293 10:57:35.81 (000E-E1D9DD73:kdcc1sr.c,485,"rpc__sar") Connection failure: "ip.pipe:#172.25.8.1:1918", 1C010001:1DE00045, 0, 5(2), FFFF/1, D140831.1:1.1.1.13, tms_ctbs623fp1:d2039a
2012.293 10:57:35.81 (000F-E1D9DD73:kdcl0cl.c,142,"KDCL0_ClientLookup") status=1c020006, "location server unavailable", ncs/KDC1_STC_SERVER_UNAVAILABLE
2012.293 10:57:35.81 (0010-E1D9DD73:kdck1im.c,85,"socket__inq_my_netaddr") status=10020002, "socket buffer too small", ncs/KDC1_STC_BUFF_TOO_SMALL
2012.293 10:57:35.81 (0011-E1D9DD73:kdsncsrv.c,486,"Server") Check local failed. status = 5
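
To check whether a saved copy of the RKLVLOG contains these messages, a small script along the following lines could be used. This is only a sketch: it assumes the log has been copied to a plain text file, and the names SYMPTOM_MARKERS and find_symptoms are hypothetical helpers, not part of any IBM tooling. The marker strings are taken from the trace above.

```python
# Marker strings taken from the RKLVLOG trace shown in the Symptom section.
SYMPTOM_MARKERS = (
    "KDE1_STC_INVALIDENDPOINT",
    "KDC1_STC_CANT_BIND_SOCK",
    "KDC1_STC_SERVER_UNAVAILABLE",
    "Check local failed",
)


def find_symptoms(lines):
    """Return (marker, line) pairs for every log line containing a marker.

    `lines` is any iterable of strings, for example an open text file
    holding a copy of the RKLVLOG (an assumption of this sketch).
    """
    hits = []
    for line in lines:
        for marker in SYMPTOM_MARKERS:
            if marker in line:
                hits.append((marker, line.strip()))
    return hits


if __name__ == "__main__":
    sample = [
        '2012.293 10:57:35.80 (000A-E1D9DD73:kdebpbi.c,46,"KDEBP_Bind") '
        "Status 1DE00042=KDE1_STC_INVALIDENDPOINT",
        '2012.293 10:57:35.81 (0011-E1D9DD73:kdsncsrv.c,486,"Server") '
        "Check local failed. status = 5",
    ]
    for marker, line in find_symptoms(sample):
        print(marker, "->", line)
```

If the function returns an empty list, the log does not contain the messages described here and a different cause should be investigated.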

Cause

The agent IRAMAN startup may start the agent before the TEMS CTDS STARTUP fully initializes.

Environment

z/OS

Resolving The Problem

The problem occurs because the server side of the TEMS has not completed initialization before the agent side of the TEMS starts.

In certain cases, moving the AGENT IRAMAN startup from the RKANCMDU(KDSSTRT1) member to the RKANCMDU(KDSSTART) member may start the agent before the TEMS CTDS STARTUP fully initializes. The resulting timeouts may occasionally cause the TEMS to issue the types of messages shown in the symptom.

If these conditions occur, the default delay of 60 seconds is not long enough for the server side to initialize, so the following circumvention may be used:

In the RKANCMDU(KDSSTART) TEMS startup member, increase the delay before KOBAGENT starts by raising the "DELAY=hh:mm:ss" parameter on the following statement, shown here with the default value:

AT ADD ID=KOB DELAY=00:01:00 CMD='IRAMAN KOBAGENT START'

On most LPARs, adding an extra 60 seconds might resolve the problem. On a slower LPAR, calculate the difference between the timestamp of the KLVAT011 ID=KOB message and the timestamp of the Status 1DE00042=KDE1_STC_INVALIDENDPOINT bind failure, round it up to the next minute, and adjust the AT ADD ID=KOB DELAY=00:01:00 statement in RKANCMDU(KDSSTART) accordingly.

Example:

2012.293 10:51:45.72 KLVAT011 ID=KOB DELAY=00:01:00 EVERY=00:00:00 REPEAT=1
2012.293 10:51:45.72 KLVAT012 OPERATOR=*MASTER* LAST=NONE DONE=0 NEXT=10:52:45
2012.293 10:51:45.72 KLVAT013 CMD=IRAMAN KOBAGENT START
.....
.....

2012.293 10:52:45.72 KRAOP007 START COMMAND SCHEDULED FOR 'KOBAGENT'

2012.293 10:53:56.83 (01A2-E3B94C0B:kdebpap.c,128,"KDEBP_AssignPort") ip.pipe bound to port 10110: base=1918, limit=1918

. . .

2012.293 10:57:35.80 (000A-E1D9DD73:kdebpbi.c,46,"KDEBP_Bind") Status 1DE00042=KDE1_STC_INVALIDENDPOINT

2012.293 10:57:35.80 (000B-E1D9DD73:kdcsuse.c,120,"KDCS_UseFamily") status=1c010005, "cant bind socket"

2012.293 10:57:35.81 KDSNC009 Unable to create location brokers, status = 1C010005

As the timestamps show, even with the maintenance, a delay of 60 seconds is not long enough for the server side to initialize. The bind failure at 10:57:35.80 occurs about 5 minutes 50 seconds after the KLVAT011 ID=KOB message at 10:51:45.72.

So on this LPAR the delay specified in RKANCMDU(KDSSTART) would be 6 minutes:

AT ADD ID=KOB DELAY=00:06:00 CMD='IRAMAN KOBAGENT START'
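
The calculation above can be sketched in Python. This is an illustration only: the function names (parse_rklvlog_timestamp, recommended_delay, at_add_statement) are hypothetical, and the sketch assumes that RKLVLOG timestamps have the form YYYY.DDD HH:MM:SS.hh (year, day of year, time with hundredths of a second), as in the excerpts shown here.

```python
import math
from datetime import datetime


def parse_rklvlog_timestamp(stamp):
    """Parse a timestamp such as '2012.293 10:51:45.72'.

    Assumed layout: four-digit year, day of year, then HH:MM:SS with
    fractional seconds.
    """
    return datetime.strptime(stamp, "%Y.%j %H:%M:%S.%f")


def recommended_delay(klvat011_stamp, bind_failure_stamp):
    """Return the elapsed time rounded up to the next minute as hh:mm:ss."""
    start = parse_rklvlog_timestamp(klvat011_stamp)
    failure = parse_rklvlog_timestamp(bind_failure_stamp)
    elapsed = (failure - start).total_seconds()
    if elapsed < 0:
        raise ValueError("bind failure precedes the KLVAT011 message")
    total_minutes = math.ceil(elapsed / 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:00"


def at_add_statement(delay, at_id="KOB", command="IRAMAN KOBAGENT START"):
    """Format the AT ADD statement used in the startup member."""
    return f"AT ADD ID={at_id} DELAY={delay} CMD='{command}'"


if __name__ == "__main__":
    # Timestamps taken from the example log in this document.
    delay = recommended_delay("2012.293 10:51:45.72", "2012.293 10:57:35.80")
    print(at_add_statement(delay))
```

With the timestamps from the example log, the elapsed time is 350.08 seconds, which rounds up to 6 minutes and matches the statement shown above.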

If the default startup member for your agent is KDSSTRT1 rather than KDSSTART, change the delay time in the KDSSTRT1 member instead.


Document Information

Modified date:
17 June 2018

UID

swg21615321