IBM Support

Slowness in Loader due to master cache process

Troubleshooting


Problem

Slowness in Loader due to master cache process.

Symptom

Loader processing is slow at NC-cache or master cache request-response and taking long time to load LIF files.

  • MasterCache Response OK:

Loader

20:59:32,461 INFO [LocalCache] TaskID[454], Master cache instance is initialized for eri_eran_20q1_02 calling processRequestMaster

20:59:38,925 INFO [MasterCacheResponse] TaskID[454], Master Response. Return Code [OK]

--

NC_CACHE

20:59:38,924 INFO [NcMasterCache] Master Cache processing file [processRequest started loader [eri_eran_20q1_02] file [/data/ldr/eri/eran/loader/input/INUSE.Sample_data1.lif]]

20:59:38,924 INFO [NcMasterCache] Master Cache processing summary: file [/data/ldr/eri/eran/loader/input/INUSE.Sample_data1.lif] started [Wed Nov 04 20:59:38 CET 2020], finished [Wed Nov 04 20:59:38 CET 2020], duration [0], response [OK]

  • Master Cache Response [RELOAD]:

Loader

20:59:25,250 INFO [LocalCache] TaskID[454], Master cache instance is initialized for eri_eran_20q1_02 calling processRequestMaster

20:59:32,460 INFO [MasterCacheResponse] TaskID[454], Master Response. Return Code [RELOAD]

--

NC_CACHE

20:59:32,440 INFO [NcMasterCache] Master Cache processing file [processRequest started loader [eri_eran_20q1_02] file [/data/ldr/eri/eran/loader/input/INUSE. Sample_data2.lif]]

20:59:32,459 INFO [NcMasterCache] Master Cache processing summary: file [/data/ldr/eri/eran/loader/input/INUSE. Sample_data2.lif] started [Wed Nov 04 20:59:32 CET 2020], finished [Wed Nov 04 20:59:32 CET 2020], duration [19], response [RELOAD]

 In both the cases nc_cache has finished processing in few milliseconds but the request itself had reached with delay at master cache [nc_cache].

Cause

Here are 4 examples of loader slowness from nc_cache:

Example #1:

hua_eran_r016_1_22.log (loader instance log)

19:33:10,811 DEBUG [LocalCache] TaskID[2], LocalCache detected possible re-homing scenario :RehomingScenario: NEW
19:33:10,813 DEBUG [LocalCache] TaskID[2], request build for  thread [31] file [/data/ldr/hua/eran/loader22/input/INUSE.BUCKET-#-NR-#-B1911816278-#-20201105192846-#-29709-#-1-#-60-#-PIF_2_OUTPUT-29709.lif]
19:33:10,813 INFO  [MasterCacheRequest] TaskID[2], Master Request. No Blocks [243 Diffs 221].
19:33:10,814 INFO  [MasterCacheRequest] TaskID[2],      Table [NC_NUTRANCELL] version[118410]. Number of lookups [221]. Number of diff [221]. Times [,Thu Nov 05 18:30:00 CET 2020]
19:33:10,814 INFO  [LocalCache] TaskID[2], Master cache instance is initialized for hua_eran_r016_1_22 calling processRequestMaster
19:33:12,295 INFO  [MasterCacheResponse] TaskID[2], Master Response. Return Code [RELOAD]
19:33:12,296 INFO  [MasterCacheResponse] TaskID[2],     Table[NC_NUTRANCELL] version[null]. Number of new/updated rows[221].
19:33:12,298 DEBUG [LocalCache] TaskID[2], applyResponse  thread [31] file [/data/ldr/hua/eran/loader22/input/INUSE.BUCKET-#-NR-#-B1911816278-#-20201105192846-#-29709-#-1-#-60-#-PIF_2_OUTPUT-29709.lif]
 

 

Nc-cache log:

19:33:12,220 INFO  [NcMasterCache] Master Cache processing file [processRequest started  loader [hua_eran_r016_1_22] file [/data/ldr/hua/eran/loader22/input/INUSE.BUCKET-#-NR-#-B1911816278-#-20201105192846-#-29709-#-1-#-60-#-PIF_2_OUTPUT-29709.lif]]
19:33:12,220 INFO  [MasterCacheRequest] Master Request. No Blocks [243 Diffs 221].
19:33:12,220 INFO  [MasterCacheRequest]         Table [NC_NUTRANCELL] version[118410]. Number of lookups [221]. Number of diff [221]. Times [,Thu Nov 05 18:30:00 CET 2020]
19:33:12,220 INFO  [NcMasterCache] Master Cache processing file [hua_eran_r016_1_22check versions for  table [NC_NUTRANCELL] requested version [118410] cached version [118410]]
19:33:12,220 INFO  [NcMasterCache] Master Cache processing file [hua_eran_r016_1_22loading lookup rows for NC_NUTRANCELL]
19:33:12,220 DEBUG [TableDAO] started NcDAO.select()_3
19:33:12,220 DEBUG [TableDAO]  NcDAO.select()_3 ..query:select NC_ID, NUTRANCELL_ID, NETWORK_ID, REGION_ID, GNODEBFUNCTION_ID, TIMESTAMP, VENDOR_ID, TECH_ID, NC_VERSION, NUTRANCELL_NAME, TECHNOLOGY, VERSION, ENDSTAMP from NC_NUTRANCELL where NUTRANCELL_ID in ( '84022404', '84015674', '84022405', '84015673', '84022402', '84022403', '84015675', '84022400', '84022401', '84032693', '84032694', '84032695', '84032568', '84048483', .., '84013974', '84015239', '84022685', '84041065', '84041064' ) and TIMESTAMP <= ? and ENDSTAMP >= ?
19:33:12,220 DEBUG [TableDAO]  NcDAO.select()_3 ..after ps = connection.prepareStatement(query)
19:33:12,220 DEBUG [TableDAO] NcDAO.select()_3 ps:oracle.jdbc.driver.OraclePreparedStatementWrapper@822f238b
19:33:12,220 DEBUG [TableDAO] NcDAO.select()_3 ps:oracle.jdbc.driver.OraclePreparedStatementWrapper@822f238b
19:33:12,262 DEBUG [NcMasterCache] loadRows: 221 rows found   for table [NC_NUTRANCELL] and ids [ '84022404', '84015674', '84022405', '84015673', … '84013974', '84015239', '84022685', '84041065', '84041064']
19:33:12,262 INFO  [NcMasterCache] NC and Local Cache in sync for table[NC_NUTRANCELL] Version [118410], Searched DB for [221] rows. Found [221]
19:33:12,262 DEBUG [NcMasterCache] processRequest finished  loader [hua_eran_r016_1_22] file [/data/ldr/hua/eran/loader22/input/INUSE.BUCKET-#-NR-#-B1911816278-#-20201105192846-#-29709-#-1-#-60-#-PIF_2_OUTPUT-29709.lif] with result RELOAD
19:33:12,262 INFO  [MasterCacheResponse] Master Response. Return Code [RELOAD]
19:33:12,262 INFO  [MasterCacheResponse]        Table[NC_NUTRANCELL] version[null]. Number of new/updated rows[221].
19:33:12,262 INFO  [NcMasterCache] Master Cache processing summary: file [/data/ldr/hua/eran/loader22/input/INUSE.BUCKET-#-NR-#-B1911816278-#-20201105192846-#-29709-#-1-#-60-#-PIF_2_OUTPUT-29709.lif] started [Thu Nov 05 19:33:12 CET 2020], finished [Thu Nov 05 19:33:12 CET 2020], duration [42], response [RELOAD]

In the above log, we can see there is a 2 sec delay in loader but nc_cache seems to process request in 42 ms. Here it appears to be delay in handshake between loader instance and master-cache process, which can either be due to some network or connectivity issue or if nc_cache has more requests to process in the queue.

Example #2:

Nc_cache.log:

15:14:24,364 INFO  [NcMasterCache] Master Cache processing file [processRequest started  loader [mavenirvolte_10] file [/data/ldr/mav/volte/loader01/input/INUSE.202011051220-#-201_AS_CSCF_SESSIONUNRELATED-#-VolteBLF-#-M201_AS_CSCF_SESSIONUNRELATED-#-I-#-P2-#-P-22108.lif]]
15:14:24,364 INFO  [MasterCacheRequest] Master Request. No Blocks [1265 Diffs 1].
15:14:24,365 INFO  [MasterCacheRequest]         Table [NC_MAV_CARD] version[129]. Number of lookups [0]. Number of diff [1]. Times []
15:14:24,365 INFO  [MasterCacheRequest]         Table [NC_MAV_NODE] version[17]. Number of lookups [0]. Number of diff [1]. Times []
15:14:24,365 INFO  [MasterCacheRequest]         Table [NC_NETWORK] version[66903]. Number of lookups [0]. Number of diff [1]. Times []
15:14:24,365 INFO  [MasterCacheRequest]         Table [NC_MAV_PEER] version[19732]. Number of lookups [1]. Number of diff [1]. Times [,Thu Nov 05 12:15:00 CET 2020]
15:14:24,365 INFO  [MasterCacheRequest]         Table [NC_REGION] version[61987]. Number of lookups [0]. Number of diff [1]. Times []
15:14:24,365 INFO  [NcMasterCache] NC and Local Cache not in sync for table[NC_NETWORK]. Requested Version [66903]. Cached Version [66849]. Loaded [0] rows from the DB.
15:14:24,378 INFO  [NcMasterCache] NC and Local Cache in sync for table[NC_SCTP_ENDPOINT] Version [24869], Searched DB for [107] rows. Found [107]
15:14:25,919 INFO  [NcMasterCache] NC and Local Cache in sync for table[NC_MAV_PEER] Version [19732], Searched DB for [1] rows. Found [0]
15:14:25,919 INFO  [NcMasterCache] NC and Local Cache not in sync for table[NC_REGION]. Requested Version [61987]. Cached Version [61937]. Loaded [0] rows from the DB.
15:14:25,919 INFO  [NcMasterCache] Versions match applying [1] rows to the DB.
15:14:26,172 INFO  [NcMasterCache] NC and Local Cache in sync for table[NC_LTE_SCTP_ASSOCIATION] Version [759782], Searched DB for [1891] rows. Found [1891]

15:14:26,172 INFO  [MasterCacheResponse] Master Response. Return Code [RELOAD]
15:14:26,172 INFO  [MasterCacheResponse]        Table[NC_SCTP_ENDPOINT] version[null]. Number of new/updated rows[107].
15:14:26,172 INFO  [MasterCacheResponse]        Table[NC_LTE_SCTP_ASSOCIATION] version[null]. Number of new/updated rows[1891].
15:14:26,172 INFO  [NcMasterCache] Master Cache processing summary: file [/data/ldr/eri/node/loader13/input/INUSE.BUCKET-#-NODE-#-SctpAssociation-#-20201105150126-#-29660-#-1-#-36-#-PIF_2_OUTPUT-29660.lif] started [Thu Nov 05 15:14:24 CET 2020], finished [Thu Nov 05 15:14:26 CET 2020], duration [1813], response [RELOAD]
15:14:26,172 INFO  [NcMasterCache] Master Cache processing file [processRequest started  loader [nok_nss_m170_01] file [/data/ldr/nok/nss/loader01/input/INUSE.BUCKET-#-SMSTY-#-20201105124041-#-13288-#-1-#-93-#-PIF_2_OUTPUT-13288.lif]]
15:14:26,172 INFO  [MasterCacheRequest] Master Request. No Blocks [268 Diffs 14].
15:14:26,172 INFO  [MasterCacheRequest]         Table [NC_SMSC] version[578660]. Number of lookups [17]. Number of diff [14]. Times [,Thu Nov 05 12:00:00 CET 2020,Thu Nov 05 12:15:00 CET 2020]
15:14:27,451 INFO  [NcMasterCache] NC and Local Cache in sync for table[NC_SMSC] Version [578660], Searched DB for [17] rows. Found [16]
15:14:27,451 INFO  [MasterCacheResponse] Master Response. Return Code [RELOAD]
15:14:27,451 INFO  [MasterCacheResponse]        Table[NC_SMSC] version[null]. Number of new/updated rows[16].
15:14:27,451 INFO  [NcMasterCache] Master Cache processing summary: file [/data/ldr/nok/nss/loader01/input/INUSE.BUCKET-#-SMSTY-#-20201105124041-#-13288-#-1-#-93-#-PIF_2_OUTPUT-13288.lif] started [Thu Nov 05 15:14:26 CET 2020], finished [Thu Nov 05 15:14:27 CET 2020], duration [1278], response [RELOAD]
15:14:27,451 INFO  [NcMasterCache] Master Cache processing file [processRequest started  loader [eri_node_18q4_04] file [/data/ldr/eri/node/loader04/input/INUSE.BUCKET-#-NODE-#-RfPort-#-20201105150108-#-20397-#-1-#-10-#-PIF_2_OUTPUT-20397.lif]]
15:14:27,451 INFO  [MasterCacheRequest] Master Request. No Blocks [229 Diffs 229].
15:14:27,451 INFO  [MasterCacheRequest]         Table [NC_RF_PORT] version[6444]. Number of lookups [229]. Number of diff [229]. Times [,Thu Nov 05 14:00:00 CET 2020]

15:14:27,460 INFO  [TableDAOSecondary] Inserted [1] rows into NC table [NC_MAV_PEER]
15:14:27,461 INFO  [MasterCacheResponse] Master Response. Return Code [OK]
15:14:27,461 INFO  [MasterCacheResponse]        Table[NC_MAV_PEER] version[19733]. Number of new/updated rows[1].
15:14:27,461 INFO  [NcMasterCache] Master Cache processing summary: file [/data/ldr/mav/volte/loader01/input/INUSE.202011051220-#-201_AS_CSCF_SESSIONUNRELATED-#-VolteBLF-#-M201_AS_CSCF_SESSIONUNRELATED-#-I-#-P2-#-P-22108.lif] started [Thu Nov 05 15:14:24 CET 2020], finished [Thu Nov 05 15:14:27 CET 2020], duration [3096], response [OK]

In the above example, nc_cache has taken 3 secs to process this LIF and then sending "OK" response. Out of this, 1.5 sec is for searching one ID in DB table and 1 sec is to insert into DB table.

Statements in-between (in Italic font) are of another LIF being processed by different instance of master-cache.

Example #3:

Nc_cache.log:

17:19:41,624 INFO  [NcMasterCache] Master Cache processing file [processRequest started  loader [eri_eran_20q1_01] file [/data/ldr/eri/eran/loader01/input/INUSE.BUCKET-#-LTE-#-UtranCellRelation-#-20201105165150-#-4756-#-1-#-99-#-JOIN-4756.lif]]
17:19:41,624 INFO  [MasterCacheRequest] Master Request. No Blocks [10000 Diffs 10000].
17:19:41,624 INFO  [MasterCacheRequest]         Table [NC_UTRANRELATION] version[301123]. Number of lookups [10000]. Number of diff [10000]. Times [,Thu Nov 05 16:00:00 CET 2020]
17:19:43,680 INFO  [NcMasterCache] NC and Local Cache in sync for table[NC_UTRANRELATION] Version [301123], Searched DB for [10000] rows. Found [10000]
17:19:43,680 INFO  [MasterCacheResponse] Master Response. Return Code [RELOAD]
17:19:43,680 INFO  [MasterCacheResponse]        Table[NC_UTRANRELATION] version[null]. Number of new/updated rows[10000].
17:19:43,680 INFO  [NcMasterCache] Master Cache processing summary: file [/data/ldr/eri/eran/loader01/input/INUSE.BUCKET-#-LTE-#-UtranCellRelation-#-20201105165150-#-4756-#-1-#-99-#-JOIN-4756.lif] started [Thu Nov 05 17:19:41 CET 2020], finished [Thu Nov 05 17:19:43 CET 2020], duration [2056], response [RELOAD]

In the above example as well, 2 secs are spent in searching the DB with 10000 records.

In such cases, where DB transaction is taking few seconds, results in, nc_cache lags with incoming request and results in 2 - 7 secs delay observed in Example #1.

Such delays are intermittent and occur for different no. of rows and different tables. Most of the time nc_cache is able to process 1 LIF file request in less than 100 ms but if few intermittent files take 2 - 3 secs, it indicates it is relatively slow and can result in pending requests.

 

Example #4:

 Loader log:

22:25:40,471 INFO  [MasterCacheRequest] TaskID[276], Master Request. No Blocks [1 Diffs 1].
22:25:40,472 INFO  [MasterCacheRequest] TaskID[276], Table [NC_CELL] version[9132]. Number of lookups [1]. Number of diff [1]. Times [,Sat Oct 31 18:15:00 IST 2020]
22:25:40,472 INFO  [LocalCache] TaskID[276], Master cache instance is initialized for nokiagsmbss_rg20 calling processRequestMaster
22:25:40,491 INFO  [MasterCacheResponse] TaskID[276], Master Response. Return Code [RELOAD]
22:25:40,491 INFO  [MasterCacheResponse] TaskID[276], Table[NC_CELL] version[null]. Number of new/updated rows[1].
22:25:40,492 INFO  [MasterCacheRequest] TaskID[276], Master Request. No Blocks [1 Diffs 0].
22:25:40,492 INFO  [MasterCacheRequest] TaskID[276], Table [NC_CELL] version[9132]. Number of lookups [0]. Number of diff [0]. Times []
22:25:40,492 INFO  [LocalCache] TaskID[276], Master cache instance is initialized for nokiagsmbss_rg20 calling processRequestMaster
22:25:40,498 INFO  [MasterCacheResponse] TaskID[276], Master Response. Return Code [OK]

Nc-cache log:

22:25:40,377 INFO  [NcMasterCache] Master Cache processing summary: file [/appl/virtuo/var/loader/spool/nokiagsmbss_rg20/RG20/INUSE.test2.lif] started [Mon Nov 09 22:25:39 IST 2020], finished [Mon Nov 09 22:25:40 IST 2020], duration [756], response [FAIL]
22:25:40,475 INFO  [NcMasterCache] Master Cache processing file [processRequest started  loader [nokiagsmbss_rg20] file [/appl/virtuo/var/loader/spool/nokiagsmbss_rg20/RG20/INUSE.test3.lif]]
22:25:40,475 INFO  [MasterCacheRequest] Master Request. No Blocks [1 Diffs 1].
22:25:40,475 INFO  [MasterCacheRequest] Table [NC_CELL] version[9132]. Number of lookups [1]. Number of diff [1]. Times [,Sat Oct 31 18:15:00 IST 2020]
22:25:40,475 INFO  [NcMasterCache] Master Cache processing file [nokiagsmbss_rg20check versions for  table [NC_CELL] requested version [9132] cached version [9132]]
22:25:40,476 INFO  [NcMasterCache] Master Cache processing file [nokiagsmbss_rg20loading lookup rows for NC_CELL]
22:25:40,476 DEBUG [TableDAO] started NcDAO.select()_3
22:25:40,476 DEBUG [TableDAO]  NcDAO.select()_3 ..query:select NC_ID, CELL_ID, BS_ID, BSC_ID, PCU_ID, NSVC_ID, LAC_ID, ROUTING_AREA_ID, MSC_ID, SGSN_ID, REGION_ID, NETWORK_ID, REGISTRATION_AREA_ID, TIMESTAMP, VENDOR_ID, TECH_ID, NC_VERSION, CELL_NAME, CELL_TYPE, CELL_DESCRIPTION, MAX_POWER, DEFINED_TRX, DEFINED_TCH, DEFINED_CCH, SEGMENT_ID, CELL_VERSION, TECHNOLOGY, ENDSTAMP from NC_CELL where CELL_ID in ( '1-2-1-1' ) and TIMESTAMP <= ? and ENDSTAMP >= ?
22:25:40,476 DEBUG [TableDAO]  NcDAO.select()_3 ..after ps = connection.prepareStatement(query)
22:25:40,477 DEBUG [TableDAO] NcDAO.select()_3 ps:oracle.jdbc.driver.OraclePreparedStatementWrapper@10e918d3
22:25:40,487 DEBUG [NcMasterCache] loadRows: 1 rows found   for table [NC_CELL] and ids [ '1-2-1-1']
22:25:40,488 INFO  [NcMasterCache] NC and Local Cache in sync for table[NC_CELL] Version [9132], Searched DB for [1] rows. Found [1]
22:25:40,488 DEBUG [NcMasterCache] processRequest finished  loader [nokiagsmbss_rg20] file [/appl/virtuo/var/loader/spool/nokiagsmbss_rg20/RG20/INUSE.test3.lif] with result RELOAD
22:25:40,488 INFO  [MasterCacheResponse] Master Response. Return Code [RELOAD]
22:25:40,488 INFO  [MasterCacheResponse] Table[NC_CELL] version[null]. Number of new/updated rows[1].
22:25:40,489 INFO  [NcMasterCache] Master Cache processing summary: file [/appl/virtuo/var/loader/spool/nokiagsmbss_rg20/RG20/INUSE.test3.lif] started [Mon Nov 09 22:25:40 IST 2020], finished [Mon Nov 09 22:25:40 IST 2020], duration [13], response [RELOAD]
22:25:40,495 INFO  [NcMasterCache] Master Cache processing file [processRequest started  loader [nokiagsmbss_rg20] file [/appl/virtuo/var/loader/spool/nokiagsmbss_rg20/RG20/INUSE.test3.lif]]
22:25:40,495 INFO  [MasterCacheRequest] Master Request. No Blocks [1 Diffs 0].
22:25:40,496 INFO  [MasterCacheRequest] Table [NC_CELL] version[9132]. Number of lookups [0]. Number of diff [0]. Times []
22:25:40,496 INFO  [NcMasterCache] Master Cache processing file [nokiagsmbss_rg20check versions for  table [NC_CELL] requested version [9132] cached version [9132]]
22:25:40,496 DEBUG [NcMasterCache] processRequest finished  loader [nokiagsmbss_rg20] file [/appl/virtuo/var/loader/spool/nokiagsmbss_rg20/RG20/INUSE.test3.lif] with result OK, no DIFF
22:25:40,496 INFO  [MasterCacheResponse] Master Response. Return Code [OK]
22:25:40,497 INFO  [NcMasterCache] Master Cache processing summary: file [/appl/virtuo/var/loader/spool/nokiagsmbss_rg20/RG20/INUSE.test3.lif] started [Mon Nov 09 22:25:40 IST 2020], finished [Mon Nov 09 22:25:40 IST 2020], duration [0], response [OK]
In above example, loader is processing 1 LIF block and loader's local cache detected 1 difference that it considers new data. Loader sends request to NC-cache with 1 block difference and NC master cache then checks the DB fir this ID to find it in the DB.

Hence, it does not update any NC table but it updates local cache and RELOAD response is sent to loader.

Loader updates its local cache and again sends a request to master cache with 0 Diff.

This time master cache does not check anything in DB and sends back response "OK" to loader.

In first request, nc_cache takes13ms for RELOAD response and 0 ms for 2nd request with OK response (no DB transaction).

There are 3 scenarios of master cache request.

Scenario #1:
17:19:41,624 INFO [MasterCacheRequest] Master Request. No Blocks [10000 Diffs 0].
No DB transaction and no work for NC-master cache.
 

Scenario #2:
17:19:41,624 INFO [MasterCacheRequest] Master Request. No Blocks [10000 Diffs 10000].
Searched DB for [10000] rows. Found [10000]

In this case, master cache checks in DB table and found all diff rows so there is no DML transaction.
It has 1 DB transaction and updates local cache with reload with below log statement:
Table[NC_UTRANRELATION] version[null]. Number of new/updated rows[10000].
 
Scenario #3:
17:19:41,624 INFO [MasterCacheRequest] Master Request. No Blocks [10000 Diffs 10000].
Searched DB for [10000] rows. Found [0]

In this case, master cache checks in DB table and does not find any rows, it has to insert or update in table.
There are 2 DB transactions and local cache is also updated with below statement:
 Inserted [1000] rows into NC table [NC_UTRANRELATION]
Below statement in nc-cache stat:
Table[NC_UTRANRELATION] version[19733]. Number of new/updated rows[10000].
 

Scenario #1 is ideal and fastest request to be processed by master cache as it does not involve any DB transaction. All IDs and data are cache by respective instance.

Scenario #2 is unwanted as there is no new data or updated data, loader's local cache assumes this to be new as it has not loaded this data before. This can occur either data(LIF) loaded into the instance has different ID's and is not distributed correctly. It can also occur if loader is frequently restarted as local cache is empty and is built over the time.

Scenario #3 is valid when new data (or update to the data) arrives.

If loader is identifying any difference in data by checking DB or inserts or updates data in NC tables, it results in DB transaction. Increase in no. of transaction will result in local cache-master cache sync up.

When loader instance is restarted, every ID is new for local cache of that loader instance and master cache will check for ID in DB.

Environment

Product: Tivoli Netcool Performance Manager for Wireless

Release: 1.3.2, 1.4.x

Platform: Solaris; Linux; AIX

Diagnosing The Problem

From above examples, below are the causes:

  • Network delay or glitch between the servers (Loader and MasterCache are on different servers)
When there is actual delay in servers communications between loader and application servers in terms of TCP/IP packet transfer, it needs to be fixed with the help of network team.
  • MasterCache itself is slow and has performance issues.
When master cache is slow, it would impact all the loader instances on the server for the all the technologies causing overall slowness in data processing for all TPs. This can happen if NC table volume is very high and select query for searching IDs is slow.
  • MatserCache has too many requests because frequently changes of IDs results in large no. of DML operations, hence the request from loader takes time.

For example:

Consider below data loading scenarios:

Following are LIFs loaded by two loader instances for Interval #1 and resulted into insert operation and update operation:

BS_ID

Block

Latitude

LIF

Loader Instance

1234

A1

1

LIF1

Loader1

1234

A1

2

LIF2

Loader2

 

Following LIFs are loaded at interval #2 and will result in 2 rows updates in NC_BS:

 

BS_ID

Block

Latitude

LIF

Loader Instance

1234

A1

3

LIF3

Loader1

1234

B1

4

LIF4

Loader2

 

There are 4 updates happened in two intervals for same time as Latitude value changes for single 1234 BS_ID. This results in high no. of nc updates causing the slowness in data loading at master cache.

Generally No. of nc updates are caused by one or more following conditions:

  • Nature of data
  • Wrong configuration at gateway or loader loadmap.
  • Problem in data, LIF distribution resulting in caching issues during data loading.

While data loading handshake between local cache of the loader instance and master cache (nc_cache) happen on application server and LIF gets loaded into database, master cache object is created on application server when nc_cache is started and is stored in RMI registry.

When loader starts (on loader server) then it builds local cache, which further builds serialized MasterCacheRequest object, and it forwarded to application server via RRL(Remote Reference Layer) and TCP/IP. Application server receives and converts skeleton object where it deserializes MasterCacheRequest and call master cache object from RMI registry on application server. Then, master cache processes MasterCacheRequest and send MasterCacheResponse back to local cache same as how MasterCacheRequest object is sent to master cache.

Resolving The Problem

Short-term solution:

  • Adding new loader instances to process the increased load on nc_cache due to high no. of nc updates and redistribute the LIF files among loaders.

Long-term solution:

  1. Reducing frequent nc updates happening in nc tables.
      A. Disabling or filtering or rectifying the property values of  IDs from the gateway side.

This is to identify and disable or filter or modify the IDs at gateway itself before processing, which will stop loading of unwanted network data changes and will improve the loader performance. Gateway should be configured such that joining of block load into same table and avoid same Cell coming to multiple LIF. If 3 - 4 blocks have same entity ID and it results in frequent NC updates, then consider joining those blocks into single block at gateway.

      B. Disabling the loadmap for column causing the frequent and unwanted nc updates in nc table.

  • Analyze the nc_cache_stats.txt files for merge operation located at /appl/virtuo/logs/loader on application server in respective nc_tables and take the table-wise count for nc tables each for example nc_bs or nc cell and identify the nc table having high no. of nc updates happening in it. Accordingly navigate into /appl/virtuo/logs/loader and verify the columns causing these high no. of nc updates from MasterCache_Rehoming.log as mentioned in below point.
  • Identify the column and ID causing the frequent nc updates from MasterCache_rehoming log.

For example: Following snippet shows updates in nc tables due to frequent change in region:

/data/ldr/hua/eran/loader12/input/INUSE.BUCKET-#-LTE-#-B1526726938-#-20201014071320-#-28680-#-1-#-32-#-PERLIZE-28680.lif
11:35:56,455 DEBUG [rehoming] RehomingScenario: UPDATE
Table   : NC_EUTRANCELL
OldKey  : NULL
NewKey  : 64267967(29-Oct-32 00:01:00,19-Jan-38 23:59:00)[ncId=20547599,version=32671151]
Updates : REGION_ID(Hamburg->UNDEFINED)
11:35:56,455 DEBUG [rehoming] Prepared to apply changes  for loader
  • Verify the respective ID from the nc table and value being updated.
   Alter session set nls_date_format = ‘DD-MM-YYYY HH24:MI:SS'
  Select * from nc_bs
  where BS_ID like <BS_ID for which nc update happening>
  and endstamp like ‘19-01-2038 23:59:00’
     Example:
      Select * from nc_bs
   where BS_ID like <’ AA/11/1234’>
   and endstamp like ‘19-01-2038 23:59:00’
An entry becomes visible in output with latest value change for column for which update is happened.
  • Check the loadmap expression for column for which frequent update is happening with query:
select * from lc_loadmap
where ruleset_id = <ruleset_id>
and tabname like 'nc_bs'
and blockname like '<blockname>'
and colalias in (‘<column on which nc update happened>’);
Example:
          select * from lc_loadmap
    where ruleset_id = 12345
    and tabname like 'nc_bs'
    and blockname like 'B12345678'
    and colalias in (‘REGION’);
  • On verification of loadmap expression, proceed with disabling the column loadmap from lc_loadmap table by using following query.
update lc_loadmap set active=0
where colalias like ‘<column on which nc update happened>’
and tabname like 'nc_bs'
and blockname like '<blockname>'
and ruleset_id = <ruleset_id>
and active=1;
Commit;
Example:     
update lc_loadmap set active=0
where colalias like 'REGION'
and tabname like 'nc_bs'
and blockname like 'B12345678'
and ruleset_id = 12345
and active=1;
Commit;

This way, identify the column or columns having high no. of nc updates from MasterCache_rehoming logs and disable those[column or columns] from loadmap in order to stop nc updates happening frequently.

C. Discrepancy in raw data : If gateway , Loader configuration are not cause behind NC updates, it is possible that the values are coming from raw data itself and in that case, source of the data (EMS/OMC server )  needs to be traced. There is no ideal or optimal number of NC updates as it depends on the size of the network. But, as an example , for "bs" , no. of updates should be less than 1% of total BS present in the network.

2. Following the below approach in LIF distribution at loader instances help in reducing delays at master cache
A. Implement LIF distribution based on IDs and block name. For example: One set of cell_id should be loaded through single instance. If the environment has 100 IDs with 2 loader instances, then load 50 IDs from first instance and other 50 from 2nd instance.
B. First-level Distribution should be at entity ID (corresponding to the Cells for respective technology) level whereas second-level distribution should be block level.
3. Increasing the sufficient amount of memory against each of the loader instance for the impacted TP : Loader instances use its heap memory to cache all the inventory or network data (NC tables data). When loader instance is started, its cache is empty and it will be built as data is loaded. Any changes or updates to the properties cause DB transactions through master cache and if all instances are loading all Cells then it requires more memory to cache the data.
 4. Enable one or more parameters based on the issue:
      
  • Weak-HashMap by setting parameter nccache.aggressive.memory.clean to true:  When set to true for a particular loader, it periodically clean ups the cache content for the loader and free up memory. This might improve memory usage but increases no. of Diff requests sent to nc-cache as local cache is getting cleaned. The following statement inserts default value of false for this parameter for all loaders.
INSERT INTO lc_loader_config_properties
SELECT DISTINCT datasource_id, loader_name, 'nccache.aggressive.memory.clean', 'FALSE' FROM lc_loader_config_properties
WHERE loader_name NOT IN (SELECT DISTINCT loader_name FROM lc_loader_config_properties
and name = 'nccache.aggressive.memory.clean' OR  loader_name = 'SYSTEM');
commit;
If any loader has an OOM issue, you can run the following statement for that loader:
update lc_loader_config_properties set value='true' where loader_name = 'Loader1' and name='nccache.aggressive.memory.clean';
commit;
              

  • Refresh threshold by setting parameter nccache.memory.refresh.threshold: It free-ups some cache content when free memory of the loader goes under the threshold. For example: If threshold is set to 100 and free memory is less than 100, then loader empties some cache content. Loader empties only NEIGHBOUR data from cache. To set the threshold to 100MB for a particular loader, you can use the following statement.
   insert into lc_loader_config_properties
   values (<datasource_id>,'<loader_name>','nccache.memory.refresh.threshold',100);
   commit;
                              
  • Secondary cache enable by setting parameter mastercache.secondary.tables: To change the configuration of a particular loader to process specific nc tables that use secondary cache, run the following statement for the particular loader:
For Example: Loader1 is updating NC_NEIGHBOUR, which contains numerous rows and it's blocking other loaders for other NC tables
update lc_loader_config_properties set value='NC_NEIGHBOUR'
where loader_name = 'Loader1'
and name='mastercache.secondary.tables';
commit;                       
           

Document Location

Worldwide

[{"Type":"MASTER","Line of Business":{"code":"LOB45","label":"Automation"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSKGHX","label":"Tivoli Netcool Performance Manager for Wireless"},"ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"}]

Product Synonym

TNPMW;Tivoli Netcool Performance Manager Wireless

Document Information

Modified date:
13 March 2022

UID

ibm16558532