Troubleshooting
Problem
Loader throwing OutOfMemory and or Loader slowness in master cache process due to low memory.
Symptom
- No. of IDs to be cached by instance is too high, which can happen if LIF distribution is not in order.
- Data to be cached by loader depends on properties of entity and nature of data, which is different for every environment. More the no. of properties and incorrect LIF distribution, more memory is required.
- MasterCache Response OK:
Loader
20:59:32,461 INFO [LocalCache] TaskID[454], Master cache instance is initialized for eri_eran_20q1_02 calling processRequestMaster
20:59:38,925 INFO [MasterCacheResponse] TaskID[454], Master Response. Return Code [OK]
--
NC_CACHE
20:59:38,924 INFO [NcMasterCache] Master Cache processing file [processRequest started loader [eri_eran_20q1_02] file [/data/ldr/eri/eran/loader/input/INUSE.Sample_data1.lif]]
20:59:38,924 INFO [NcMasterCache] Master Cache processing summary: file [/data/ldr/eri/eran/loader/input/INUSE.Sample_data1.lif] started [Wed Nov 04 20:59:38 CET 2020], finished [Wed Nov 04 20:59:38 CET 2020], duration [0], response [OK]
Loader
20:59:25,250 INFO [LocalCache] TaskID[454], Master cache instance is initialized for eri_eran_20q1_02 calling processRequestMaster
20:59:32,460 INFO [MasterCacheResponse] TaskID[454], Master Response. Return Code [RELOAD]
--
NC_CACHE
20:59:32,440 INFO [NcMasterCache] Master Cache processing file [processRequest started loader [eri_eran_20q1_02] file [/data/ldr/eri/eran/loader/input/INUSE. Sample_data2.lif]]
20:59:32,459 INFO [NcMasterCache] Master Cache processing summary: file [/data/ldr/eri/eran/loader/input/INUSE. Sample_data2.lif] started [Wed Nov 04 20:59:32 CET 2020], finished [Wed Nov 04 20:59:32 CET 2020], duration [19], response [RELOAD]
In both the cases, nc_cache is finished processing in few milliseconds but the request itself reached with delay at master cache [nc_cache].
Cause
In order to deal with the slowness issue, one of the aspects is to monitor the memory consumption of loader instances at run time.
To monitor the loader memory consumption at run time, if it is not directly throwing outOfMemory error in loader logs or not generating the Java heapdumps for memory utilization, GC logging can be enabled at the application level for loaders and analyze the same for monitoring the instance-wise loader memory utilization in order to work on addressing slowness in loaders.
Environment
Product: Tivoli Netcool Performance Manager for Wireless
Release: 1.3.2, 1.4.x
Platform: Solaris; Linux; AIX
Diagnosing The Problem
- Navigate to the location /appl/virtuo/bin and search for file run_njloader.
- Take the backup of existing run_njloader file.
- Open the run_njloader file and add following line in JVM Properties:

- Save the file and exit.
- Create the directory GC in appl directory as /appl/GC for GC log storage.
- Stop the problematic loader instance having slowness in processing.
- Start the loader instance and verify in the output for GC location.

- Restart the sapmgr service in order to take effect the changes.

- Restart the loader instance again and resume data processing from the loader instance.
- Post enabling the GC logging, it captures the logs as follows:

- Put this file in any of the online portals such as GCEasy that plots the graph for analyzing the same.

- The generated graph is like:


Following table shows that the loading is fastest on the loader (huaweilteeutran_v100r016c10spc125) with default configuration (without Weak-HashMap and Refresh Threshold), but it is the first one to throw OOM error.
For 2nd loader instance (huaweilteeutran_v100r016c10spc125_02), we also got OOM error much later but the Loader recovered from that error due to Weak-HashMap and continued loading.
But it took longest time to load the LIF files more than 1 hour compared to other two loaders. This was due to empty cache that resulted in high NC-time.
|
Loader Instance |
Memory |
Threading |
weak-HashMap |
Refresh Threshold |
No. of LIFs Loaded |
Encountered OOM |
Response Code |
Time taken to Load |
|
|
OK |
RELOAD |
||||||||
|
huaweilteeutran_v100r016c10spc125 |
1.5 GB |
0,0 |
FALSE |
Not set |
1670 |
Yes |
1671 |
446 |
4 Hours 10 minutes |
|
huaweilteeutran_v100r016c10spc125_02 |
2 GB |
0,0 |
TRUE |
Not set |
1670 |
Yes |
1674 |
2143 |
5 Hours 23 Minutes |
|
huaweilteeutran_v100r016c10spc125_03 |
1.5 GB |
0,0 |
FALSE |
100 |
1670 |
No |
1677 |
531 |
4 Hours 18 Minutes |
GC is the process by which Java performs automatic memory management and will free-up the heap memory.
Loader Instance 1

The numbers denote the following:
- The LIF loading process started.
- GC in action, clearing heap memory to prevent out of memory(OOM) error.
- GC in action, clearing heap memory, as the previous peak almost reached the total heap memory.
- The heap memory is almost exhausted and GC tries to clear out memory to prevent OOM.
- Finally, the system goes out of memory. The loader process is restarted which is initiated by sap-mon and the system is revived. We see a high release of memory, which is due to starting of a new loader process.
- The LIF loading process is completed.
Loader Instance 2

The numbers denote the following:
- The LIF loading process started.
- GC in action, clearing heap memory to prevent OOM. This also involves weak working of HashMap implementation. (GC cycle removes the weak-HashMap references actively).
- The system finally goes out of memory. GC works on releasing the memory and system is revived. Thus we see a sharp release of memory. (Loader process is not restarted manually or by sapmon)
- The LIF loading process is completed.
Loader Instance 3:

- The LIF loading process started.
- The heap memory is almost exhausted (free memory reaching 100mb or below). The threshold configuration comes into action, clearing cache, which is followed by GC that releases the memory.
- GC in action, clearing heap memory, as the previous peak almost reached the heap memory.
- Again, the heap memory is almost exhausted (free memory reaching 100mb or below). The threshold configuration comes into action, clearing cache which is followed by GC which releases the memory.
- GC in action, clearing heap memory.
- LIF loading process is completed.
15:54:38,905 INFO [LocalCache] TaskID[534], Allocated memory [1536], Current free/reserved JVM memory [15], Current memory consumed by loader process [1536], Nc cache Memory threshold [0]
15:54:38,905 INFO [LocalCache] TaskID[534], The table NC_NETWORK has 1 no. of rows
15:54:38,905 INFO [LocalCache] TaskID[534], The table NC_EUTRANCELL has 121873 no. of rows
15:54:38,905 INFO [LocalCache] TaskID[534], The table NC_IPSECURITY has 354 no. of rows
15:54:38,905 INFO [LocalCache] TaskID[534], The table NC_GTP_U has 383 no. of rows
15:54:38,905 INFO [LocalCache] TaskID[534], The table NC_IPV6 has 4 no. of rows
15:54:38,905 INFO [LocalCache] TaskID[534], The table NC_ENODEB_SIG has 315429 no. of rows
15:54:38,905 INFO [LocalCache] TaskID[534], The table NC_REGION has 1 no. of rows
15:54:38,905 INFO [LocalCache] TaskID[534], The table NC_FIREWALL has 295 no. of rows
15:54:38,905 INFO [LocalCache] TaskID[534], The table NC_SCTP_H has 1227 no. of rows
15:54:38,906 INFO [LocalCache] TaskID[534], The table NC_EUTRAN_TRAFFICBOARD has 161 no. of rows
15:54:38,913 INFO [LocalCache] TaskID[534], The table NC_IKE has 73 no. of rows
15:54:38,913 INFO [LocalCache] TaskID[534], The table NC_IP_PATH has 264 no. of rows
15:54:38,913 INFO [LocalCache] TaskID[534], The table NC_SECTOR_EQUIPMENT has 1935 no. of rows
15:54:38,914 INFO [LocalCache] TaskID[534], The table NC_OPERATOR_CELL has 35 no. of rows
15:54:38,914 INFO [LocalCache] TaskID[534], The table NC_ETHERNETPORT has 453 no. of rows
15:54:38,914 INFO [LocalCache] TaskID[534], The table NC_ENODEBFUNCTION has 8891 no. of rows
15:54:38,914 INFO [LocalCache] TaskID[534], The table NC_BBUBOARD has 303 no. of rows
15:54:38,914 INFO [LocalCache] TaskID[534], The table NC_IP_LAYER has 227 no. of rows
15:54:38,914 INFO [LocalCache] TaskID[534], The table NC_MARKET has 1 no. of rows
15:54:38,914 INFO [LocalCache] TaskID[534], The table NC_NODE_H has 1 no. of rows
=================================
Removing anything from cache does not release heap memory immediately. Heap memory will be released only after GC runs and removes weak references, Out of scope objects and so on.
Thus, optimal solution is to distribute LIFs according to region and block so that specific cells are catered by particular loader instances.
Resolving The Problem
Follow one or more steps for resolving the issue:
- Identify points on memory graph, check how frequently it reaches peak and increase memory assigned to loader.
- Implement or Modify LIF distribution so that it is based on Node or region, blocks across instances to minimize OOM and cache building. One Gateway instance can load raw files for specific region or regions that transfer LIFs to a set of loader instances. LIFs in these instances can be further distributed based on block.
- Modify LIF distribution to allow idle time of 2 - 5 mins before starting next batch of LIFs for every loader instance.
Document Location
Worldwide
Product Synonym
TNPMW;Tivoli Netcool Performance Manager Wireless
Was this topic helpful?
Document Information
Modified date:
13 March 2022
UID
ibm16562159