IBM Support

Best Practices for running the ITNM DLA books in TADDM

Technical Blog Post


Abstract

Best Practices for running the ITNM DLA books in TADDM

Body

There have been many recent requests for best practices on loading ITNM DLA books into TADDM, so I thought I would share some of what we have learned. If you have any questions about this list, please comment below. Thanks!

 

Best practices for loading ITNM DLA books:

 

1) If you are running a recent maintenance level of TADDM, you can use the -g option on loadidml to load ITNM books faster. The -g option stores objects in batches whose size is set by the "com.ibm.cdb.bulk.cachesize" value in dist/etc/bulkload.properties (default 2000 objects). Storing 2000 objects at a time is much faster than storing them one at a time. The following caveats apply to using -g with ITNM books.

a) The book needs to pass the validator. The validator ships as idmlcert.jar in the $COLLATION_HOME/sdk/dla/validator/v2 directory.

A more recent copy is available here if you are not on a very recent maintenance level of TADDM:

http://www-01.ibm.com/support/docview.wss?uid=swg21683440

Here is an example of running it:

/opt/IBM/cmdb/dist/external/jdk-Linux-x86_64/jre/bin/java  -Xmx2000m -jar idmlcert.jar -p idmlcert.properties <book> > validate.out
 

When this is done, the output file will have a summary of results at the top, like this:

[FAIL] - TEST 00 (XML Parse)
[PASS] - TEST 01 (All MEs have a valid ID)
[PASS] - TEST 02 (superior reference IDs in book)
[PASS] - TEST 03 (Attributes are valid)
[PASS] - TEST 04 (All managed elements have a valid naming rule)
[PASS] - TEST 05 (All managed elements are valid)
[PASS] - TEST 06 (All relationships are valid)
 

It is OK if the XML Parse test (TEST 00) fails, but "[PASS] - TEST 04 (All managed elements have a valid naming rule)" must PASS for -g to be successful. If it does not pass, loading the book with -g will skip any batch that contains a bad object and still end with condition code 0. When using -g, also check bulkload.log for ERROR (or error.log if bulkload.log has wrapped), and check the dist/bulk/results file for the keyword FAILURE. Any of these might indicate a problem with the contents of the book.
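The checks described above can be scripted. The following is a minimal sketch in Python, assuming the validator summary lines look exactly like the sample shown earlier; the function names are hypothetical and are not part of TADDM.

```python
import re

# Matches summary lines such as "[PASS] - TEST 04 (All managed elements ...)".
SUMMARY_RE = re.compile(r"^\[(PASS|FAIL)\] - TEST (\d+) \((.*)\)$")


def parse_validator_summary(text):
    """Return {test_number: (status, description)} from validator output text."""
    results = {}
    for line in text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:
            status, number, description = match.groups()
            results[int(number)] = (status, description)
    return results


def safe_for_graph_write(results):
    """True only if TEST 04 (naming rules) is present and passed."""
    return results.get(4, ("FAIL", ""))[0] == "PASS"


def lines_with_keyword(text, keyword):
    """Return the lines of a log or results file that contain the keyword."""
    return [line for line in text.splitlines() if keyword in line]


sample = """\
[FAIL] - TEST 00 (XML Parse)
[PASS] - TEST 01 (All MEs have a valid ID)
[PASS] - TEST 04 (All managed elements have a valid naming rule)
"""
summary = parse_validator_summary(sample)
print("OK to use -g:", safe_for_graph_write(summary))
```

The same `lines_with_keyword` helper can be pointed at the contents of bulkload.log (keyword ERROR) or a results file (keyword FAILURE) after the load.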

 

b) There are some known APARs related to using -g, although not every customer will experience them. If you do have an issue using -g, check whether your symptoms match any of these APARs and confirm whether the fixes are included in your current maintenance level:

IV67167 ITNM BOOK HANGS AT END OF THE LOADIDML PROCESS WITH A THREAD DEADLOCK.
IV63998 LOADING A LARGE ITNM BOOK MAY LEAD INTO INFINITE LOOP.
IV60616  ITNM DLA books containing chassis failed with MissingKeyException

 

2) By default, loadidml uses 1G of memory, as defined by the "com.ibm.cdb.bulk.allocpoolsize" value in dist/etc/bulkload.properties. This is typically not enough for large ITNM books; usually 2G to 4G is required. Keep in mind that this memory has to be available on the TADDM server, and TADDM itself uses a lot of memory, so before you increase this value, check that you have enough free memory. Failure to do this may lead to silent failure of TADDM (e.g. if TADDM cannot get the memory allocated to it and has to page out).
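As a rough illustration of the memory check in item 2, here is a small Python sketch that reads the heap size from the properties text and compares it with the free memory you measured on the server. The helper names and the `reserve_mb` safety margin are assumptions for illustration only; how you obtain the free memory figure is up to you.

```python
def parse_properties(text):
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def heap_fits(props, free_mb, reserve_mb=1024):
    """Check that the bulk loader heap still leaves reserve_mb of free memory.

    reserve_mb is a hypothetical safety margin, not a TADDM setting.
    """
    heap_mb = int(props.get("com.ibm.cdb.bulk.allocpoolsize", "1024"))
    return heap_mb + reserve_mb <= free_mb


props = parse_properties(
    "# bulkload.properties excerpt\n"
    "com.ibm.cdb.bulk.allocpoolsize=4096\n"
    "com.ibm.cdb.bulk.cachesize=2000\n"
)
print(heap_fits(props, free_mb=8192))
```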

3) We recommend that you specify the -u (user) and -p (password) values on loadidml. This can prevent MissingKeyExceptions if discovery or other activity is going on that would cause a collision of threads. Here is an example of a typical loadidml command I use in the lab:

time ./loadidml.sh -u administrator -p password -g -o -f <book>.xml
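If you drive loadidml from a script, a small wrapper can make sure the user and password are never left off. This is only a sketch: the function name is hypothetical, and the flags are copied from the example command above rather than from a full option reference.

```python
import shlex


def build_loadidml_command(book, user, password, graph_write=True):
    """Build the argument list for loadidml.sh, always including -u and -p."""
    argv = ["./loadidml.sh", "-u", user, "-p", password]
    if graph_write:
        # Only use -g for books that passed TEST 04 of the validator.
        argv.append("-g")
    argv += ["-o", "-f", book]
    return argv


argv = build_loadidml_command("book.xml", "administrator", "password")
print(shlex.join(argv))
```

The returned list can be passed to `subprocess.run` from the directory that contains loadidml.sh.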
 

4) If you get a MissingKeyException, and you have confirmed that the book passed TEST 04 of the validator and that you used the user and password on the loadidml command, check that your database has enough transaction log space (DB2) or UNDO space (Oracle). Database errors of this sort will typically show up only in the log/services/Naming* logs. The DBA should be able to monitor and add space if needed, but large books can take more space than discovery.

 

5) If you are running TADDM with a DB2 database, ensure that AUTO_RUNSTATS is turned off. You can check this with the following DB2 command:

 

db2 get db cfg for poodles | grep RUNSTAT
     Automatic runstats                  (AUTO_RUNSTATS) = OFF
 

If auto runstats is on, turn it off and then run the TADDM-specific runstats generated by gen_db_stats.jy. Make sure you are running the gen_db_stats appropriate for your release (the database schema can change with fix packs, so the stats should be regenerated after every fix pack or release).

It may also help to run the manual stats more frequently during the bulkload; some customers have reported improved performance running them as often as every 30 minutes.
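The AUTO_RUNSTATS check from item 5 can also be automated before a large load. The sketch below is a minimal Python example that parses the text printed by the `db2 get db cfg` command; it assumes the output line has the same shape as the sample shown earlier, and the function name is hypothetical.

```python
import re

# Captures the value after "(AUTO_RUNSTATS) =" in the db cfg output.
AUTO_RUNSTATS_RE = re.compile(r"\(AUTO_RUNSTATS\)\s*=\s*(\w+)")


def auto_runstats_setting(cfg_output):
    """Return "ON" or "OFF" from db cfg output, or None if the line is missing."""
    match = AUTO_RUNSTATS_RE.search(cfg_output)
    return match.group(1).upper() if match else None


sample = "     Automatic runstats                  (AUTO_RUNSTATS) = OFF"
setting = auto_runstats_setting(sample)
if setting != "OFF":
    print("Turn AUTO_RUNSTATS off before loading; current value:", setting)
else:
    print("AUTO_RUNSTATS is off")
```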

 

6) A couple of comments about dist/etc/bulkload.properties values you may want to review:

 

com.ibm.cdb.bulk.apiservertimeout=60
This property specifies the number of seconds before the API server returns an error and the bulk load program stops processing.
Note that this property also governs the entire length of time a single bulk can run; see this technote:

/support/pages/node/526153

com.ibm.cdb.bulk.stats.enabled=false
This property specifies whether statistics gathering of the bulk load program are performed. Turning on statistics decreases performance and increases log and result file sizes.
Setting this property to true provides useful statistics for performance monitoring, such as:

# grep left bulkload.log
2015-04-19 15:15:32,273  [t91726]  DEBUG bulk.CdbRandomAccess - Objects left: 395160 ,Status: -1=49 ,0=0 ,1=6435 2=2000 ,statusLevel=2
2015-04-19 15:17:12,031  [t91726]  DEBUG bulk.CdbRandomAccess - Objects left: 393886 ,Status: -1=206 ,0=0 ,1=6490 2=1274 ,statusLevel=2
2015-04-19 15:18:29,721  [t91726]  DEBUG bulk.CdbRandomAccess - Objects left: 392586 ,Status: -1=41 ,0=0 ,1=6447 2=1300 ,statusLevel=2
2015-04-19 15:19:29,051  [t91726]  DEBUG bulk.CdbRandomAccess - Objects left: 391488 ,Status: -1=54 ,0=0 ,1=6349 2=1098 ,statusLevel=2
2015-04-19 15:20:22,620  [t91726]  DEBUG bulk.CdbRandomAccess - Objects left: 389973 ,Status: -1=58 ,0=0 ,1=6341 2=1515 ,statusLevel=2
2015-04-19 15:21:10,201  [t91726]  DEBUG bulk.CdbRandomAccess - Objects left: 388626 ,Status: -1=61 ,0=0 ,1=6331 2=1347 ,statusLevel=2

com.ibm.cdb.bulk.log.success.results=true
This property specifies whether successfully written objects are logged to the results file. Reduced logging can improve performance by reducing output.
When set to true, this property shows the guid in the bulk/results file, which is useful when debugging problems, but otherwise you would normally set it to false.

com.ibm.cdb.bulk.allocpoolsize=1024
This property specifies the maximum amount of memory that can be allocated to the Bulk Loader process. It is an Xmx value that is passed to the main Java™ class of the Bulk Loader. Specify the value in megabytes.
As mentioned earlier, 1G is typically not enough for larger books, and you may get better performance by increasing this value as long as you have the memory physically available.

com.ibm.cdb.bulk.cachesize=2000
This property specifies the number of objects to be processed in a single write operation when performing graph writing. Increasing this number improves performance at the risk of running out of memory either on the client or at the server. Alter this number only when specific information is available to indicate that processing a file with a larger cache provides a benefit in performance. The cache size setting currently can be no larger than 40000.
Most customers do not increase this value. I have yet to see a large gain from increasing it, but if you test and see an improvement worth noting, please comment below.
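The "Objects left" lines from bulkload.log shown above can be turned into a rough throughput figure and time estimate. The following Python sketch assumes the log line layout matches the sample, and that "Objects left" is the count still to be processed; the function names are hypothetical, and the estimate is a simple linear extrapolation, so treat it as a guide only.

```python
import re
from datetime import datetime

# Captures the timestamp and the "Objects left" count from a bulkload.log line.
LEFT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*Objects left: (\d+)"
)


def parse_objects_left(log_text):
    """Return a list of (timestamp, objects_left) tuples in log order."""
    samples = []
    for line in log_text.splitlines():
        match = LEFT_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            samples.append((stamp, int(match.group(2))))
    return samples


def load_rate(samples):
    """Objects per second between the first and last samples, or None."""
    if len(samples) < 2:
        return None
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    return (n0 - n1) / elapsed if elapsed > 0 else None


def eta_seconds(samples):
    """Linear estimate of the seconds remaining, or None if it cannot be computed."""
    rate = load_rate(samples)
    if not rate or rate <= 0:
        return None
    return samples[-1][1] / rate


log = (
    "2015-04-19 15:15:32,273  [t91726]  DEBUG bulk.CdbRandomAccess - "
    "Objects left: 395160 ,Status: -1=49 ,0=0 ,1=6435 2=2000 ,statusLevel=2\n"
    "2015-04-19 15:21:10,201  [t91726]  DEBUG bulk.CdbRandomAccess - "
    "Objects left: 388626 ,Status: -1=61 ,0=0 ,1=6331 2=1347 ,statusLevel=2\n"
)
samples = parse_objects_left(log)
print("objects/sec:", round(load_rate(samples), 2))
print("estimated hours left:", round(eta_seconds(samples) / 3600, 1))
```

Comparing the rate before and after a change (for example, after running the manual stats) gives a quick way to see whether the change helped.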

7) If you have multiple ITNM domains, you must have ITNM APAR IV61026. Otherwise the MSSName in the books will be the same even though the data is different. This will cause data to be lost, because most ITNM books contain a refresh tag, which means "this book should replace all data from any prior book with the same MSSName". It is very important to understand the refresh tag behavior and not to manipulate any ITNM book artificially. To explain this a little more: every ITNM book has this information at the top and bottom.

There will be an MSSName, such as this:

            <cdm:MSSName>ibm-cdm:///CDMMSS/Hostname=1.2.3.4,ManufacturerName=IBM+ProductName=IBM Tivoli Network Manager IP Edition+Subcomponent=domain+SubcomponentInstanceName=TEST</cdm:MSSName>
 

or this:

 
            <cdm:MSSName>ibm-cdm:///CDMMSS/Hostname=1.2.3.4,ManufacturerName=IBM+ProductName=IBM Tivoli Network Manager IP Edition</cdm:MSSName>
            <cdm:ProductName>IBM Tivoli Network Manager IP Edition</cdm:ProductName>
 

The first one is an example of a book with ITNM APAR IV61026, showing the domain. The second does not have it. If you have multiple domains, the lack of a domain to distinguish the books can cause them to overwrite one another. This is because most ITNM books contain both the refresh and create tags, e.g.:

at the top of the book:

 
        <idml:refresh timestamp="2015-03-28T21:14:30Z">
        <idml:create timestamp="2015-03-28T21:14:30Z">
 

and the close tags at the end:

        </idml:create>
        </idml:refresh>
 

The resulting behavior is that the book is treated as a refresh, and anything from a prior book with the same MSSName that is no longer in the book currently being loaded will be deleted at the end of the bulkload. This can take a very long time if there were 10,000 objects in the old book and 10 in the new book. This is a good reason never to remove objects from an ITNM book manually unless you want them to be deleted. Otherwise, use only the create tag or a different MSSName.
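Before loading a book, it can be worth checking its header for the MSSName, the domain, and the refresh tag. The Python sketch below does this with plain string matching on the first part of the book, which avoids parsing a very large XML file. The function names are hypothetical, and `deleted_by_refresh` is only a simplified model of the refresh behavior described above, not TADDM's actual implementation.

```python
import re

MSSNAME_RE = re.compile(r"<cdm:MSSName>(.*?)</cdm:MSSName>", re.DOTALL)


def inspect_book_header(header_text):
    """Report the MSSName, whether it carries a domain, and the refresh tag."""
    match = MSSNAME_RE.search(header_text)
    mss_name = match.group(1).strip() if match else None
    return {
        "mss_name": mss_name,
        # Books from ITNM with APAR IV61026 include the domain in this field.
        "has_domain": bool(mss_name and "SubcomponentInstanceName=" in mss_name),
        "is_refresh": "<idml:refresh" in header_text,
    }


def deleted_by_refresh(previous_ids, new_ids):
    """Simplified model: IDs in the prior book that the new book no longer has."""
    return set(previous_ids) - set(new_ids)


header = (
    '<idml:refresh timestamp="2015-03-28T21:14:30Z">\n'
    "<cdm:MSSName>ibm-cdm:///CDMMSS/Hostname=1.2.3.4,ManufacturerName=IBM"
    "+ProductName=IBM Tivoli Network Manager IP Edition+Subcomponent=domain"
    "+SubcomponentInstanceName=TEST</cdm:MSSName>\n"
)
info = inspect_book_header(header)
print(info["has_domain"], info["is_refresh"])
print(sorted(deleted_by_refresh({"a", "b", "c"}, {"a"})))
```

If two books from different domains report the same `mss_name` and `is_refresh` is true, loading one after the other is likely to remove the first book's data.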

 


UID

ibm11275382