IBM Support

Dormant Data cleanup in TADDM - A Customer Story

Technical Blog Post


Abstract

Dormant Data cleanup in TADDM - A Customer Story

Body

You have probably seen several documents about dormant data cleanup in TADDM and hopefully are already taking steps to clean up these old objects. Cleaning up old objects that are no longer discovered ensures that the TADDM data is current, and it can also improve performance by reducing the size of the database. But do you wonder if you are really cleaning up everything that is old? Most of the dormant cleanup articles refer to Computer Systems and change history -- is there more to it than that?  Recently we went through a detailed analysis of dormant data in a customer environment. This analysis reviewed all classes that had more than 10,000 objects with a last modified time older than 45 days. The customer was running TADDM 7.2.1.4 with a DB2 database and approximately 6 million CIs.  This blog entry documents the findings and recommendations that resulted from that analysis. While the analysis is about 90% done, the work still progresses, so there may be more updates as we finish up or learn more from other customers. That being said, this is what we found:

USE THE API TO CLEAN UP DORMANT CIs

In general, the TADDM API delete command should be used to clean up dormant CI data. In most cases a CI is represented in many database tables, so SQL cleanup is risky: you would need to know all the tables the CI is in, otherwise you risk corrupting the database and potentially causing errors during discovery.

The format of the api delete command is;

dist/sdk/bin/{api.sh/api.bat} -u user -p password delete {guid}
 
You can run more than one api delete at a time, although you should monitor CPU and memory resources so that you do not exceed the capacity of the server, especially if other activities such as discovery are occurring at the same time. At the customer site we analyzed, four concurrent api deletes used all available CPU when run on the Primary Storage Server. We recommend, where possible, running dormant cleanup on a secondary storage server, or splitting it across multiple secondary storage servers. We also recommend running this process during idle time so that there is no chance of deleting a CI that is in the process of being updated. A minimal loop for feeding guids to the api is sketched below.
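
As an illustration only, here is a minimal shell sketch of such a loop. It assumes the dormant guids have already been written one per line to a file; the file name, user and password below are placeholders, not values from the product:

# Minimal sketch: delete dormant guids one at a time with the TADDM api.
# guids.txt, the user name and the password are example values only.
GUID_FILE=guids.txt
for GUID in `cat ${GUID_FILE}`
do
   dist/sdk/bin/api.sh -u administrator -p password delete ${GUID}
done

To run several deletes in parallel, split the guid file into pieces (for example with the Unix split command) and start one copy of the loop per piece, keeping the CPU observations above in mind.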

There are a few exceptions to this "use the api" rule which I will discuss later in this post.

CLASSES TO CLEAN UP

The following tables list the classes we analyzed and the database query we used to find the dormant guids to pass to the api delete. In all queries listed below, ${DATE} represents the time stamp, in epoch milliseconds, of the date before which you want to delete CIs. This ${DATE} is compared to the CI's last modified time (LMT) to determine whether it meets the dormant criteria or not. Methods to obtain this date are discussed later in this post, and a sketch of collecting the query output into a file for the api delete loop follows Table 1.

TABLE 1: These classes can be cleaned up directly, based solely on the last modified time (LMT) in the database. In the customer analysis, the LMT cutoff we used was 45 days prior to the current date.

Class Name          Database query to find dormant guids

ComputerSystem      select GUID_C from BB_COMPUTERSYSTEM40_V where lastModifiedTime_c < ${DATE}
AppServer           select GUID_C from BB_APPSERVER6_V where lastModifiedTime_c < ${DATE}
Segment             select GUID_C from BB_SEGMENT21_V where lastModifiedTime_c < ${DATE}
Service             select GUID_C from BB_SERVICE81_V where lastModifiedTime_c < ${DATE}
OperatingSystem     select GUID_C from BB_OPERATINGSYSTEM62_V where lastModifiedTime_c < ${DATE}
CPU                 select GUID_C from BB_CPU10_V where lastModifiedTime_c < ${DATE}
NetworkConnection   select GUID_C from BB_NETWORKCONNECTION38_V where lastModifiedTime_c < ${DATE}
Relationship        select GUID_C from BB_RELATIONSHIP26_V where lastModifiedTime_c < ${DATE}
PhysicalPackage     select GUID_C from BB_PHYSICALPACKAGE60_V where lastModifiedTime_c < ${DATE}
PhysicalConnector   select GUID_C from BB_PHYSICALCONNECTOR3_V where lastModifiedTime_c < ${DATE}
Vlan                select GUID_C from BB_VLAN89_V where JDOCLASS_C = 'com.collation.topomgr.jdo.topology.net.VlanJdo' AND lastModifiedTime_c < ${DATE}
SoftwareComponent   select GUID_C from BB_SOFTWARECOMPONENT68_V where lastModifiedTime_c < ${DATE}
TransportEndpoint   select GUID_C from BB_TRANSPORTENDPOINT84_V where lastModifiedTime_c < ${DATE}
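
As an illustration, the guids returned by these queries can be collected into a file for the api delete loop shown earlier. This is a minimal sketch using the ComputerSystem query; the grep filter, which keeps only lines that look like 32 character hex guids, is an assumption about how dbquery.sh formats its output and may need adjusting:

# Sketch: capture dormant ComputerSystem guids into guids.txt.
# ${DATE} must already hold the dormant cutoff in epoch milliseconds
# (see the section on last modified time later in this post).
dist/bin/dbquery.sh "select GUID_C from BB_COMPUTERSYSTEM40_V where lastModifiedTime_c < ${DATE}" | grep -E '^[0-9A-Fa-f]{32}$' > guids.txt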
 
TABLE 2: Typically L2Interfaces are removed when their parent is removed, but there may be cases where this does not happen, for example if you have not installed a fix pack that contains APAR IV38375. The following query detects L2Interfaces which have no parent computer system. These are orphaned and can be deleted without reference to a last modified time. It is important to run queries that do not filter on LMT, such as the one below, during IDLE time, as otherwise you may inadvertently capture a guid that is transient at the time (being stored or updated).
 
Class Name    Database query to find dormant guids

L2Interface   SELECT BB_L2INTERFACE41_V.GUID_C FROM BB_L2INTERFACE41_V LEFT OUTER JOIN BB_COMPUTERSYSTEM40_V ON (BB_L2INTERFACE41_V.PK__PARENTL2INTERFACE_C = BB_COMPUTERSYSTEM40_V.PK_C) WHERE (BB_COMPUTERSYSTEM40_V.GUID_C IS NULL)

TABLE 3: Some classes represent naming attributes that can be used by multiple CIs. For example, an IpAddress is used as part of the BindAddress naming attribute, which is one of the naming rules for AppServers. So a single IpAddress could act as the naming rule for many AppServer objects on a host. If you simply delete the IpAddress based on LMT, you will be removing the AppServer objects as well, and they may not be dormant. This can occur when the LMT on the IpAddress is not properly updated, for example because the OS sensor failed while the Application sensor worked, so the AppServer CI has a current LMT. For the classes below, you should not only check the LMT of the guid, but also check whether it is referenced as a superior guid in the database table superiors. If it is in the superiors table, it is serving as a naming rule attribute for one or more CIs and should not be deleted.

NOTE - the queries in the table below are DB2 specific. For Oracle, the HEXTORAW function must be used. For example, when Oracle is the back end database, use this for the LogicalContent query;

select GUID_C from BB_LOGICALCONTENT42_V where lastModifiedTime_c < ${DATE} and HEXTORAW(GUID_C) not in(select sup_supr_guid from superiors)

 

Class Name       Database query to find dormant guids

LogicalContent   select GUID_C from BB_LOGICALCONTENT42_V where lastModifiedTime_c < ${DATE} and GUID_C not in (select hex(sup_supr_guid) from superiors)
Fqdn             select GUID_C from BB_FQDN31_V where lastModifiedTime_c < ${DATE} and GUID_C not in (select hex(sup_supr_guid) from superiors)
IpAddress        select GUID_C from BB_IPADDRESS73_V where lastModifiedTime_c < ${DATE} and GUID_C not in (select hex(sup_supr_guid) from superiors)
BindAddress      DB2:
                 select b.GUID_C from BB_BINDADDRESS67_V b inner join PERSOBJ_ALIASES pa on b.GUID_C = pa.GUID_X where b.lastModifiedTime_c < ${DATE} and pa.NRS_GUID_X not in (select sup_supr_guid from superiors)

                 Oracle:
                 select guid_c from BB_BINDADDRESS67_V left outer join superiors on HEXTORAW(GUID_C) = sup_supr_guid where sup_supr_guid is null and lastModifiedTime_c < ${DATE}

 

 

 
 
TABLES TO CLEAN UP USING DATABASE SQL
 
You should also clean up the change_history_table and change_cause_table on a regular basis, otherwise these tables can get very large and degrade performance over time. You do not need to use the API to clean up this data. This is documented here;
 
 

One caveat -- if this is the first time you are doing cleanup, or if your cleanup interval is not at least weekly, you should use SQL that deletes in batches. If you try to delete too many rows at once, your DB2 server can run out of transaction log space or Oracle can run out of UNDO space. Use "FETCH FIRST" in DB2 or "WHERE ROWNUM <" in Oracle to delete only the number of rows that your database can handle in one transaction. For example, to delete in batches of 50,000 for DB2 (a loop for repeating these batched deletes until no rows remain is sketched after the Oracle examples below);

"DELETE FROM CHANGE_HISTORY_TABLE WHERE ID IN (SELECT ID FROM CHANGE_HISTORY_TABLE WHERE PERSIST_TIME < ${DATE} FETCH FIRST 50000 ROWS ONLY)" 
 
"DELETE FROM CHANGE_CAUSE_TABLE WHERE CAUSE_ID IN (SELECT CAUSE_ID FROM CHANGE_CAUSE_TABLE WHERE CAUSE_ID NOT IN (SELECT ID FROM CHANGE_HISTORY_TABLE) FETCH FIRST 50000 ROWS ONLY)" 
 

And for Oracle;

"DELETE FROM FROM CHANGE_HISTORY_TABLE WHERE ID IN (SELECT ID FROM CHANGE_HISTORY_TABLE WHERE PERSIST_TIME < ${DATE}  and ROWNUM < 50000)"

"DELETE FROM CHANGE_CAUSE_TABLE WHERE CAUSE_ID IN (SELECT CAUSE_ID FROM CHANGE_CAUSE_TABLE WHERE CAUSE_ID NOT IN (SELECT ID FROM CHANGE_HISTORY_TABLE)  and ROWNUM < 50000)"

 

Additionally, you may find the first time you start cleaning up the BindAddress class that there are so many guids to delete that the api is not a reasonable choice. This class generates a lot of dormant records; at the customer we worked with, 60-80K were created each week, and the first time we did cleanup there were millions of guids. With this in mind, the FIRST time you clean up this class, if using TADDM 7.2.1, you can use SQL to do the deletes. Always perform a DB backup prior to deleting anything directly from the database. The tables to clean up are bindaddr, persobj, mssobjlink_rel and aliases. Below is an example snippet of the code used to accomplish this;

 

-- First find the guids to delete (this query may take a long time if the bindaddr table is large);

db2 "SELECT GUID_X FROM bindaddr WHERE lastModifiedTime_x < ${DATE} and  guid_x not in(select hex(sup_supr_guid) from superiors)" >GUID.txt     
                                                                        
-- Then delete them from each DB table. Note that the format of the guid input for the ALIASES table is different than for the other queries. The data in the ALIASES table is defined with the "FOR BIT DATA" clause, which means it is stored as hexadecimal data. For tables such as this, when you query a guid column you need to convert the data. In DB2 you do this by prefacing the guid with an 'x', e.g. x'123456..'. In Oracle you do this with the HEXTORAW function, e.g. HEXTORAW('123456...'). Here is an example of the code to delete the guids that are listed in the file, with DB2 syntax;

...                                

   for GUID in `cat ${FILE}`     # ${FILE} is the guid list produced by the query above, e.g. GUID.txt
   do
      db2 "DELETE FROM BINDADDR WHERE GUID_X = '${GUID}'" >/dev/null
      db2 "DELETE FROM PERSOBJ WHERE GUID_X = '${GUID}'" >/dev/null
      db2 "DELETE FROM MSSOBJLINK_REL WHERE OBJ_X = '${GUID}'" >/dev/null
      db2 "DELETE FROM ALIASES WHERE ALIASES.AL_MASTER_GUID IN (x'${GUID}')" >/dev/null
      db2 "COMMIT" >/dev/null
   done
..                                                    

Please use api delete after the initial cleanup, as the tables that bindaddr CIs exist in today may change in the future. As noted above, deleting directly from the database is risky; only perform this task if you are comfortable with SQL and with monitoring that the queries completed successfully. Always have a DB backup available in case problems are encountered. Consult your DBA for further assistance if needed.

                                                                        
 
WHAT IS LAST MODIFIED TIME 'LMT' IN TADDM AND HOW CAN I USE IT PROGRAMMATICALLY WHEN I DO THE CLEANUP?
 

LastModifiedTime, otherwise known as 'LMT' in TADDM, is the difference, measured in milliseconds, between the time the object was last stored and midnight, January 1, 1970 UTC, as taken on the StorageServer that is storing the data.

On Unix systems, the command;

date '+%s'

returns seconds since 1970-01-01 00:00:00 UTC, so this value is 1000 times smaller than LMT, which is in milliseconds. You can use this command to return the current time stamp, multiply it by 1000, and then subtract (dormant days * 86400000) to get your ${DATE} value; 86400000 is one day in milliseconds. Then use the calculated value in your query, e.g. (a small shell sketch of this calculation follows the example below);

dist/bin/./dbquery.sh  "select guid_c, LASTMODIFIEDTIME_c from BB_COMPUTERSYSTEM40_V where LASTMODIFIEDTIME_c < 1368721165000"
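
Here is a small illustrative shell sketch (not taken from the product) of computing ${DATE} with a 45 day threshold; it requires a shell with 64 bit arithmetic, such as bash on a modern system:

# Sketch: compute the dormant cutoff in epoch milliseconds.
DORMANT_DAYS=45                                  # example dormant threshold in days
NOW_MS=$(( $(date '+%s') * 1000 ))               # current time in milliseconds
DATE=$(( NOW_MS - DORMANT_DAYS * 86400000 ))     # 86400000 ms = one day
echo ${DATE}

The resulting value can then be substituted into any of the queries in the tables above.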
 
 

Here are some functions you can create on DB2 if you prefer to use DB2 time stamp in queries:

-- to convert DB2 timestamp to LMT
CREATE FUNCTION DB2TIMESTAMPSTAMP_TO_LMT (DB2TIMESTAMP TIMESTAMP) RETURNS BIGINT LANGUAGE SQL CONTAINS SQL DETERMINISTIC RETURN (CAST (DAYS(DB2TIMESTAMP) - DAYS('1970-01-01') AS BIGINT) * 86400 + MIDNIGHT_SECONDS(DB2TIMESTAMP)) * 1000;
 
-- to convert DB2 timestamp to UNIX timestamp
CREATE FUNCTION DB2TIMESTAMPSTAMP_TO_UNIXTIMESTAMP (DB2TIMESTAMP TIMESTAMP) RETURNS BIGINT LANGUAGE SQL CONTAINS SQL DETERMINISTIC RETURN CAST (DAYS(DB2TIMESTAMP) - DAYS('1970-01-01') AS BIGINT) * 86400 + MIDNIGHT_SECONDS(DB2TIMESTAMP);
 
 
To get the difference, measured in milliseconds, between the current time on DB2 and midnight, January 1, 1970 UTC you can run this query using the functions above;
 dist/bin/./dbquery.sh  "select DB2TIMESTAMPSTAMP_TO_LMT(current timestamp) from sysibm.sysdummy1"
 
When the DB2 server has the same timezone as the StorageServer, you can use a query like the one below for specific classes. In this example, we select the computer system guids and their LMT where the LMT is older than 100 days (86400000 * 100);
 dist/bin/./dbquery.sh  "select guid_C, LASTMODIFIEDTIME_C from BB_COMPUTERSYSTEM40_V where LASTMODIFIEDTIME_C < (select DB2TIMESTAMPSTAMP_TO_LMT(current timestamp)-8640000000 from sysibm.sysdummy1) order by LASTMODIFIEDTIME_C"
 

 

 
You can also use the Java class provided for change history cleanup to obtain a time stamp in the proper format; see the example here;
 

To use this;

1. Go to the directory dist/sdk/examples/java
2. Create a new file DateToString.java
3. Copy the java code provided in the documentation link above into the file and save it
4. Run ./build.sh
5. Run java DateToString dd/MM/yyyy to get the value in milliseconds, e.g. to convert a specific date to LMT format for use in the queries (a shell alternative is sketched after the example output);
 
java DateToString 1/11/2008
Date is Sat Nov 01 00:00:00 EST 2008
MillSeconds is 1225515600000
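
On Linux systems with GNU date, a comparable value for a specific date can also be computed directly in the shell. This is a sketch only; the -d option is GNU specific and is not available on all Unix platforms (for example AIX or Solaris):

# Sketch: convert a calendar date to TADDM LMT format (epoch milliseconds).
# GNU date only; adjust the date string and time zone handling as needed.
DATE=$(( $(date -d '2008-11-01 00:00:00' '+%s') * 1000 ))
echo ${DATE}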

 

 

COULD I STILL HAVE DORMANT DATA AFTER THIS PROCESS IS COMPLETED?

Yes, there will probably be some; perhaps low numbers from before fixes were implemented, or in tables that the customer we analyzed did not have data in. To check your database for other dormant data, most CI specific tables contain a lastmodifiedtime_x column that you can query. You can get a list of all tables in the database with SQL such as;

DB2:  select TABNAME from syscat.tables where tabschema = '<com.collation.db.user>';

ORACLE: select table_name from dba_tables where owner='<com.collation.db.user>' order by table_name;

Replace <com.collation.db.user> with the value of the db user in your collation.properties file; the user name must be in upper case in the query. A sketch of capturing this table list into a file for the loop below follows.
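
This is a rough sketch only (DB2 shown, with ARCHUSER standing in for your upper-cased db user); the grep filter is an assumption about the dbquery.sh output format and simply keeps lines that look like table names:

# Sketch: write the list of TADDM tables to dbtables.txt.
# Replace ARCHUSER with the upper-cased value of com.collation.db.user.
$COLLATION_HOME/bin/dbquery.sh "select TABNAME from syscat.tables where tabschema = 'ARCHUSER'" | grep -E '^[A-Z][A-Z0-9_]*$' > dbtables.txt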

Then loop through those tables, querying for a row count where lastmodifiedtime_x is less than your desired dormant date. E.g.;

 

for TABLE in `cat dbtables.txt`
do
   # count the rows in each table that are older than the dormant cutoff
   $COLLATION_HOME/bin/dbquery.sh "select count(*) from ${TABLE} where lastModifiedTime_x < ${DATE}"
done

If a table does not have this column, you can ignore it; only class specific tables will have the column.

If you find large amounts of data not cleared after you implemented the cleanup process for all tables above, open a PMR and Support will help determine next steps.

 

PROBLEMS WE ENCOUNTERED DURING THE ANALYSIS

While we were analyzing the customer data, we did find some issues. As noted above, we were running TADDM 7.2.1.4; the following APARs were opened during the analysis to correct problems found. These APARs will be shipped in upcoming fix packs (some are already in 7.2.1.5, which is GA);

IV38375 L2INTERFACES ARE NOT REMOVED FROM COMPUTER SYSTEMS IN TADDM WHEN THEY ARE REMOVED FROM PHYSICAL HARDWARE
IV38668 COMPUTERSYSTEMCONSOLIDATIONAGENT MERGES COMPUTER SYSTEM WHEN USING IP ADDRESSES FROM PRIVATE NETWORK
IV43076 ADD OPTION TO NOT HAVE COMPUTER SYSTEM LMT UPDATED BY IPDEVICE
IV43886 CHILD CIS ARE NOT MERGED WHEN THEIR ALIASES ARE RE-ASSIGNED
IV44193 CONNECTIONDEPENDANCYAGENT CAN UPDATE DORMANT COMPUTER SYSTEM
IV44543 FCVOLUMES NOT REMOVED WHEN NO LONGER DISCOVERED
IV45710 LAST MODIFIED TIME NOT UPDATED BY TOPOLOGY AGENTS
IV46309 OBJRDISC TABLE HAS DORMANT RECORDS WHEN REDISCOVERY IS TURNED ON AND GROWS VERY LARGE
IV46492 IPDEVICE SENSOR DOES NOT UPDATE IPINTERFACE LAST MODIFIED TIME(LMT)
IV46740 VLANINTERFACES THAT ARE NO LONGER ASSOCIATED WITH A VLAN ARE NOT REMOVED FROM TADDM
IV39214 LASTMODIFIEDTIME IS NOT ALWAYS UPDATED WHEN STORING PHASE ENDS

You can look up the status of these APARs, and any others, on the TADDM Support Portal here;

http://www-947.ibm.com/support/entry/portal/product/software/tivoli/tivoli_application_dependency_discovery_manager?productContext=267282604

OTHER RESOURCES

There is an RFE to make this process part of the product.  If this topic interests you, please vote for it here;

http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=26783

There is sample code for a dormant CI cleanup process on the wiki here;

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Application%20Dependency%20Discovery%20Manager/page/Expiring%20TADDM%20Data

And also for change history here;

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Application%20Dependency%20Discovery%20Manager/page/Purge%20Change%20History

 

Please comment below if you have any questions on this document. 

[{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSPLFC","label":"Tivoli Application Dependency Discovery Manager"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB45","label":"Automation"}}]

UID

ibm11275388