IBM Support

Tivoli Data Warehouse Best Practices

Technical Blog Post


Abstract

Tivoli Data Warehouse Best Practices

Body

 The Tivoli Data Warehouse utilizes a very simple and flat database schema. This simple schema means that certain queries may not perform well, depending on how they are written (complexity, joins, order by, columns used in the WHERE clause, etc...). This article describes some of the best practices when setting up your Tivoli Data Warehouse.

Data Collection

  • only collect the data that you need - collecting unneeded data will take resources in the database and will require extra maintenance in the form of pruning, etc... It will also increase the amount of time the S&P Agent needs to summarize and prune the data, if configured.
  • collect the data using the smallest collection interval that you can - certain data doesn't change very often and doesn't require minute or even hourly data collection. For example, configuration data (Oracle Configuration) or rarely changing data such as IP addresses (Linux_IP_Address).
  • filter any unneeded data from the historical collection. For instance, a SAN file system may be mounted on thousands of servers and it is not necessary to collect that information on all those systems. It is only necessary to collect that data once for appropriate monitoring and historical analysis. Filtering in historical collections was added in ITM 6.2.2 FP2.
  • only collect the data at the systems that need the data collected from. Since ITM 6.2.2, it is possible to control the distribution of the historical collections, so it is possible to control in fine detail which systems will collect the data.

Summarization and Pruning

 

  • setup data retention (pruning) to only keep the data for the shortest amount of time needed to satisfy your business requirements. Keeping data for long periods of time will cause degradation of the TDW performance over time.
  • only summarize the data that you need. If there is no business need to have an attribute group summarized, keep just the detailed data. Each summarization adds to total time needed for the S&P agent to complete its run.
  • summarization periods are independant of each other, meaning that you can have hourly summarization and monthly summarization, without having to enable daily, weekly summarizations. Each additional summarization will slow down the time it takes for the S&P agent to complete its work and causes extra resource usage on the database. Higher period summarizations (above daily) will result in partial rows that won't be complete until that summarization time period ends.
  • certain attribute groups don't summarize well as the summarization is almost a full copy of the detailed data. For instance, a log file attribute group will not summarize well as each row is considered unique unless the data was manipulated to make it meaningful for summarization. Also, process data in general will not summarize well, specially on systems where lots of temporary processes are created. These temporary processes will create several rows of data that will contain only a few samples of data, taking unnecessary space in the file system. If process data must be collected and summarized, use the filtering capabilities introducing in ITM 6.2.2 FP2 to only collect the data that is needed.

Database

ITM 6.2.1 included a new utility as part of the Summarization and Pruning Agent: the schema publication tool. This allows you to generate the DDL for the Tivoli Data Warehouse and modify it (with restrictions) to help improve performance and to address security concerns. For example, the DDL can be modified to enable database compression, create tables on specific tablespaces, file groups, etc... See the ITM documentation for more information on the schema publication tool.

 

 

  • if you have the appropriate RDBMS license, consider using database compression on the warehouse tables. Compressed tables can perform significantly better than uncompressed ones. The schema publication tool can be used to generate the DDL for the tables, which can then be modified to support compression. Existing tables can be altered to enable compression (a reorg will be necessary to compress existing data in the table).
  • periodic database maintenance must be setup and executed in a regular fashion. The insert and prune cycle will cause fragmentation in the database objects (tables, indices) and will also make statistics out of date. This may cause the database optimizer to decide that a full table scan is better than using an index, causing slowdowns. Run regular database defragmentation (such as reorg) and update statistics on tables and indices at least weekly.
  • custom reporting may need additional indices added to make them perform better. The out-of-the-box indices are meant for ITM usage and satisfy the S&P and TEP usage of TDW. The detailed data specifically does not lend to general queries unless the query specifies an agent managed system and the time of interested (WRITETIME column typically). Care must be taken when adding additional indices since each index takes up space and slows down insertion, deletion and updates. Use database tools such as explain plan to ensure that your new indices will be used for your queries.
  • the times are stored in a string format: YYYMMDDhhmmsszzz (described below). Use the string format when doing queries as converting it to a database timestamp in the WHERE clause will make the query ineligible for using an index
    • YYY - year minus 1900
    • MM - month (1-12)
    • DD - day of the month
    • hh - hour (00-23)
    • mm - minutes (00-59)
    • ss - seconds (00-59)
    • zzz - milliseconds (also used for sequencing)
  • consider spreading the objects into multiple database containers (tablespaces, files, etc...). Large tables may need to be placed into separate containers due to their size. Indices may be placed into separate containers from the table data.
  • follow the RDBMS tuning best practices since the default configuration is generally not tuned for your environment. For example, for DB2 the transaction logs should be placed on separate drives from the database tables and indexes. Allocate as much physical memory as is feasible to the database, etc...

 

[{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"","label":""},"Component":"","Platform":[{"code":"","label":""}],"Version":"","Edition":""}]

UID

ibm11083483