Common performance problem of data warehousing
File size of the shortterm history data stored in local file system grows continuously, data were not able to be warehoused fast enough.
Efficient data warehousing question list
1. What data need to be collected and how they will be used?
2. Where is the history data located, TEMA or TEMS? What are the performance impact?
3. Whether multiple WPAs required?
4. Where should the WPAs be located?
5. How many agents that each WPA can serve?
6. Any variable that can be tuned to achieve faster warehousing?
7. Is the warehouse database at the best performance?
Predict and control size of historical data
1. Start with your Use Cases. Consider the use cases for each attribute group when configuring historical collection.
2. Gather ONLY the data you need from ONLY the systems you need it from
Example: 1000 identical Web Servers in a farm only need historical data for performance planning purpose from a few systems
Example: filter the history collection for PROCESS with VSIZE>50MB
3. Process the data only as you need it.
- Standard TCR reports & ITPA use only hourly and daily summarization.
- Adaptive / dynamic thresholding uses only detailed data.
- Never summarize high-volume, low repeatability data unless you have use case that demands it.
4. Keep the data for ONLY as long as you need it
5. Manage the Warehouse Database
- No substitute for DBA input
- Use the Projects Spreadsheet and keep it up to date as requirements and environment changes
- Monitor the Warehouse Database for problems
Warehouse Load Projections
1. Get the ITM Data Warehouse Load Projects spreadsheet from IBM ISM Library
2. Estimate the amount of disk space and network throughput required for historical data collection and warehousing
- Total TDW inserts per hour
- Total TDW MB of data inserts per hour
- Total GB of data in TDW
3. Estimate the disk space required for shortterm historical data on Agent.
4. Find out the largest tables that make up the majority of the warehouse data.
Historical data location
1. TEMA in favor of TEMS.
2. Think carefully of the performance impact before choosing TEMS as the collection location.
- Performance implications for the data gathering step in large scale environments with several hundred agents connected to one TEMS.
- Large chunk of data warehousing from one box vs. much smaller chunk of data warehousing from multiple boxes.
- All agents write to one file for each attribute groups. Slow workspace response time for shot-term historical data.
- Demands large disk storage, CPU and memory usage of the TEMS.
- Drastically lowers the number of agents managed by the TEMS.
- Loses the access to the shortterm historical data during failover to a secondary TEMS.
1. Typically placed on the same LAN segment as the Warehouse database to allow for the best throughput to the database.
2. Use multiple WPAs when the number of agents collecting historical data > 1500
3. Recommended deployment is to deploy one WPA on each RTEMS.
- If the primary RTEMS goes down and the agent fails over to the secondary RTEMS, the agent should be able to upload historical data through the WPA on the secondary RTEMS.
- Limiting the number of agents that upload historical data through a single WPA.
WPA register & failower
1. Mulitple WPAs provide greater scalability and performance, as well as the failover mechanism.
2. All WPAs must be configured to connect to HUB TEMS, not RTEMS.
3. Use KHD_WAREHOUS_TEMS_LIST in config file of WPA to specify the list of TEMS. The agents directly connected to these TEMSes will be served by this WPA.
4. A TEMS name can only be specified in the list for ONE WPA. You should not specify the same TEMS name in more than one WPA TEMS list.
How WPA collects and transfers data to the warehouse
1. Exporter threads and database connection pool size
Exporter threads remove export buffers from the work queue, prepare SQL statements, and insert data into the warehouse using JDBC/ODBC.
KHD_EXPORT_THREADS – number of exporter threads
KHD_CNX_POOL_SIZE – number of database connections
- Both with default of 10.
- These two variables should set to the same value.
- If you have multiple WPAs, the default value may cause performance bottleneck problems at the warehouse. Consider to reduce the number of exporter threads at each proxy.
2. Work queue
KHD_QUEUE_LENGTH – Size of the work queue
- Default is 1000
- The value should be at least equal to the number of clients that regularly upload data to the WPA.
- If the Warehouse Proxy log files show a significant number of rejected requests, then consider increasing the value.
3. NCS listen thread
CTIRA_NCSLISTEN – number of Network Computing System (NCS) listen threads allocated to process incoming RPCs.
- Default is 10, max is 256.
- Increase this value to improve concurrency performance.
- Starting from 10 NCS listen thread per single export thread.
- Increase the ratio if there are numerous RPC errors found in the WPA's log file.
4. Batch inserts
Enable WPA to submit multiple execute statements to the warehouse database.
- Using batch inserts is recommended, and is the default setting
5. Make sure the nofiles is set higher than the number of agents that are uploading data through the WPA. (AIX or Linux)
# ulimit -n
6. Extend the Java heap size (AIX or Linux)
7. Disable logging to the WARREHOUSELOG by setting KHD_WHLOG_ENABLE=N.
Enable the following tracing instead to record the agent, attribute group, number of rows exported, which provides useful information for performance analysis with negligible overhead.
KBB_RAS1=ERROR (UNIT:khdxdbex OUTPUT)
Data Warehouse tuning
1. Separate the largest tables into separate table spaces or data files.