White Papers
Abstract
We are observing slow performance in one or more use cases. What are the general IBM recommendations or best practices that we can implement to improve performance or diagnose the root cause?
Content
Before attempting to tune performance, we must have a clear idea of what the exact problem is. We need to know the exact use case that is not performing optimally and what the desired performance would be. Does it only happen at peak load, or is it a general issue? Is it limited to certain use cases, users, or components of the product? Depending on these details, we can choose an action plan best suited to our particular case. Moreover, some issues, such as suboptimal handler code or configuration, cannot be effectively troubleshot with a generalized approach, and attempting to do so would be inefficient.
Identifying the Problem
One of the best ways to isolate the problem is by enabling the TIMER logs, which give us the exact run time of each interaction at the application level. For example, if slow searches are reported, we can validate this against the precise time the application takes to perform the search. A sample snippet from the logs is shown below (prefixed by the keyword TIMER):
13:14:33 [1251109184] TIMER IXN=MEMGET, USR=a, elapsed = 0.024 seconds.
13:14:33 [1252694336] TIMER IXN=MEMPUT, USR=d, elapsed = 0.105 seconds.
13:14:33 [1252694336] TIMER IXN=MEMGET, USR=a, elapsed = 0.029 seconds.
13:14:33 [1251109184] TIMER IXN=MEMGET, USR=a, elapsed = 0.055 seconds.
13:14:33 [1253222720] TIMER IXN=MEMPUT, USR=b, elapsed = 0.087 seconds.
13:14:33 [1251637568] TIMER IXN=MEMSEARCH, USR=c, CANDS=70, elapsed = 0.198 seconds.
13:14:33 [1252694336] TIMER IXN=MEMGET, USR=a, elapsed = 0.032 seconds.
13:14:33 [1253222720] TIMER IXN=MEMGET, USR=c, elapsed = 0.034 seconds.
13:14:33 [1251109184] TIMER IXN=MEMGET, USR=d, elapsed = 0.051 seconds.
Pre v-11, this logging is enabled through a logger in the log4j.xml file, as shown below:
<logger name="com.initiatesystems.hub.logging.TimerLog">
<level value="ALL"/>
</logger>
V-11 onwards, logging is enabled from the WebSphere Application Server Administration Console. To modify it, log into the WAS Admin Console > Troubleshooting > Logs and trace > select the server where the operational server is deployed > Diagnostic Trace. Here you will see two tabs, Runtime and Configuration. Changes on the Runtime tab take effect immediately, whereas changes on the Configuration tab take effect only when the server is restarted. Note that if you change the Runtime tab but not Configuration, the changes will be reverted upon restart because the JVM reads from the Configuration tab while starting.
The options on either tab are identical: go to Additional Properties and click Change log detail levels. Either expand the com.ibm.mdm.mds.log.TimerLog component and set it to ALL to enable timer logging, or add the timer log entry manually in the text box in the center of the screen. Finally, click Apply at the bottom of the page to apply the new settings.
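For reference, a trace specification entered manually in the text box would look like the following (the component name comes from above; keeping all other components at the default info level is an assumption, so adjust this to match your existing trace string):
*=info: com.ibm.mdm.mds.log.TimerLog=all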
Another way to narrow down the problem is to have the DBA monitor the database. This helps determine whether there is suspicious activity in the database at the time of the reported slowness. For example, are the SQL run times themselves slow, or are they fast but spending an extended period in a wait state? It is also important to determine whether all available DB connections are being used up, whether some connections are being snipped, whether the database is even receiving communication from the engine, and so on.
There are three key concepts in performance which are as follows:
1. Work
Work is the processing that needs to occur to accomplish the desired output. Work is performed by all the subsystems within a server (CPU, memory, storage, and network). The limiting factors are the CPU's ability to perform the work of processing data, the memory that optimizes the flow of information from persistent storage to the CPU, the network that allows servers performing different work to communicate with each other, and the hard disks that physically store the data. The hardware enforces a physical limit on how much processing can be done, which is hard to improve without additional hardware. The following are a couple of common tests to use:
1.1. Monitor the overall memory usage of the server using OS-specific tools and check whether 100% of memory, CPU, or swap space is being consumed (see the illustrative commands at the end of this section). MDM uses both the Java and native heaps extensively. Although the Java heap can be limited using the maximum heap (-Xmx) setting from the WAS Admin Console (v-11 onwards) or the wrapper.java.maxmemory parameter in wrapper.conf (pre v-11), native memory is not. This can cause high native memory usage, which can only be reduced indirectly by reducing the number of threads or the context pool size. For details on how memory is managed, refer to the following document: http://www-01.ibm.com/support/docview.wss?uid=swg21653445.
1.2. The performance of many software applications is inherently limited by disk I/O. Applications that spend the majority of CPU time waiting for I/O activity to complete are said to be I/O-bound; work with your DBA to identify and resolve this (an illustrative check is shown at the end of this section).
In general, we have observed that the initial bulk load is CPU and memory intensive, with mpxcomp being the most computationally intensive phase of the BXM process. Real-time processing, by contrast, maintains an OLTP-style workload on the database server, and its I/O (input/output) profile is read dominated. As a mix of read/write activity, the MDS profile is typically 80% to 90% read during run time.
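The following are illustrative Linux commands for the checks in 1.1 and 1.2 above (standard OS tools; the equivalents on AIX, Windows, or other platforms differ, and the sampling intervals shown are arbitrary):
vmstat 5 12 (CPU utilization, free memory, and swap in/out sampled every 5 seconds for one minute)
free -m (current physical memory and swap usage in MB)
top -H -p <JVM_PID> (per-thread CPU usage for the operational server JVM; <JVM_PID> is a placeholder for the process ID)
iostat -x 5 (per-device utilization and wait times; sustained high utilization or wait values point to an I/O-bound system)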
2. Latency
Latency is the time it takes to execute a unit of work. It's a measure of how long a transaction takes to complete and is typically measured in milliseconds (ms) per transaction. Some of the best practices to improve latency are:
2.1. Buckets
Buckets are the output of the derivation of attributes configured to participate in candidate selection. The standardized string is converted to a format that allows quick access during candidate selection and is essential for fast data retrieval. The larger the buckets, the more data is retrieved and used for computation, which slows down the transaction. Therefore, keeping bucket sizes small is essential, and the following are some of the means of achieving this:
2.1.1. Implement Frequency Based Bucketing (FBB) to prevent large buckets from being used. Regular FBB marks individual buckets as anonymous by adding them to the mpi_strfreq table and does not use them, whereas dynamic FBB deletes the large buckets instead of just ignoring them. The following document covers the architecture and implementation of both types of FBB in detail: http://www-01.ibm.com/support/docview.wss?uid=swg21655423
2.1.2. Manually delete large buckets. For example, the following SQL statements identify and delete large buckets:
Identify: select bkthash, count(bkthash) from mpi_membktd group by bkthash having count(bkthash) > 5000 order by count(bkthash) desc;
Deletion: delete from mpi_membktd where bkthash = xxxxxxx;
2.1.3. Find specific large buckets and add the entries to the anonymous value list (the MPI_STRANON table). This option will eliminate the large buckets. However, we only recommend doing this if the value truly is anonymous.
2.1.4. Reduce the number of active (nsActive) and available (nsExists) attribute values for each member. If nsExists is set to 0 for a member's attributes, then for each attribute the member can have an unlimited number of values. Depending on how the algorithm is defined, this can drastically increase the number of buckets. The number of active and available values is stored in the mpi_segattr table. These values are defined by selecting an attribute in the 'Attributes' section of the 'Member Types' tab of the .imm file and setting 'Maximum active values' and 'Maximum existing values' in the properties window.
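To review the current limits, you can query the table directly, for example (an illustrative query; the column layout varies by version, so select all columns rather than assuming names):
select * from mpi_segattr;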
2.2. Big Entities
Check for the existence of big entities, that is, entities with a large number of members, which cause unnecessary processing and hence slow performance. Big entities are usually reflective of bad data in source systems with many duplicates that need to be cleaned up.
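A hedged example of identifying big entities, assuming the standard mpi_entlink table that links member records to entities (the table and column names and the threshold of 1000 members are assumptions; adjust them to your schema, entity type, and data profile):
select entrecno, count(*) from mpi_entlink group by entrecno having count(*) > 1000 order by count(*) desc;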
2.3. Slow SQLs
Check for long-running SQL statements in the database and take corrective action to improve their run time. For identification, you may use AWR reports for Oracle, Perfmon for SQL Server, and db2top or SYSIBMADM.LONG_RUNNING_SQL for DB2 (an illustrative DB2 query is shown at the end of this section). SQL tuning is database specific and best done by the DBA, but the following are some common techniques:
2.3.1. Reorganizing tables.
2.3.2. Updating run time statistics.
2.3.3. Creating custom indexes for slow select statements (not supported).
2.3.4. Partitioning database table or pinning it in memory (not supported).
2.3.5. Addressing an inadequate storage I/O subsystem.
2.3.6. Increasing the buffer pool size if it is too small; we generally want to see a buffer cache hit ratio of about 99%.
The buffer pool size should be tuned based on the data volume and data profile. Our general recommendation is to allocate 1 GB of memory per 1 million members for the buffer cache for optimal performance. The DBA may tune it later depending on the environment's workload profile; the desired size could be smaller or larger depending on the size of the payload. You may use the following SQL to estimate the number of members in your system:
select count(*) from mpi_memhead;
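As an example of the DB2 identification method mentioned at the start of this section, the SYSIBMADM.LONG_RUNNING_SQL administrative view can be queried directly (an illustrative query; verify the view and column names against your DB2 version):
select elapsed_time_min, appl_name, stmt_text from sysibmadm.long_running_sql order by elapsed_time_min desc fetch first 10 rows only;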
2.4. Large Result Sets
The data that is returned to the client has a direct impact on storage (such as the SAN), database CPU, and memory, as well as the network, and hence on performance. Therefore, application developers and administrators should modify custom code, general APIs, the broker, and the web user interface configuration so that they do not return unnecessarily large result sets. There are several settings on the client side that can be changed to limit the result sets, such as the segment code filter, minimum search threshold, and maximum members returned.
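A hedged sketch of applying such limits through the native Java SDK, modeled on the memGet.setSegCodeFilter example shown later in this document (the setMinScore setter and the segment list are assumptions; confirm the exact interaction and method names against the SDK documentation for your release):
memSearch.setSegCodeFilter("MEMHEAD,MEMNAME"); // request only the segments needed to evaluate results (segment list is illustrative)
memSearch.setMinScore(9); // drop low-scoring candidates; the threshold value is use-case specific
memGet.setSegCodeFilter("MEMHEAD,MEMNAME"); // follow up with targeted gets rather than retrieving everything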
2.5. Errors
Check the logs for errors or suspicious logging during execution and if you find any, troubleshoot them.
3. Throughput
Throughput is a rate: a measure of how much work is completed in a unit of time. It is effectively a measure of parallelism, that is, how many transactions can be completed simultaneously. This becomes extremely important when latency is fast but transactions spend most of their time in a wait state, causing delays.
3.1. Increase the number of contexts available to the application and hence improve parallelism. Pre v-11, this is set using the contextPoolSize variable in the com.initiate.server.net.cfg file (for the engine) and the com.initiate.server.handler.cfg file (for handlers) in the $MAD_HOME/inst/mpinet_<InstanceName>/conf directory. In v-11 and beyond, it is set using the same variable in the $WAS_HOME/profiles/SERVER_NAME/installedApps/CELL_NAME/MDM-native-E001.ear/native.war/com.ibm.mdm.mds.net.logic.cfg file. Illustrative entries for this and the connection settings in 3.2 are shown after this list.
3.2. Increase the maximum number of allowed active connections, idle connections, and the wait time using the maxActive, maxIdle, and maxWait variables in $MAD_HOME/inst/mpinet_<InstanceName>/conf/com.initiate.server.jdbc.cfg (pre v-11) or $WAS_HOME/profiles/SERVER_NAME/installedApps/CELL_NAME/MDM-native-E001.ear/native.war/com.ibm.mdm.mds.jdbc.cfg (v-11 and beyond). Before changing these settings, make sure that the database allows the corresponding number of connections. MDM SE uses DataDirect drivers, and you may add any DataDirect property to the above-mentioned files if your DBA believes it will improve performance.
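Illustrative entries for the .cfg files named in 3.1 and 3.2 (the property names come from the text above; the key = value layout and the numeric values are assumptions, so match the format of your existing file and size the values based on a thread scaling study and your database connection limits):
contextPoolSize = 16
maxActive = 50
maxIdle = 10
maxWait = 30000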
Scaling
General interaction run times for an enterprise deployment are expected to be along the following lines:
MemPut < .5 sec
MemSearch < 1 sec – varies if you define large FBB
MemGet < .3 sec
EM < 1 sec – varies if you define large FBB
These are independent of the scale of the deployment and are influenced more by the factors discussed in the latency section above (size of entities, buckets, result sets, and so on). General scaling considerations for large deployments are as follows:
- There are no application-imposed limits on scaling out the application, and each engine instance can scale up to the limit of either the physical RAM available (which limits the number of threads you can start) or the CPU available.
- All interactions are stateless, atomic transactions.
- Database transactions are all single transactions, so any given Initiate interaction, if it fails, does not leave data in a partially committed state.
- Multiple instances of the Master Data Engine can coexist, operating against the same database (the ‘state’ of any given member record or entity is always held in the database, not in-memory in the application).
- Assignment of internal IDs is also centralized to the database, so multiple Master Data Engine instances do not have to coordinate ID blocks or ranges.
- Each engine can be configured for frequency of ‘dictionary refresh’ – even if the configuration data in the database is updated through a different engine, all engine instances become aware of the change.
Storage
- Proper Input/Output Operations Per Second (IOPS) capacity sizing for a projected workload is critical to run time latency and throughput performance.
- Properly sized buffer caches at the DB tier can significantly reduce reliance on high speed IO (99% Buffer Cache Hit Ratio).
- Latency sensitive installs should target read latencies below 2 ms. One way of achieving this is to have access to dedicated SAN (Storage Area Network) resources with large intelligent caches. You may have other storage set-up as long as the read latency is sufficiently low.
- Latency sensitive installs are strongly advised to deploy dedicated SSD based storage (Solid State Drive), or a hybrid FC/SSD approach with automatic data placement.
- All installs should have storage with sustained read latencies of 6 to 8 ms at peak load.
DB2
- Use a dedicated instance with adequate dedicated resources.
- Minimize the number of buffer pools deployed (IBM tests with 2 pools at scale - one for indexes and one for tables).
- Use statement concentrator.
- Use memory compression.
- Use automatic storage.
- Cap the package cache size.
- Most instances should use Self Tuning Memory Manager (STMM). High performance deployments should use STMM to establish optimized buffer pool distributions, then fix the pool sizes for their most sensitive workload.
- Bypass file system cache. Read ahead caches provide no benefit to the Initiate MDS run time workload; in many cases they significantly degrade performance.
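A few illustrative DB2 commands corresponding to the recommendations above (MDMDB is a placeholder database name and USERSPACE1 a placeholder tablespace; confirm each setting with your DBA before applying it):
db2 update db cfg for MDMDB using STMT_CONC LITERALS (statement concentrator)
db2 update db cfg for MDMDB using SELF_TUNING_MEM ON (Self Tuning Memory Manager)
db2 "alter tablespace USERSPACE1 no file system caching" (bypass the file system cache for that tablespace)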
Additional configuration recommendations: http://www-01.ibm.com/support/docview.wss?uid=swg21674811
Oracle
- Use a dedicated instance with adequate dedicated resources.
- Use CURSOR_SHARING=FORCE
- Use ASMM (automatic memory management)
- Use ASM (automatic storage). If ASM is not in use, enable DIRECTIO.
- Bypass file system cache. Read ahead caches provide no benefit to the Initiate MDS run time workload; in many cases they significantly degrade performance.
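Illustrative Oracle statements for the recommendations above (apply only after DBA review; changing filesystemio_options requires an instance restart):
ALTER SYSTEM SET cursor_sharing = FORCE SCOPE=BOTH;
ALTER SYSTEM SET filesystemio_options = SETALL SCOPE=SPFILE; (enables direct and asynchronous I/O, bypassing the file system cache)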
Additional configuration recommendations: http://www-01.ibm.com/support/docview.wss?uid=swg21674800
SQL Server
- FORCED_PARAMETERIZATION = on
- ALLOW_SNAPSHOT_ISOLATION = on
- READ_COMMITTED_SNAPSHOT = on
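The equivalent T-SQL for the database-level options above (MDMDB is a placeholder database name; READ_COMMITTED_SNAPSHOT in particular requires exclusive access to the database when it is changed):
ALTER DATABASE MDMDB SET PARAMETERIZATION FORCED;
ALTER DATABASE MDMDB SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE MDMDB SET READ_COMMITTED_SNAPSHOT ON;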
Engine
- Conduct a thread scaling study to determine the appropriate context count at the MDS engine tier.
- There are no hard limits on context counts per engine; there are instances running 100+ contexts per engine.
- Pay careful attention to client threading and its influence on the engine context count.
- Give careful consideration to a load balanced multi-engine deployment. More engines and a greater context capacity can enhance the system's ability to absorb utilization spikes.
- Design an infrastructure that scales; ensure each layer, including feeds, can be threaded out further.
- Pay careful attention to bucketing strategy - minimize the number of candidates retrieved during a search to those which are essential for the use case.
- Bring back only the data required to evaluate the search results. Follow the initial search (MEMSEARCH) by targeted MEMGETs rather than getting everything and then processing.
- Latency-sensitive environments should not request MEMATTRALL when doing MemGets, i.e. avoid calls such as memGet.setSegCodeFilter("MEMHEAD,MEMATTRALL");
- Use minscore to minimize the number of records returned on a search.
- Use unique user IDs per workload type for the ability to track workloads through the TIMER logs.
- Consider isolating workloads to unique engine instances to preserve threading capacity for specific workloads.
WebSphere Application Server
Version 11 onwards, MDM runs as an enterprise OSGi bundle within the WebSphere Application Server container framework. Therefore, we can analyze the WebSphere run time as part of a performance review. WebSphere has various tools for troubleshooting performance, and some of the most useful ones for performance analysis are discussed in the following links:
WebSphere on AIX: http://www-01.ibm.com/support/docview.wss?uid=swg21052641
WebSphere on Linux: http://www-01.ibm.com/support/docview.wss?uid=swg21115785
WebSphere on Windows: http://www-01.ibm.com/support/docview.wss?uid=swg21111364
Similar to WebSphere, the operating system also provides tools for diagnosing performance issues. IBM has its own UNIX operating system, AIX, and the following link guides us through effective tools for performance troubleshooting on the AIX platform: http://www-01.ibm.com/support/docview.wss?uid=swg21222476. For other platforms, you may search online or contact the vendor.
Performance Log Manager
This is an MDM-maintained tool for overall performance monitoring across all engine instances. It outputs a web report with performance statistics such as latency, throughput for a particular interaction, size of the result set, input and output member counts, input and output row counts, candidate count, bucket count, and so on. This empirical data can be invaluable during a statistical performance analysis of the application. For details on how to enable and use the Performance Log Manager, refer to the information center. For v-11, the corresponding link is http://pic.dhe.ibm.com/infocenter/mdm/v11r0/topic/com.ibm.mdshs.ops.doc/topics/c_ops_performance_log_manager.html and for v-10.1, it is http://pic.dhe.ibm.com/infocenter/mdm/v10r1/topic/com.ibm.mdshs.ops.doc/topics/c_ops_performance_log_manager.html.
Product Synonym
MDS;Master Data Service;MDM;MDM SE;MDMSE;Master Data Management;IBM Infosphere Master Data Service;MDM Standard Edition;MDM Hybrid Edition;Initiate