Monitoring and tuning InfoSphere Master Data Management Server, Part 1

Set goals and tune each layer of the infrastructure

Tune the MDM Server environment for high performance access to master data

Application performance of IBM InfoSphere™ Master Data Management Server (MDM Server) is influenced by a set of factors that involve solution design, implementation, and infrastructure. This series of two articles focuses primarily on the implementation and infrastructure factors that impact performance, as well as the methodology for monitoring the MDM Server application. You need to closely monitor and effectively tune the full MDM Server implementation stack for optimum performance. This series helps IT professionals and customers running IBM MDM Server to maximize performance by offering practical instructions and recommendations to effectively monitor and tune the full MDM Server software stack.

MDM Server helps your organization gain control over business information by enabling you to manage and maintain a complete and accurate view of customers' master data. MDM Server enables your organization to obtain the maximum value from the master data by centralizing multiple data domains (including party, account, and product) and by providing a comprehensive set of pre-built business services that support a full range of master data management (MDM) functionality. IBM is the first to deliver the MDM server for multiple domains. IBM also delivers industry-leading integration capabilities that reduce client risk and cost, thereby increasing revenue and enabling clients to quickly meet compliance requirements.

The MDM Server supports multiple operating systems, middleware, and databases. As in any software application, bottlenecks in individual hardware subsystems, an incorrectly configured operating system, or a poorly tuned application and environment can cause poor performance. The proper tools can help you to diagnose these bottlenecks.

This article walks you through the major components of the MDM Server and how the components interact. You will learn the various software layers of a service and how the MDM Server responds once a service is invoked. This article prepares you to monitor, identify, and tune various layers of the MDM Server for optimum performance.

This article provides best practices in performance tuning for the following MDM Server-specific topics:

  • MDM Server monitoring and tuning
  • WebSphere monitoring and tuning
  • Database monitoring and tuning

Read this article through in order if you can; however, you might find some sections useful as immediate references.

Prerequisites

For optimum performance, the components, including the MDM Server, the database server, and the WebSphere MQ server, should reside on separate systems in most cases. The primary benefit of such a configuration is that it avoids resource contention among multiple components residing on a single server. For example, running all of the MDM Server components on a single computer, though technically feasible, would reduce the throughput you can achieve, depending on your workload.

The components for the IBM InfoSphere MDM Version 8 used in the performance test configurations in this article are as follows:

  • An application server computer running WebSphere Application Server V6.1
  • A database server computer running DB2 V9.5.4

Understanding the big picture

When approaching the subject of application performance in a multi-tier environment, a healthy combination of process competence, product knowledge, and contingent experience with your environment is absolutely essential. Performance tuning checklists, best practices, and in-depth experience with a product are great assets, but they only go so far. To begin, set performance targets that align with your business goals, and proceed to reach them with approaches that balance long-term scalability with low costs in time and resources.

This section outlines the competencies and processes that need to be in place to successfully plan for, analyze, and tune an MDM Server implementation.

Set performance goals

When embarking on any performance tuning or testing project, it is best to set concrete goals that provide focus and an end-state to your efforts. While this might seem obvious, many tuning efforts have become needlessly expensive because focus was placed on the wrong component or the tuning was simply taken too far because a quantifiable goal had not been set. Clearly setting such a goal requires an understanding of the specific business requirements.

Your goals should be specific and rooted in the business language. For example, rather than stating a target of 30 TPS for a delta load batch run, it is often better to specify the target time for a batch window including all start-up and follow-up work along with a tolerance level for rejected records. This allows your tuning to be flexible and cost-efficient with the solution while it addresses the real business concern rather than an abstract number.

The language and terminology of your goals should be clear to everyone involved in achieving them. Even a commonly used term like transaction is incredibly ambiguous in a workload with different types of services.

Following is a list of sample concrete goals:

  • Searches by Name and Postal Code should not exceed 1500ms and should be less than 1000ms for 95% of the executions
  • Online getParty calls should not exceed 500ms for 95% of the executions at a sustained rate of 45 getParty calls per second
  • Delta update-maintenance transactions should complete within 3 hours assuming a maximum batch workload of 65,000 updates
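
Goals phrased this way can be checked mechanically against measured response times. The following is a minimal sketch, not part of MDM Server: the function names are hypothetical, and the sample data is invented purely for illustration. It checks a "95% under one limit, none over another" goal and derives the average rate that a batch window implies.

```python
def meets_latency_goal(samples_ms, p95_limit_ms, max_limit_ms):
    """Return True when at least 95% of the samples are below p95_limit_ms
    and no sample exceeds max_limit_ms."""
    if not samples_ms:
        raise ValueError("at least one response time sample is required")
    within_limit = sum(1 for t in samples_ms if t < p95_limit_ms)
    # Integer arithmetic avoids floating-point surprises at exactly 95%.
    enough_fast = within_limit * 100 >= 95 * len(samples_ms)
    return enough_fast and max(samples_ms) <= max_limit_ms


def required_rate_per_second(total_items, window_hours):
    """Average rate needed to finish total_items within the batch window."""
    return total_items / (window_hours * 3600.0)


# Invented sample: 95 fast searches and 5 slower ones.
search_samples = [400] * 95 + [1200] * 5
print(meets_latency_goal(search_samples, p95_limit_ms=1000, max_limit_ms=1500))
print(round(required_rate_per_second(65000, 3), 2))
```

For the delta update goal above, the second function shows that 65,000 updates in 3 hours requires an average of roughly 6 updates per second, before allowing for start-up and follow-up work.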

Know the environment

You need a solid understanding of the components of your MDM Server solution to help you streamline any performance testing or tuning process. MDM can only run as fast as the infrastructure allows. Any CPU, I/O, or network bottlenecks can negatively impact throughput. Correlating infrastructure benchmarks and basic monitoring gives your performance test results enough context to help you pinpoint your bottleneck.

A multi-tier enterprise solution has a lot of related moving parts, and MDM Server is no exception. Understanding the concurrency, response time, and resource usage at each node in the environment helps you isolate system and application bottlenecks faster. Figure 1 shows a logical architecture of the common components in an MDM Server solution.

Figure 1. MDM Server logical architecture
Diagram showing providers on top, services on the bottom, and actions connecting them to each other

Know the workload

Understanding and quantifying the workloads in your MDM Server solution helps you choose a testing approach and focus your problem analysis. Any given MDM Server solution has different workloads that stress different aspects of your environment. Following are some common workloads:

  • Initial load: Putting data into MDM Server for the first time from the source systems
  • Delta load: Discrete updates from the source systems after the initial load
  • Application-specific workloads: Service mixes specific to the business scenario of each client system in real-time
  • Analysis or export workloads: ETL or custom exports
  • Suspect duplicate processing: For those SDP invocations with an asynchronous/evergreen process

Beyond isolating the workload, it is important to understand the different mix of MDM services being called within each workload. Because of the nature of each workload, different parts of the MDM Server infrastructure are stressed. For update-heavy delta loads, there might be I/O limitations on the database server. For a highly concurrent application-specific workload with a lot of queries, the WebSphere thread pools and database connection pools might need to be altered. You can break down existing workloads by employing the tools outlined in this article.

Establishing a performance tuning process

Whether your intention is to tune MDM Server to a desired service level agreement (SLA) or just to test a custom code change, establishing a documented and repeatable process is essential. Consider the following when conducting MDM Server performance testing:

  • Each performance test needs a consistent baseline with which to compare changes. Use a single and repeatable data set for your performance test, and ensure that you always restore your MDM Server database to the point at which you established your baseline.
  • Make one change at a time, and re-establish your baseline when you have confirmed that the change will become permanent.
  • Ensure that both the application and database layers are restored back to the most recent baseline state, which is the best practices starting point plus any accepted tuning changes.
  • An MDM Server solution can have many concurrent workloads that use different entry points into the server. If you have a combination of JMS/MQ and EJB/RMI workloads, it is best to isolate them first, then tune, and then combine for a final test. In the same manner, distinctly different workloads, such as operational queries and delta load updates, should be isolated and tested separately first.

Monitoring and tuning MDM Server

MDM Server has a comprehensive solution for monitoring response times at various levels of granularity. Performance tracking can be enabled or disabled as required and can be configured for anything from low-impact production monitoring to more intensive profiling investigations. While the feature is ARM 4.0 compliant (see Related topics), this article focuses on log4j output (performancemonitor.log), because it is available to all MDM Server implementations without the need for additional software.

Beginning in MDM Version 8.0, the MDM performance tracking feature has been greatly enhanced for ease of use and greater precision in monitoring response times. The recorded response time metrics are now measured in nanoseconds rather than in milliseconds, as they were in earlier versions. In addition, the logging output now more clearly groups MDM subcomponent calls under their parent controller, which makes the log easier to understand and parse. This feature has an understood and manageable overhead in MDM, allowing it to be used in production for problem resolution without seriously impeding the system throughput.

To learn more about MDM Server performance tracking, consult the MDM Developer's Guide included in the MDM Server product documentation (see Related topics).

Enabling MDM Server performance tracking

To make use of MDM Server performance tracking, you must explicitly enable it. As with most product features in the MDM Server, you do this by modifying the configuration and management tables, specifically the CONFIGELEMENT table. Enable MDM performance tracking at the recommended level for problem diagnosis and tuning. Listing 1 shows the commands for level 2.

Listing 1. Database commands to enable performance monitoring
UPDATE CONFIGELEMENT SET VALUE = 'true', LAST_UPDATE_DT = CURRENT_TIMESTAMP WHERE 
NAME = '/IBM/DWLCommonServices/PerformanceTracking/enabled';
UPDATE CONFIGELEMENT SET VALUE = '2', LAST_UPDATE_DT = CURRENT_TIMESTAMP WHERE 
NAME = '/IBM/DWLCommonServices/PerformanceTracking/level';
COMMIT;

Because using log4j output as a starting point for problem investigation is recommended, it helps to configure log4j optimally to get the right size and roll-over window for the performancemonitor.log. To ensure proper capture of an elusive problem or to simply get a large statistical sample, dedicate about 2 GB to the rolling performancemonitor.log. This is typically enough to capture an hour of workload on an MDM Server in complicated and high-volume implementations. Find the entries from Listing 2 in the log4j.properties file in the MDM Server properties.jar archive.

Listing 2. Configuring log4j properties to rotate the performance monitor logs
...
log4j.appender.performanceLog=org.apache.log4j.RollingFileAppender
log4j.appender.performanceLog.Encoding=UTF-8
log4j.appender.performanceLog.Threshold=ALL
log4j.appender.performanceLog.layout.ConversionPattern=%d : %m%n
log4j.appender.performanceLog.layout=org.apache.log4j.PatternLayout
log4j.appender.performanceLog.File=/opt/IBM/MDM85/MDMlogs/perflogs/performancemonitor.log
log4j.logger.com.dwl.base.performance.internal.PerformanceMonitorLog=ALL,performanceLog
log4j.appender.performanceLog.MaxBackupIndex=20
log4j.appender.performanceLog.MaxFileSize=100MB
...
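
The roughly 2 GB figure follows from the two size settings in Listing 2: a log4j RollingFileAppender keeps the active file plus up to MaxBackupIndex backup files. The sketch below works through that arithmetic. The function names are hypothetical, and the 35 MB per minute logging rate is an assumed value for illustration only; measure your own rate before sizing the log.

```python
def rolling_log_budget_mb(max_file_size_mb, max_backup_index):
    """Disk space used by a full rolling log: the active file plus backups."""
    return max_file_size_mb * (max_backup_index + 1)


def minutes_covered(budget_mb, log_mb_per_minute):
    """How much workload the rolling window retains at a given logging rate."""
    if log_mb_per_minute <= 0:
        raise ValueError("log_mb_per_minute must be positive")
    return budget_mb / log_mb_per_minute


# Values from Listing 2: MaxFileSize=100MB, MaxBackupIndex=20.
budget = rolling_log_budget_mb(100, 20)
print(budget)  # 2100 MB, or about 2 GB

# Assumed logging rate of 35 MB per minute (hypothetical).
print(minutes_covered(budget, 35))
```

If your workload logs faster than the window you need to capture, raise MaxBackupIndex or MaxFileSize accordingly.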

After you make both the CONFIGELEMENT and log4j changes, restart the WebSphere Application Server instance that hosts MDM Server for the changes to take effect. Once you recycle the server, all transactions run against that MDM Server are logged by MDM Server performance tracking. Note that you can also change the performance logging settings dynamically. For example, you can turn tracking on or off using the CM console without the need to recycle the MDM Server instance.

The performancemonitor.log excerpt in Listing 3 shows an example of a relatively simple customized composite transaction. Listing 3 has one parent transaction, maintainABCParty, which calls many MDM Server subcomponents. From the last number on the CONTROLLER line, you can see that this transaction took 667095210 nanoseconds, or approximately 667 milliseconds. The majority of this transaction has been omitted, because there can be tens to hundreds of lines per transaction.

Listing 3. Sample entries in performance monitoring logs
2009-02-12 21:02:45,028 : 
706000020 maintainABCParty : addPerson_CONTROLLER @ CONTROLLER :  : 
  147123449056435909 : 667095210 :  : SUCCESS
706000020  maintainABCParty : externalValidation(int, Object, String) @ DWLControl : 
  147123449056435909 : 241234490564359621 : 75237 : Completed : SUCCESS
706000020  maintainABCParty : externalValidation(int, Object, String) @ TCRMPersonBObj : 
  147123449056435909 : 518123449056436088 : 64034 : Completed : SUCCESS
706000020  maintainABCParty : internalValidation(int, Object, String) @ TCRMPersonBObj : 
  147123449056435909 : 371234490564360022 : 1457512 : Completed : SUCCESS
706000020  maintainABCParty : searchSuspectDuplicateParties_COMPONENT @ COMPONENT : 
  147123449056435909 : 320123449056436177 : 341390435 :  : SUCCESS
706000020   maintainABCParty : searchPersonByName_COMPONENT @ COMPONENT : 
  320123449056436177 : 602123449056436296 : 339744666 :  : SUCCESS
706000020  maintainABCParty : addParty_COMPONENT @ COMPONENT : 
  147123449056435909 : 587123449056470310 : 322874434 :  : SUCCESS
...
706000020    maintainABCParty : addPartyAdminSysKey_COMPONENT @ COMPONENT : 
  746123449056470366 : 950123449056501339 : 11416919 :  : SUCCESS
706000020     maintainABCParty : internalValidation(int, Object, String) @ 
  TCRMAdminContEquivBObj : 950123449056501339 : 706123449056501410 : 60410 : 
  Completed : SUCCESS
706000020  maintainABCParty : execute(ExtensionParameters) @ FeedInitiator : 
  147123449056435909 : 324123449056502600 : 32071 :  : SUCCESS
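
Entries like these can be parsed with a short script. The following is a minimal sketch that assumes the layout visible in Listing 3: fields separated by " : ", with the response time in nanoseconds as the third field from the end and the status as the last field. It also assumes that each entry occupies a single line in the real log (the entries in Listing 3 are wrapped for display). The PerfEntry type and parse_entry function are hypothetical names; verify the field positions against your own performancemonitor.log before relying on them.

```python
from typing import NamedTuple


class PerfEntry(NamedTuple):
    transaction: str   # parent transaction name, such as maintainABCParty
    component: str     # "<name> @ <type>" of the controller or subcomponent
    duration_ns: int   # response time in nanoseconds
    status: str        # final field, such as SUCCESS


def parse_entry(line: str) -> PerfEntry:
    """Parse one unwrapped log entry, with or without the timestamp prefix."""
    fields = [f.strip() for f in line.strip().split(" : ")]
    if len(fields) < 7:
        raise ValueError("unexpected entry layout: %r" % line)
    # Counting from the end keeps the positions stable whether or not
    # the "%d : " timestamp prefix is present.
    transaction = fields[-7].split()[-1]
    return PerfEntry(transaction, fields[-6], int(fields[-3]), fields[-1])


controller = ("2009-02-12 21:02:45,028 : 706000020 maintainABCParty : "
              "addPerson_CONTROLLER @ CONTROLLER :  : 147123449056435909 : "
              "667095210 :  : SUCCESS")
entry = parse_entry(controller)
print(entry.component, round(entry.duration_ns / 1e6), "ms")
```

Dividing the nanosecond value by 1,000,000 gives the response time in milliseconds, which matches the approximate 667 milliseconds noted for the controller.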

Profiling and problem resolution

While MDM Server performance tracking has many uses, the most important is its role in profiling MDM Server workloads or individual transactions. It provides two pieces of key information: granulated response times and the sequenced breakdown of MDM Server component calls. This information makes performance tracking invaluable as a first-stop diagnostic tool for performance problem diagnosis. As such, IBM Support considers it one of the critical must-gather items for any support engagement. The IBM MDM Performance Team also uses MDM performance tracking internally to benchmark and assess the MDM Server performance.

A typical usage scenario of MDM Server performance tracking is to investigate slow response from a complicated, customized MDM Server transaction. Capturing the transaction of interest in a test environment reveals the following about the subcomponents:

  • Their names
  • The frequency of their execution
  • Their sequence
  • Their response times

With this information, you can see what large and complicated transactions are doing in the implementation, and potentially you can spot areas for improvement. If you capture this information early in the development process, you can spot inefficient MDM Server inquiry levels, in which more component information is being brought back than the transaction actually needs, or inefficient logic in customized composite transactions that results in redundant MDM Server component calls. The transaction breakdown can also reveal subcomponents that are taking an unusually long time, which helps you pinpoint where to begin further investigation for SQL or code slowdowns.

Another typical usage scenario for MDM Server performance tracking is to profile the overall workload for the purposes of tuning and transaction optimization. A simple shell or Perl script can produce helpful summary data about which MDM components are taking up the most time in an operational MDM workload.

The summary in Listing 4 comes from an early performance test in an MDM implementation. It provides a high-level perspective on transaction mix and response time in the monitored MDM Server. The implementation was experiencing performance problems with poorly tuned suspect duplicate processing rules. By diving into examples of the updateParty, addPerson, and addOrganization transactions in the performance logs, it was easy to determine that the subcomponent common to all of them, searchSuspectDuplicateParties, was taking the most time in all of these controller-level transactions. A customized search suspect rule was modified and re-tested, which cut the transaction response times in half.

Listing 4. Transaction response time (ms)
Transaction Name          Txn Count   AvgRespTime   CumRespTime   %Workload
updateParty                   17911          3186      57072557      37.22%
addPerson                      4850          7077      34331324      22.39%
addOrganization                3353          8856      29702874      19.37%
getPerson                      9629          1105      10637676       6.94%
updateContractPartyRole       20746           447       9280139       6.05%
getOrganization                8291           889        737984       4.81%
getPartyRelationship          12299           108       1323786       0.86%
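
A summary with these columns can be produced from controller-level response times with a few lines of scripting. The sketch below is one possible approach, not the script used to produce Listing 4. It assumes that you have already extracted (transaction name, response time in milliseconds) pairs from the performance log; the summarize function and the sample data are hypothetical.

```python
from collections import defaultdict


def summarize(samples):
    """Aggregate (name, response_ms) pairs into summary rows.

    Each row is (name, count, average_ms, cumulative_ms, percent_of_workload),
    sorted so that the transactions consuming the most time come first.
    """
    totals = defaultdict(lambda: [0, 0.0])  # name -> [count, cumulative ms]
    for name, response_ms in samples:
        totals[name][0] += 1
        totals[name][1] += response_ms
    grand_total = sum(cum for _, cum in totals.values())
    rows = []
    for name, (count, cum) in totals.items():
        share = 100.0 * cum / grand_total if grand_total else 0.0
        rows.append((name, count, cum / count, cum, share))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Invented sample data for illustration.
sample = [("updateParty", 3000), ("updateParty", 1000), ("getPerson", 1000)]
for name, count, avg, cum, share in summarize(sample):
    print("%-24s %6d %10.0f %12.0f %8.2f%%" % (name, count, avg, cum, share))
```

Sorting by cumulative response time rather than by average highlights the transactions that dominate the workload overall, which is usually where tuning effort pays off first.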

Configuring logging levels

The MDM Server logging configuration is one of the most obvious and easiest ways to optimize application performance. Frequently during development, the MDM Server Customer.log is set to log4j's DEBUG level for tracing purposes. While this is fine for most testing environments, MDM Server's logging output is substantial, and this level is not appropriate for a production workload or for performance testing. Therefore, the central logger should be set to ERROR, as shown in Listing 5.

Listing 5. Setting logging level
...
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.Encoding=UTF-8
log4j.appender.file.Threshold=ERROR
log4j.appender.file.layout.ConversionPattern=%d %-5p %3x - %m%n
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.File=/appl/spool/tssi/BP/logs/Customer.log
# optional settings
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.MaxFileSize=10MB

# The next line controls the level of output for the root logger
# [ALL, DEBUG, INFO, WARN, ERROR, FATAL, OFF]
log4j.rootLogger=ERROR, file, stdout
...

It is also worth reviewing any extra diagnostic logging that might have been added for customized transactions or extensions. The best practice is to log this extra diagnostic information at the DEBUG level in the Customer.log rather than in any other log or at any other logging level. Frequently Java® developers write large portions of data (such as XML) to System.out, and this goes unseen until the code is deployed to production and the application server logs fill with debugging information. Not only is this bad for performance, but it also hinders problem resolution by filling up the log with superfluous information.

Using TAIL

Transaction audit information logging (TAIL) records information about the transactions executed in MDM Server. For the service to provide this information on demand, any MDM services that add or modify data incur additional overhead when TAIL is enabled. While this feature is helpful for adhering to accounting practices and standards, it should be given due diligence in capacity planning and performance testing. Consider the following points regarding performance when you enable TAIL:

  • You can configure the transactions that must be audited and the actions (or internal transactions) within each transaction that must be audited. Therefore, configure only the transactions and actions you require. Before MDM Server Version 9.0, at least one action per transaction must be configured. Beginning with MDM Server Version 9.0, you can configure transactions without any actions.
  • The percentage of MDM services that retrieve data from the TAIL tables is often low in a transactional workload, and those services are not typically a performance concern. The only concern is MDM transactions that modify or add data, because they trigger TAIL.
  • TAIL increases usage on the following MDM Server database tables: TRANSACTIONLOG, INTERNALLOG, INTERNALLOGTXNKEY, INTERNALTXNKEY, EXTERNALLOGTXNKEY (Version 9.0), and EXTERNALTXNKEY (Version 9.0).
  • If your main objective for using TAIL is to record system usage (for reporting purposes), consider using the service activity monitor instead.

Using inquiry levels

Key inquiry services that support inquiry levels include GetParty, GetContract, GetPartyWithContracts, and GetProductInstance. An inquiry level dictates what type of business objects to return in the response. It enables you to adapt the response to the needs of the consumer and to be performance-sensitive by retrieving only what the consumer needs. A set number of inquiry levels for the various business objects are provided out of the box. A mechanism is also provided for you to define your own custom inquiry levels by configuring metadata.

As of MDM Server 9.0.1, you can define your own SELECT statement and associate it with an inquiry level. When you do so, the inquiry services use that statement instead of the out-of-the-box internal methods, which can greatly reduce the number of roundtrips to the database needed to fetch the data. The administration services for creating new inquiry levels provide the option of generating an associated SELECT statement that is then managed within the metadata. The SELECT statement can be tailored based on your needs, for example:

  • By default the SELECT statement includes selecting all columns from all required tables. Columns you are not using as part of the implementation or columns not required for a particular inquiry level can be removed from the statement.
  • The WHERE clause can be tailored based on your user profile. For example, if your implementation has a strict rule about managing only one name per person, you can tailor the default WHERE clause to more efficiently join to the PERSONNAME table.
  • The SELECT statement can get more granular to retrieve only the data you need. For example, if a consumer requires only Home Addresses as part of a query, the WHERE clause of the particular inquiry level can include a discriminator that selects only address usage types of home address.

Using smart inquiries

Smart inquiries is an MDM Server feature that enables an implementer to turn off particular sets of MDM data model queries that are not in use. This simple change avoids the execution of extraneous queries, significantly reducing transaction response time and increasing TPS in almost all transaction workload scenarios. Because this feature can modify the functionality of the product, and because your use of the MDM Server data model can evolve over time, implement this feature with care. Consider the following points for using smart inquiries:

  • Because most MDM services include embedded inquiries, this feature often has benefits in many MDM services.
  • The performance benefit is contingent on how many of your transactions are querying the unused portions of the MDM Server data model, how often they are queried, and how many unused portions you turn off.
  • This feature reduces the response time and increases the throughput in those affected transactions.
  • Optimizations typically result in significant savings in CPU usage on the application and database layers, as well as an overall load reduction on the MDM database server, in proportion to the number of queries that enabling this feature eliminates from transactions.
  • Smart inquiries settings are not in effect when the pluggable SELECT statement is used with the customizable inquiry level feature.

Using history

MDM Server can track modifications to data using the history feature. This feature uses database triggers to modify MDM Server's history tables, which are essentially mirrors of the main MDM data tables used for storing the relevant history events. The history tables are queried when inquiry services such as GetParty are used with an as-of date in the DWLControl header (referred to as point-in-time queries). For most MDM Server operational workloads, the queries to the history tables make up a very small portion of the overall workload and require little consideration for performance.

Each transaction that modifies data in MDM Server runs the relevant triggers as part of the add or update work. Therefore, in heavy insert and update workloads, the history tables also should be considered along with the core operational tables in database tuning and I/O planning. Consider, if possible, dropping the database triggers for history tables that are not required for audit purposes. If you do this, the point-in-time queries for history tables with dropped database triggers are not supported because there will be no data in those tables for the inquiry services to retrieve.

Note that the growth of the history tables should be planned for when doing initial capacity planning. One strategy that can be employed when using DB2 on z/OS is to partition the history tables by date (range). When a partition has reached its required age based on audit requirements, it can then be dropped. Range partitioning is also available when using DB2 for Linux®, UNIX®, and Windows® database software. Use range partitioning to enable rapid deletion of ranges of data (called roll-out). For example, if you need to roll out data by month, range partitioning by month is a reasonable strategy.
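
The roll-out decision itself is simple date arithmetic. The following sketch assumes monthly partitions identified by (year, month) pairs and a retention period expressed in whole months; the function name and the sample values are hypothetical. It only selects the partitions that are old enough to roll out. The actual roll-out is done with the partition management facilities of your database.

```python
from datetime import date


def partitions_to_roll_out(partition_months, today, retention_months):
    """Return the (year, month) partitions that lie entirely outside
    the retention period, oldest first."""
    # Express months as a running index so that year boundaries are handled.
    cutoff = today.year * 12 + (today.month - 1) - retention_months
    return sorted(
        (year, month)
        for year, month in partition_months
        if year * 12 + (month - 1) < cutoff
    )


existing = [(2008, 1), (2008, 2), (2009, 1), (2009, 2)]
print(partitions_to_roll_out(existing, date(2009, 2, 12), retention_months=12))
```

The comparison is deliberately conservative: a partition is selected only when every day in it is older than the retention period, so the February 2008 partition in this example is kept until March 2009.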

Types of data extensions

There are two supported approaches to extending out-of-the-box tables with additional attributes. The first approach is by creating a side-table extension that hosts the extended attributes with a one-to-one relationship back to the extended table. The second approach is an in-line extension by altering the out-of-the-box table. If you have a high volume of add or update processing (also known as delta processing), consider in-line extensions. The MDM workbench component provides tooling to create the pluggable persistence routines so that only single inserts and updates are made to those tables for inserting or updating a business object. See the MDM Server Developer's Guide (see Related topics) for more information on this approach.

Using in-line extensions with pluggable persistence is a foundational performance consideration, because it has positive impacts on other activities. For example, it reduces trigger activity on the history tables, and it reduces the number of tables to join when data is queried using the pluggable SELECT statements associated with inquiry levels.

MaintainParty composite service

It is common to employ a MaintainParty composite service to synchronize data to MDM Server (also known as delta processing). There are three main steps in such a composite service:

  1. Find the party and query its details.
  2. Process business rules that compare the incoming update details to the party retrieved from the database and determine how to apply the update.
  3. Update the party (by invoking the UpdateParty service).

The following are some considerations from a performance perspective:

  • In the first step, query only the data you need. For example, if only an address change is being processed by the composite, there is usually no need to retrieve the entire party. This is where inquiry levels can be leveraged.
  • In the third step, update only the data required. If additional data is provided to UpdateParty that you know is redundant and does not require updating, it forces UpdateParty to re-discover this (by querying and processing that data).
  • In the integration layer, try to process all changes to a single party as part of a single MaintainParty service invocation. For example, if a party has a changed address and new contact method, it is more efficient to process that as a single MaintainParty service invocation than to process two separate invocations. Also, within the MaintainParty execution, it is most efficient to process both as part of a single UpdateParty call, especially if suspect duplicate processing is enabled, because then that process is executed only once in the scope of the transaction.
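
The last consideration can be handled in the integration layer by grouping pending changes by party before any service is invoked. The following is a minimal sketch under stated assumptions: the change records are plain dictionaries, and the party_id and kind field names are hypothetical. How each group is turned into a MaintainParty request depends on your composite service and is not shown.

```python
def group_changes_by_party(changes):
    """Group pending change records so that each party is processed once.

    Returns a dictionary that maps each party identifier to the list of
    its changes, preserving the order in which the changes arrived.
    """
    grouped = {}
    for change in changes:
        grouped.setdefault(change["party_id"], []).append(change)
    return grouped


pending = [
    {"party_id": "P1", "kind": "address"},
    {"party_id": "P2", "kind": "name"},
    {"party_id": "P1", "kind": "contact_method"},
]
for party_id, party_changes in group_changes_by_party(pending).items():
    # One MaintainParty invocation per party would be issued here.
    print(party_id, [c["kind"] for c in party_changes])
```

In this example, party P1 receives one invocation that carries both the address change and the new contact method, instead of two separate invocations.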

Using notifications

To facilitate integration with other applications in the surrounding enterprise, the MDM Server enables you to create notifications. While the notification system is functionally flexible, the most common form of notification ends up on the JMS topics configured during MDM Server installation. While a best practices recommendation for implementing notifications is beyond the scope of this article, it is worth noting that MDM Server implementations should plan for enough capacity to support the topics that use the JMS system queues, to avoid critical bottlenecks. Generally, consider the following when using notifications:

  • The system queue depth or the maximum size: Depending on your messaging system (such as WebSphere MQ or the WebSphere Service Integration Bus) and its configuration, a full queue could result in discarded notifications or in blocked transactions that are waiting to put a message on the queue.
  • The size of your notification payload: Depending on your messaging system configuration, large messages can be rejected outright or can cause system degradation as the queue struggles to manage thousands of larger messages. This is especially important for high-availability queues that get written to a file system or database. If the notification messages are consistently large (over 1MB in payload), review the payload and consider sending primary keys to external systems so that they query the information they need from the MDM Server directly. Very large messages can also cause high memory usage and garbage collection delays in the MDM Server JVM.
  • Concurrency in your architecture: If your notification processing is falling behind, consider tuning the WebSphere server to allow for more threads and connections to your queue system by setting a proper connection pool for the Topic Connection Factory and the associated session pool. Also consider the processing capabilities of the applications consuming the notification messages, and ensure that they do not become bottlenecks. The specifics for adjusting concurrency depend on your messaging system, so consult the appropriate product documentation.
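
A rough capacity check for queue depth compares the rate at which notifications are produced with the rate at which consumers drain them. The sketch below is simple arithmetic, not a model of any particular messaging product; the function name and the rates used are hypothetical.

```python
def seconds_until_full(max_depth, current_depth, put_rate, get_rate):
    """Estimate how long a queue can absorb a sustained backlog.

    put_rate and get_rate are messages per second. Returns None when
    the consumers keep up and the queue never fills.
    """
    net_growth = put_rate - get_rate
    if net_growth <= 0:
        return None
    return (max_depth - current_depth) / net_growth


# Assumed values: a queue limited to 5000 messages, 60 notifications
# produced per second, and 50 consumed per second.
print(seconds_until_full(5000, 0, put_rate=60, get_rate=50))
```

With these assumed rates, the queue fills in a little over eight minutes, so a sustained peak longer than that would need either a deeper queue or faster consumers.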

Using suspect duplicate processing

Suspect duplicate processing (SDP) is an essential part of any MDM Server solution. When planning for the functional concerns of this feature, give serious consideration to the performance impact of your implementation choices. The best practices for using SDP in your MDM Server solution are beyond the scope of this article. Instead, this article outlines the logical process and lists the major considerations with respect to application performance.

While there are many options and capabilities for the SDP in MDM Server, the goal is generally to identify potentially duplicate parties and either collapse them into one party or record the match and defer for later action. Once the SDP is invoked in MDM Server, it proceeds to do the following:

  • Searches for potential candidates based on available critical data
  • Retrieves the party information relevant to matching from all candidate parties
  • Matches the original party to the candidates based on the rules and criteria
  • Takes the action of recording suspects and/or collapsing the party

This process becomes slightly more complicated with data standardization, probabilistic matching, or customized search and match rules. However, in all cases, consider the following points:

  • The cost and overhead of the SDP varies greatly, depending on the specific implementation. However, it frequently takes longer than the actual add or update service that invokes it. Even though the MDM services are quite fast, it is important to understand that the SDP is a significant part of your overall workload and therefore should be given appropriate attention during data analysis, development, and performance testing phases of your MDM Server implementation.
  • The response time for SDP depends primarily on the efficiency of the candidate search and the resulting number of candidates the search returns. The candidate search is considered efficient (both from a functional and performance perspective) if the parties returned are successfully matched as relevant suspects that result in a collapse or are of relative interest in data stewardship. An inefficient search returns candidate parties that are too numerous or not similar enough to be relevant. For example, if a lot of parties contain dummy data for a critical data item such as 0000000 for a social security identifier, the resulting search might return many potential candidates that would probably not be relevant suspects. Because all of the returned candidates need to have their party information queried, many inefficient searches can significantly add overhead to your MDM Server workload. This has an impact on the overall response time and the data quality of your implementation.
  • Whenever you process a continuous load of updates, such as a delta workload, execute SDP asynchronously or in a specific batch window using the MDM Server evergreening process. This approach has many benefits for both data quality and performance. Separating the SDP work from the regular MDM add and update services means that fewer locks are held per transaction. This reduces lock contention and concurrency overhead on a busy database, which results in faster completion times for both the add/update workload and the SDP workload. In addition, your application server has shorter and more discrete units of work, which reduces CPU and JVM memory overhead.
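The effect of dummy critical data on the candidate search can be sketched in a few lines of Java. This is not the MDM Server search implementation: the `Party` class, the exact-match rule, and the list of dummy identifiers (other than the 0000000 example above) are hypothetical and only show why skipping known placeholder values keeps the candidate list small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of a candidate search that ignores dummy critical data.
final class CandidateSearchSketch {

    // Hypothetical party shape: an ID plus one critical data item.
    static final class Party {
        final String partyId;
        final String identifier;

        Party(String partyId, String identifier) {
            this.partyId = partyId;
            this.identifier = identifier;
        }
    }

    // Assumed placeholder values; a real list would come from data analysis.
    static final Set<String> DUMMY_IDENTIFIERS =
            new HashSet<>(Arrays.asList("0000000", "9999999"));

    static boolean isUsable(String identifier) {
        return identifier != null
                && !identifier.trim().isEmpty()
                && !DUMMY_IDENTIFIERS.contains(identifier.trim());
    }

    // Returns parties whose identifier matches exactly, or no candidates at all
    // when the incoming identifier is a known dummy value.
    static List<Party> findCandidates(String identifier, List<Party> parties) {
        List<Party> candidates = new ArrayList<>();
        if (!isUsable(identifier)) {
            return candidates;
        }
        for (Party p : parties) {
            if (identifier.equals(p.identifier)) {
                candidates.add(p);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<Party> parties = Arrays.asList(
                new Party("P1", "0000000"),
                new Party("P2", "0000000"),
                new Party("P3", "1234567"));
        System.out.println(findCandidates("0000000", parties).size());
        System.out.println(findCandidates("1234567", parties).size());
    }
}
```

Every candidate that is filtered out here is one fewer party whose information has to be retrieved and matched.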

Tuning and monitoring WebSphere for MDM Server

Optimal WebSphere Application Server software settings vary depending on your implementation and your workload. These settings come from proactive monitoring and continuous tunings as applications and user behaviors change. Consider the following fundamental tuning parameters for each MDM WebSphere Application Server instance:

  • Start with a minimum heap size of 512 MB and a maximum heap size of 1024 MB for an MDM application server instance. If you are using a 64-bit WebSphere Application Server instance, you can increase the heap to a much larger size. Normally, the heap size does not need to go beyond 2 GB for most MDM implementations, so heap size alone should not be the reason for using 64-bit. Also, be aware that a 64-bit WebSphere Application Server instance has an overhead of 5% to 15% in terms of transaction throughput compared with 32-bit, depending on your workload. The reported overhead was measured by running a representative MDM Server workload with WebSphere Versions 7.0 and 6.1. The overhead is lower in WebSphere Version 7 than in WebSphere Version 6.
  • Ensure that the Object Request Broker (ORB) thread pool is equal to the maximum number of concurrent direct calls to the MDM ServiceController stateless session bean (RMI transactions) expected for each MDM Server instance. The maximum thread pool limit is your theoretical maximum number of concurrent users.
  • Ensure that the Web container thread pool is tuned to a high enough level to accommodate Web service requests, if applicable for your application. In most cases, this thread pool does not need to be adjusted because Web container threads are non-blocking and can handle multiple MDM service requests concurrently.
  • Start the Enterprise JavaBeans (EJB) cache size at 3500. Adjust as needed to reach the optimal setting for MDM Server.
  • On the Java Database Connectivity (JDBC) data source used for MDM Server, start the Prepared Statement cache size at 300, which can improve transaction response time by 5% to 15%.
  • Reduce input/output activities by using the default logging level (or lower) that WebSphere sets.

See Related topics for reference information.
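After changing heap settings, it can help to confirm what the running JVM actually picked up. The following is a minimal sketch that uses only standard `java.lang` and `java.lang.management` calls; the class and method names are illustrative, and running it inside an application server (for example, from a diagnostic servlet) is an assumption rather than a documented MDM Server procedure.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative check of the heap settings seen by the running JVM.
final class HeapSettingsCheck {

    // Maximum heap the JVM will attempt to use, in MB.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // True when both an initial (-Xms) and a maximum (-Xmx) heap size
    // were passed explicitly on the JVM command line.
    static boolean hasExplicitHeapFlags(List<String> jvmArgs) {
        boolean hasXms = false;
        boolean hasXmx = false;
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xms")) {
                hasXms = true;
            }
            if (arg.startsWith("-Xmx")) {
                hasXmx = true;
            }
        }
        return hasXms && hasXmx;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Max heap (MB): " + maxHeapMb());
        System.out.println("Explicit -Xms/-Xmx: " + hasExplicitHeapFlags(jvmArgs));
    }
}
```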

Generating JVM heap dumps

Generate JVM heap dumps manually using one of the following methods:

IBM_HEAPDUMP=true
Generates heap dump in .phd format used for newer memory tools
IBM_JAVA_HEAPDUMP_TEXT=true
Generates classic heap dump format for older tools
SIGQUIT (kill -3 on UNIX; Ctrl+Break on Windows)

You can also set up WebSphere Version 6 to automate heap dump generation. Because manually generating heap dumps at appropriate times can be difficult, this is often the most practical way to collect the data needed to analyze memory leak problems. When memory leak detection occurs, the automated support generates the heap dumps for you. This functionality is available only for the IBM Software Development Kit on AIX, Linux, and Windows operating systems.

Most memory leak analysis tools perform some form of difference-evaluation on two heap dumps. Upon detection of a suspicious memory situation, two heap dumps are automatically generated at appropriate times. The general idea is to take an initial heap dump as soon as problem detection occurs. Monitor the memory usage and take another heap dump when you determine that enough memory is leaked so that you can compare the heap dumps to find the source of the leak. Complete the following steps for WebSphere Version 6:

  1. Click Servers > Application servers in the administrative console navigation tree.
  2. Click server_name > Performance and Diagnostic Advisor Configuration.
  3. Click the Runtime tab.
  4. Select the Enable automatic heap dump collection check box.
  5. Click OK.

You can then use HeapRoots, HeapAnalyzer, or IBM Rational Application Developer to analyze the generated heap dumps (see Related topics for more details on each of these tools).
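The difference-evaluation idea behind these tools can be sketched as a comparison of per-class object counts taken from two dumps. The sketch below assumes that the counts have already been extracted into maps; it does not parse .phd files, and the class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative difference-evaluation between two per-class object counts.
final class HeapHistogramDiff {

    // Returns, for each class, how many more instances exist in the later dump.
    // Classes that shrank or stayed the same are left out.
    static Map<String, Long> growth(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> delta = new HashMap<>();
        for (Map.Entry<String, Long> entry : after.entrySet()) {
            Long previous = before.get(entry.getKey());
            long diff = entry.getValue() - (previous == null ? 0L : previous);
            if (diff > 0) {
                delta.put(entry.getKey(), diff);
            }
        }
        return delta;
    }

    // Returns the class with the largest growth, or null if nothing grew.
    static String topGrower(Map<String, Long> before, Map<String, Long> after) {
        String top = null;
        long max = 0;
        for (Map.Entry<String, Long> entry : growth(before, after).entrySet()) {
            if (entry.getValue() > max) {
                max = entry.getValue();
                top = entry.getKey();
            }
        }
        return top;
    }

    public static void main(String[] args) {
        // Hypothetical counts, for illustration only.
        Map<String, Long> before = new HashMap<>();
        before.put("TCRMContractBObj", 120L);
        before.put("java.lang.String", 50000L);
        Map<String, Long> after = new HashMap<>();
        after.put("TCRMContractBObj", 40252L);
        after.put("java.lang.String", 52000L);
        System.out.println("Largest growth: " + topGrower(before, after));
    }
}
```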

Generating verbose GC

Verbose GC output helps tune and debug many performance issues, including the following:

  • Finding optimal heap size
  • Finding optimal GC policy
  • Determining potential memory exhaustion and leak causes

You can use the IBM Pattern Modeling and Analysis Tool (PMAT) for Java Garbage Collector (see Related topics) to view the verbosegc output. To enable verbosegc on WebSphere Version 6, complete the following steps:

  1. In the Administrative Console, expand Servers and then click on Application Servers.
  2. Click on the server that is encountering the OutOfMemory condition.
  3. On the Configuration tab, under Server Infrastructure, expand Java and Process Management, and click Process Definition.
  4. Under the Additional Properties section, click Java Virtual Machine.
  5. Click the Verbose garbage collection check box.
  6. Click Apply.
  7. At the top of the Administrative Console, click Save to apply changes to the master configuration.
  8. Stop and restart the Application Server.
  9. The verbose garbage collection output is written to either native_stderr.log or native_stdout.log for the Application Server, depending on the SDK operating system. For AIX, Microsoft Windows, or Linux, the output is in native_stderr.log. For Solaris® or HP-UX®, the output is in native_stdout.log.
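As a lightweight complement to verbose GC logs (not a replacement for them), cumulative GC time can also be read from inside the JVM through the standard `java.lang.management` API. The sketch below computes the share of uptime spent in garbage collection; the class and method names are illustrative, and no threshold for an acceptable overhead is implied.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative calculation of the share of JVM uptime spent in garbage collection.
final class GcOverheadSketch {

    // Sums the accumulated collection time reported by all collectors.
    // A collector may report -1 when the value is undefined; those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Percentage of the given uptime that was spent in GC.
    static double overheadPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcTime = totalGcMillis();
        System.out.println("GC overhead (%): " + overheadPercent(gcTime, uptime));
    }
}
```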

Using PMI metrics

The following information describes the metrics for the WebSphere Performance Monitoring Infrastructure (PMI) data type and for the WebSphere Version 6 Application Servers it supports. The tables provide information for the following modules:

  • Database connection pools
  • Enterprise Java beans
  • JCA connection pools
  • JTA transactions
  • JVM/systems
  • ORB detail/interceptor
  • Web applications
  • Session manager
  • Thread pools

Not all modules need to be monitored, nor are they all implemented in MDM Server. Tables 1-5 show the key PMI metrics that you should monitor for tuning and how they can be used for troubleshooting your MDM Server performance problems.

Table 1. Database connection pool metrics
Property | Description
Avg. waiting threads | Threads waiting for a connection to the database. This number should be near 0 for optimal performance.
Avg. wait time | Related to the number above. If no threads are waiting, this number should be 0.
Avg. pool size | Connection pool size actually used if overflow is enabled. This number closely matches the realistic number of connections at a minimum.

Table 2. JVM metrics
Property | Description
Free memory (JVM) | Free heap. If this number is near 0, consider increasing the heap size.
Total memory | Maximum heap size.
Free memory (system) | What is available from the operating system point of view. This is an important indicator of whether the operating system requires more RAM.
Avg. CPU usage | Overall CPU usage in the system. Ideally, CPU should be the first bottleneck. If you cannot increase CPU usage, there might be other bottlenecks, such as disk input/output, network input/output, or memory.

Table 3. Web application metrics
Property | Description
Response time | Time in milliseconds taken for a transaction to complete. Depending on the transaction, it may or may not be acceptable.
# of errors | Failed transactions can suggest connection timeouts, application errors/exceptions, or other environment failures.
Total requests | Requests received by WebSphere, which are a good indication of whether traffic is coming in as expected.
Concurrent requests | Parallel requests at any given time, which is an indication of the number of online users at a time.

Table 4. Session manager metrics
Property | Description
Created sessions | Historic count of sessions in a live JVM.
Invalidated sessions | Historic count of sessions that have expired or timed out.
Live sessions | Current sessions that have not expired.
Session lifetime | Average time a session lived. This is a good candidate for the session time-out value.
Active sessions | Sessions that carry transactions.

Table 5. Thread pool metrics
Property | Description
Thread creates | Number of new threads created in various thread pools: web container, default, and ORB.
Thread destroys | Number of threads destroyed.
Active threads | Current active threads that perform tasks.
Pool size | Thread pool size that allows reuse of threads in various WebSphere thread pools.
% time max in use | A low percentage means threads finish tasks quickly but have to wait out the inactivity period before they are reclaimed. In that case, shorten the inactivity time-out value to put the threads back in the pool sooner.
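Two of the rules of thumb from the tables (waiting threads near 0, free heap not near 0) can be turned into a simple check over sampled values. The sketch below assumes the metric values have already been collected by whatever monitoring tool you use; it does not call any PMI API, and both thresholds are hypothetical starting points rather than recommended values.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative rule check over already-collected metric samples.
final class PmiSnapshotCheck {

    // Hypothetical thresholds; tune them for your own environment.
    static final double MAX_AVG_WAITING_THREADS = 1.0;
    static final double LOW_FREE_HEAP_FRACTION = 0.05;

    static List<String> findings(double avgWaitingThreads, long freeHeapMb, long totalHeapMb) {
        List<String> findings = new ArrayList<>();
        if (avgWaitingThreads >= MAX_AVG_WAITING_THREADS) {
            findings.add("Threads are waiting for database connections; review the connection pool size.");
        }
        if (totalHeapMb > 0 && (double) freeHeapMb / totalHeapMb < LOW_FREE_HEAP_FRACTION) {
            findings.add("Free heap is near 0; consider increasing the heap size.");
        }
        return findings;
    }

    public static void main(String[] args) {
        for (String finding : findings(3.5, 10, 1024)) {
            System.out.println(finding);
        }
    }
}
```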

Troubleshooting

This section presents an actual case study that shows how the WebSphere monitoring tools in this article can be used to troubleshoot a heap exhaustion problem.

The original complaint was: On this production system, there are object requests for about 170 to 180 MB, which cause the heap size to increase rapidly and cause the system to fail. Once the problem was identified as heap exhaustion, the following steps led to the identification of the cause.

  1. Validate from the WebSphere application logs that a heap dump has occurred. Listing 6 shows an example of the native.log file.
Listing 6. Sample native.log entry showing a heap dump occurs
[Thu Jul 23 04:08:47 2009] JVMDG217: Dump Handler is Processing Signal 11
- Please Wait. Explanation: A signal has been raised and is being processed 
by the dump handler. System action: At this point, depending upon the 
options that have been set, Javadump, core dump, and CEEDUMP (z/OS only) 
can be taken. This message is for information only and does not indicate 
a further failure. User response: None.

[Thu Jul 23 04:08:47 2009] JVMDBG001: malloc failed to allocate 2621440 
bytes, time: Thu Jul 23 04:08:48 2009 Explanation: The system malloc 
function has returned NULL System action: None for this message. The 
JVM might issue a further message. User response: Increase the available 
native storage.

[Thu Jul 23 04:08:47 2009] JVMDBG001: malloc failed to allocate 2097152 
bytes, time: Thu Jul 23 04:08:48 2009 Explanation: The system malloc 
function has returned NULL System action: None for this message. The 
JVM might issue a further message. User response: Increase the available 
native storage.
  2. Gather verbosegc and heap dumps at a regular interval. To identify heap growth, take several heap dumps, spanning from the beginning of the test (when no heap overflow is occurring) to the point when heap overflow occurs. The verbosegc output can be viewed with the PMAT tool, which shows heap usage over time. Figure 2 shows that for the case study, heap usage spikes occurred during the test.
Figure 2. Sample verbosegc analysis output
Screen cap: verbose GC analyzer graph showing GC activities with about six usage spikes

Similar analyses of heap dumps taken at non-peak times show no spikes in the JVM heap usage, which remained at a maximum of 400 MB. But when the heap usage was very high, HeapAnalyzer showed that TCRMContractBObj used close to 1 GB of heap, as shown in Figure 3.

Figure 3. Sample heap dump analysis
Screen cap: Heap dump showing over 40,000 arrays for TCRMContractBObj

In the tree view of the heap in Figure 3, there are 40,252 objects named TCRMContractBObj, each taking up close to 30 KB of memory in the heap. The cumulative effect is 909 MB of heap usage. You can surmise that this is the root cause of the heap overflow.

  3. Review the transaction to see that a single getFSParty call inside a business proxy actually queries a particular party with well over 40,000 contract objects. The recommendation is to implement more sophisticated retrieval logic for elite clients with a large number of contracts. This change resolved the issue permanently.
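The article does not describe the retrieval logic that was implemented. One common way to avoid holding tens of thousands of objects in the heap at once is to fetch and process them in pages, as in the following sketch. The `ContractSource` interface, the page size, and the in-memory source are hypothetical and are not MDM Server APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative paged retrieval: only one page of contracts is held at a time.
final class PagedContractRetrieval {

    // Hypothetical data access interface; not an MDM Server API.
    interface ContractSource {
        List<String> fetchPage(int offset, int limit);
    }

    // Walks through all contracts one page at a time and returns how many were processed.
    static int processInPages(ContractSource source, int pageSize) {
        int processed = 0;
        int offset = 0;
        while (true) {
            List<String> page = source.fetchPage(offset, pageSize);
            if (page.isEmpty()) {
                break;
            }
            processed += page.size();   // real code would handle each contract here
            offset += page.size();
            if (page.size() < pageSize) {
                break;                  // a short page means there is nothing left
            }
        }
        return processed;
    }

    // In-memory stand-in for a party with a given number of contracts.
    static ContractSource inMemorySource(final int total) {
        return new ContractSource() {
            @Override
            public List<String> fetchPage(int offset, int limit) {
                List<String> page = new ArrayList<>();
                int end = Math.min(offset + limit, total);
                for (int i = offset; i < end; i++) {
                    page.add("contract-" + i);
                }
                return page;
            }
        };
    }

    public static void main(String[] args) {
        ContractSource source = inMemorySource(40252);
        System.out.println("Processed: " + processInPages(source, 500));
    }
}
```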

Tuning the JVM

There is no such thing as a single best setting for MDM Server. For WebSphere Version 6.1 (used for MDM 8.0 and later), Table 6 gives the recommended start values for AIX. Adjust them to suit your environment and workload.

Table 6. Start values for tuning the JVM
Tuning parameter | Value recommended for AIX
JVM initial heap size (MB) | 512
JVM maximum heap size (MB) | 1024
GC policy recommendations | -Xgcpolicy:gencon if SMP systems are used (as in most cases)
Minimum active threads (in pool) | 70 threads
Maximum active threads (in pool) | 70 threads
Allow threads allocated beyond maximum | No
Thread inactivity timeout | 100 seconds
Maximum in-memory session count | 1000
Session timeout | 10 minutes
JDBC connection pool connection timeout | 10000 seconds
JDBC connection pool min connections | 30 (adjust database server settings accordingly)
JDBC connection pool max connections | 100 (adjust database server settings accordingly)
RC4 and MD5 encryption | Disable if WebSphere Application Servers are deployed in a secure environment

Following are some notes about some of the start values:

Setting the JVM heap size larger than 512MB
For the best and most consistent throughput, set the (-Xms) starting minimum and the (-Xmx) maximum to be the same size. Also, remember that the value for the JVM heap size is directly related to the amount of physical memory for the system. Never set the JVM heap size larger than the physical memory on the system to avoid the disk input/output caused by swapping.
Session timeout 10 minutes
The default value of session timeout is 30 minutes. Reducing this value to a lower number can help reduce memory consumption requirements, allowing a higher user load to be sustained for longer periods of time. Reducing the value too much can interfere with the user experience. You need to determine the right level based on your end-user requirements: if users mostly have quick tasks to finish, set this value low; if users have long tasks to finish, set this value higher.
Class garbage collection
The -Xgcpolicy:gencon policy handles short-lived objects differently than long-lived objects. Many MDM workloads generate large numbers of short-lived objects, and collecting them under this policy results in shorter pause times. This parameter could allow approximately a 10% increase in throughput for many MDM workloads.
Servlet engine thread pool size 70
For the use case, 70 was used for both the minimum and maximum settings. Ideally, set this value and monitor the results using PMI. Increase this value if all the servlet threads are busy most of the time.
JDBC connection pool max
Depending on how many concurrent users are expected and on how many JVMs are used as MDM application servers, a data source connection pool is used to buffer the user load and provide efficiency in establishing database connections for queries. The sum of the max connections settings across all JVMs should not exceed the maximum number of connections that the database engine allows, which the MAXAPPLS parameter specifies in DB2 UDB.
Disable encryption
Communications between EJBs can be configured to use SSL, but SSL is not required if the MDM application server is already in a secure environment (as most are). Avoiding encryption and decryption can save about 15% of CPU processing power.
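The connection pool guideline above is simple arithmetic: the number of JVMs multiplied by the max connections per pool must stay within the database limit. The sketch below assumes that every JVM uses the same max connections setting; the class name, the JVM counts, and the limit of 500 are illustrative values, not recommendations.

```java
// Illustrative budget check for JDBC connection pools across several JVMs.
final class ConnectionPoolBudget {

    // Total connections the application tier could open if every pool is full.
    static int totalMaxConnections(int jvmCount, int maxConnectionsPerPool) {
        return jvmCount * maxConnectionsPerPool;
    }

    // True when the combined pools fit within the database connection limit
    // (for example, the value of MAXAPPLS in DB2 UDB).
    static boolean fitsDatabaseLimit(int jvmCount, int maxConnectionsPerPool, int databaseLimit) {
        return totalMaxConnections(jvmCount, maxConnectionsPerPool) <= databaseLimit;
    }

    public static void main(String[] args) {
        // 100 is the start value from Table 6; the JVM counts and the limit are made up.
        System.out.println("4 JVMs fit: " + fitsDatabaseLimit(4, 100, 500));
        System.out.println("6 JVMs fit: " + fitsDatabaseLimit(6, 100, 500));
    }
}
```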

Conclusion

This article highlighted the importance of the following:

  • Understanding the big picture
  • Setting up clear performance goals
  • Knowing your environment
  • Establishing a repeatable and consistent performance testing process

Important guidelines describe how to effectively monitor, tune, and optimize MDM Server and the WebSphere Application layer for optimal performance, with brief discussions around best practices for a list of key areas and features that affect performance in these two layers. Part 2 of the series describes the lower parts of the stack, covering guidelines and recommendations on how to effectively monitor and tune the DB2 layer. Part 2 also presents tools commonly used to monitor operating system-level resources.

Acknowledgments

We would like to thank David Borean, Lena Woolf, Bill Xu, Steve Reese, and Berni Schiefer for their input and suggestions for this article series.



Related topics


ArticleTitle=Monitoring and tuning InfoSphere Master Data Management Server, Part 1: Set goals and tune each layer of the infrastructure
publish-date=10222010