- Card settings performance tips
- Data size tips
- Data frequency of change tips
- Data sequence tips
- Data independency tips
- Data structure tips
- Type tree tips
- Mapping tips
- Functional map tips
- Lookup tips
- External database functions tips
- Tips for replacing adapters with functions
- Transaction management tips
- Tips to identify performance bottlenecks
- Map settings performance tips
- Downloadable resources
- Related topics
Performance tuning tips for WebSphere Transformation Extender
This article provides performance tips and recommendations for IBM® WebSphere® Transformation Extender (hereafter called WebSphere TX), regardless of the platform on which it is running. The article will help you take advantage of the flexibility and power of WebSphere TX for data transformation, while still meeting performance objectives.
Performance here means the amount of work processed per unit of time, while also accounting for usage of memory, disk space, and other resources. This article focuses on WebSphere TX maps as the primary area for performance tuning, and provides a variety of map-related performance tips.
Card settings performance tips
Card setting tuning parameters can be used to control card processing behavior.
Reuse WorkArea card settings
Work area files contain metadata of various input data type objects. A work area can be reused in subsequent runs for any input card by setting WorkArea to Reuse in the card settings. Whenever possible, reuse work areas, especially with large data volumes or with data requiring complex validation, because performance improvements can be significant.
Whenever possible, disable writing to backup files
Use Burst for FetchAs input card setting
Burst mode means that the map retrieves input data in chunks, sized by the FetchUnit parameter, which defines the data units. These units correspond to the highest-level repeating object in the data stream. Burst mode minimizes the size of the workspace, which improves performance.
Data size tips
The most important tip is to know your data, which will help you determine the best way to set tuning parameters, and create type trees and maps.
Use Burst mode for large data volumes
For more information, see Card settings performance tips.
Use routing for large data volumes
Routing is the splitting of output into separate files based on data values. Functions available for data routing include RUN(), which executes a secondary map, and PUT(), which writes output directly through an adapter, although PUT() may not be feasible in every routing situation:
- Avoid using additional maps to route data. Instead, use PUT() with an appropriate adapter, because it yields better performance than RUN().
- If additional processing with a secondary map is required to provide flexibility in output generation, use RUN().
- Minimize the number of routings by applying good map and type tree design, which will improve execution time.
- In router maps, use HANDLEIN() instead of ECHOIN(). HANDLEIN() provides the same functionality as ECHOIN(), but passes pointers to the data instead of echoing it, which reduces validation time.
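The routing idea (splitting records into separate outputs based on a data value) can be sketched in plain Python; the record layout and field names here are invented for illustration:

```python
from collections import defaultdict

def route(records, key_field):
    """Group records into per-destination buckets based on a key value,
    analogous to routing output to separate files by data content."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec[key_field]].append(rec)
    return buckets

records = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 20},
    {"region": "EU", "amount": 30},
]
routed = route(records, "region")  # one bucket per distinct region value
```

In a real map, each bucket would become a separate output file; minimizing the number of distinct routing targets keeps this fan-out small.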
Data frequency of change tips
For static data, reuse WorkArea whenever possible
A classic example of a map that uses static data is a lookup table. Validation steps for these sources can be bypassed, reducing overall execution times. For more information, see Card settings performance tips.
Data sequence tips
For unsorted and ungrouped data, use Identifier to facilitate data validation
Data sequence can also determine the best type and map design methodology. You need to determine whether data is sorted and grouped or not. If data is sorted and grouped, then map functions can select specific groups of data. If it is not sorted and grouped, there may be identifiers in the data on which to sort it during validation. For more information, see Type tree tips.
Data independency tips
If there is no relationship between input data records -- in other words, if each record is independent -- then here are two tips:
Use Burst mode with independent data to process data in smaller chunks
Use the RESTART attribute and REJECT function to capture invalid data
Data structure tips
If you can define the input data structure, fixed data is better than delimited data from a performance perspective. Fixed group objects require a work area of about the same size as the data itself, so they need less disk space and memory and process faster. Validating delimited group objects, by contrast, requires more checks, takes longer, and can expand the work area to four or five times the size of the actual data.
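The performance difference comes down to how fields are located: fixed-width parsing slices at known offsets, while delimited parsing must scan every byte for separators. A minimal, language-neutral sketch in Python (the record layout is invented):

```python
def parse_fixed(record, widths):
    """Fixed-width parsing: each field sits at a known offset, so
    extraction is simple slicing with no scanning."""
    fields, pos = [], 0
    for w in widths:
        fields.append(record[pos:pos + w])
        pos += w
    return fields

def parse_delimited(record, sep=","):
    """Delimited parsing: every character must be scanned to find
    separators, and offsets are only known after the scan."""
    return record.split(sep)

fixed = parse_fixed("JOHN  SMITH 0042", [6, 6, 4])
delim = parse_delimited("JOHN,SMITH,0042")
```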
Type tree tips
A good type tree is a common factor in every good map. Here are some recommendations to ensure that your type trees define the data as required, while also minimizing validation time for faster performance:
Use the Identifier attribute in type tree design
The Identifier attribute marks the component that specifies the type of a data object. Identifiers help the engine determine whether the data is valid without having to parse all of it. For example, if an Identifier is the second field in a record and an error occurs, the engine stops processing the record as soon as it determines that the data does not match the type properties, restrictions, or component rules. This prevents the processing of invalid data and reduces validation time.
Use the RESTART attribute in type tree design
The RESTART attribute is used as a recovery point during validation. A restart goes back to the last restart point and ignores invalid input objects while building the output with valid data. If RESTART is not used, the map will fail once it encounters invalid data, even if the invalid data is in the last record of a large file, which increases processing time because of the reprocessing and revalidation. Tips for using RESTART:
- Assign RESTART to a component with a series range of either (1:x) or (s). Otherwise, it has no effect on data validation.
- Put RESTART at a logical restart point in the data -- in other words, in independent data, such as in a record component. It is recommended in this case to use Burst mode.
- Use the REJECT function in conjunction with the RESTART attribute to map invalid records contained in a rejected work area.
Use Empty property in type tree design
The Empty property provides an alternative type syntax object for groups or items when they have no data content. This technique reduces the type tree processing time during validation.
Use Partition property in type tree design
Partitioning defines unordered data of different types appearing in the same place in a data stream by subdividing its objects into mutually exclusive subtypes. Partitioning facilitates data differentiation and helps in sorting data during validation. It also simplifies map rules, helps ensure that only the required data is passed to the output, speeds processing and validation, and generally improves performance. Partitioned data is validated against the list of partitions as they are sequenced in the type tree. By default, they are sorted in ascending alphabetical order, though the order can be descending or any sequence that facilitates validation. The engine tests against each partition in the order they appear in the list until it finds a match.
- For best performance, always put the most frequent partition at the top of the list, in order to eliminate unnecessary validation tests against other partitions:
Figure 1. Sorting partitions
- Partitioning could be on item by initiators, restrictions, or format, or on group by initiators, identifiers, or component rules. Using partitioning with initiators is the most efficient.
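The effect of partition ordering can be illustrated with a generic first-match dispatch in Python (the partition names and predicates are invented). Putting the most frequent type first minimizes the number of tests per record:

```python
def classify(record, partitions):
    """Test a record against partitions in list order; first match wins.
    Returns the matched partition name and the number of tests made."""
    for tests, (name, predicate) in enumerate(partitions, start=1):
        if predicate(record):
            return name, tests
    return None, len(partitions)

partitions = [
    ("invoice", lambda r: r.startswith("INV")),  # most frequent type first
    ("order",   lambda r: r.startswith("ORD")),
    ("credit",  lambda r: r.startswith("CRD")),
]
```

If most records are invoices, the common case costs one test; with the list reversed, every invoice would cost three.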
Use the Track property in type tree design
The Track property indicates whether only components with content should be tracked, or whether all components including empty ones should be tracked. You can specify either Track Content or Track Places. Use Track Content whenever available in order to reduce processing time.
Validate data content
Validating content by applying business rules prevents the processing of invalid data. Type trees provide two recommended options for validating content:
- Use a restriction list to limit an item to a particular value or set of values.
- Use a component rule to specify a condition to validate a particular component.
Choose a restriction list, component rule, or mapping rule depending on your business needs:
- Use a restriction list or component rule to specify logic that affects validation.
- Use a restriction list if the value of an item is restricted to a specific set of values.
- Use a mapping rule if the logic is related to an output data object and the way that it is determined or calculated.
Define data type appropriately
- Avoid data conversion functions by defining source data to match target data. For example, avoid NUMBERTOTEXT() by defining a source number field as Number rather than Text.
- Avoid using TRIMLEFT() and TRIMRIGHT() by using padding.
Ignore insignificant data
- Do not parse a field or record that will not be used in mapping. For example, when large segments of data can be ignored, create item objects that define blobs of data to reduce validation time.
Put mapping logic into type trees instead of maps whenever possible
Mapping tips
This section provides tips to improve maps, so that they will run as efficiently as possible and help optimize performance.
Avoid repeating logic more than once in the same map
For example, do a specific field lookup only once in each map. Use the output card from the first lookup to eliminate subsequent lookups, which will significantly reduce processing time.
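The pattern of doing a lookup once and reusing the result can be sketched with memoization in Python; the lookup table and country codes are purely illustrative:

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the expensive lookup actually runs

@lru_cache(maxsize=None)
def currency_for(country):
    """Stand-in for an expensive lookup; the cache plays the role of the
    output card that holds the first lookup's result for reuse."""
    calls["count"] += 1
    table = {"DE": "EUR", "US": "USD", "JP": "JPY"}
    return table[country]

for country in ["DE", "US", "DE", "DE", "US"]:
    currency_for(country)
# five references to the lookup, but only two actual executions
```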
Consolidate data and card objects
For example, several input cards containing cross-reference information can be consolidated in one card. Similarly, you can often consolidate several output cards used to perform transformation preparations into single card to reduce unnecessary processing and improve performance.
Use scratch files and Sink versus File adapters
Scratch files are temporary working files used to break complex logic into simple, manageable steps. Output cards used for scratch files typically use the Sink adapter, which serves as a temporary data destination and then discards the mapped data. This technique speeds up maps through reduced file I/O, avoids file resource conflicts when running multiple instances of the map, and reduces disk space usage.
- Use the File adapter during development to view intermediate output, then change to the Sink adapter.
- For processing very large files, use the File adapter with the Transaction OnSuccess option set to Create, so that a file is created only if the data is invalid. This can be more efficient than the Sink adapter, which has no Transaction settings to prevent file creation on success.
Functional map tips
Eliminate unnecessary functional maps for segments occurring only once
Pass only needed arguments in functional map calls
Passing unneeded data to a functional map is an unnecessary use of system resources. The functional map will run once for every combination of its arguments.
Restrict data passed to functional maps
When only a subset of the data needs to be processed, use filtering functions such as EXTRACT() and IF() with the functional map call to pass only relevant data. This technique eliminates time and resources spent processing irrelevant data, simplifies mapping, and improves performance.
Use functional maps rather than run maps
Use functional maps whenever possible, since run maps are much slower.
Lookup tips
Input and output format requirements can help you decide which search functions to use for optimum performance. Apart from external functions, lookup functions are among the most time-consuming functions.
Filter data before search rather than after search
Speed up searches by filtering lookup data using the WHERE clause in SQL statements, or by using functional maps for repeatedly searched data.
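To illustrate filtering before the search, here is a generic Python/SQLite sketch (the table and column names are invented) in which the WHERE clause lets the database return only the needed row instead of the whole table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rates (country TEXT, rate REAL)")
conn.executemany("INSERT INTO rates VALUES (?, ?)",
                 [("DE", 0.9), ("US", 1.0), ("JP", 150.0)])

# Filter in the database with WHERE, so only the needed row crosses
# the interface, instead of fetching everything and filtering in the map.
row = conn.execute(
    "SELECT rate FROM rates WHERE country = ?", ("DE",)).fetchone()
```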
Use the UNIQUE() function with redundant data during lookup
Use the UNIQUE() function to extract unique occurrences and prevent the processing of redundant data. It returns all unique members of a series.
Replace the LOOKUP() function with any other search function whenever possible
The search algorithm used by the LOOKUP() function is slower than other search functions because it scans the data file object-by-object, looking for the first object that matches the specified criteria.
Use SEARCHUP() or SEARCHDOWN() with search data that is sorted or can be sorted
The search algorithm used by both functions traverses data as a binary tree, taking advantage of the sorted data.
Use the SORTUP(), SORTDOWN(), or ORDER BY clause in SQL queries to ensure the sorting of lookup data
SEARCHUP() and SEARCHDOWN() are significantly faster than LOOKUP().
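The difference between the two algorithms can be sketched in Python: a binary search over sorted keys needs O(log n) probes, versus the O(n) object-by-object scan that LOOKUP() performs. The key/value data here is illustrative:

```python
from bisect import bisect_left

def binary_lookup(sorted_keys, values, key):
    """Binary search over sorted keys (the idea behind SEARCHUP and
    SEARCHDOWN): O(log n) probes instead of an O(n) linear scan."""
    i = bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return values[i]
    return None  # no match

keys = ["ANT", "BEE", "CAT", "DOG"]  # must already be sorted
vals = [1, 2, 3, 4]
```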
Use EXTRACT() versus LOOKUP()
If one object is enough, use LOOKUP(); otherwise use EXTRACT().
Use CHOOSE() instead of SEARCHUP() or SEARCHDOWN()
In cases where CHOOSE() consistently references the same type, it is faster than SEARCHUP() or SEARCHDOWN() because CHOOSE() goes directly to the specified instance in a series of objects.
Try different lookup functions
Try different lookup functions, run the Map Profiler, and then examine the runtime results to determine the optimal lookup function.
External database functions tips
A common performance pitfall is excessive use of cross-reference functions that require external lookups such as DBLOOKUP() and DBQUERY(). While all external functions add significant time to map processing, database functions are the most commonly used.
- Avoid or reduce external database queries when a functional map is invoked.
- Avoid frequent execution of the same query.
- Cache lookup data using a single query.
- Batch inserts, updates, and deletes into a single statement.
- Choose optimal cross-reference functions.
- Limit use of functions that require more processing time.
- Use temporary files to simplify later processing.
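The batching tip can be illustrated generically with Python and SQLite: sending many rows through one batched statement avoids a round trip per row. The table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE out (id INTEGER, payload TEXT)")
rows = [(i, f"rec-{i}") for i in range(1000)]

# One batched statement instead of 1000 individual round trips.
conn.executemany("INSERT INTO out VALUES (?, ?)", rows)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM out").fetchone()[0]
```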
Tips for replacing adapters with functions
For example, for improved performance, instead of invoking a Base64 adapter from the command line of a map, use the BASE64TOTEXT() or TEXTTOBASE64() conversion functions in the map rules.
Transaction management tips
During map execution, transaction settings include the Source or Target Adapter, the Scope settings, and the Transaction Commitment. The Scope setting can be either commit by Map (the default setting), Card, or Burst. For Burst, the transaction is committed or rolled back after the execution of every statement, or optionally, after a specified number of statements (such as INSERT, UPDATE, DBLOOKUP, and DBQUERY statements, stored procedures, or GET functions). This technique requires more database calls per row, which slows performance, but the slowdown can be offset by specifying a large number of statements.
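The commit-interval trade-off can be sketched generically in Python with SQLite (this is not the WTX API; the interval, table, and row counts are illustrative). Committing once per batch of statements, rather than once per statement, reduces the per-row overhead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (n INTEGER)")

COMMIT_EVERY = 100  # commit once per batch rather than once per statement
for n in range(250):
    conn.execute("INSERT INTO log VALUES (?)", (n,))
    if (n + 1) % COMMIT_EVERY == 0:
        conn.commit()
conn.commit()  # flush the final partial batch
total = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
```

A larger interval means fewer commits and better throughput, at the cost of more work to roll back if a failure occurs mid-batch.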
Tips to identify performance bottlenecks
Rather than blindly refactoring a map to improve performance, first detect the bottlenecks where processing slows the most or resources are over-consumed.
Use Page settings and the dtxpage command
Page settings define how the workspace is sliced into segments in which only a subset of the work and data files are kept in memory at one time.
- PageSize defines the size of the memory segment used to store the data and work files.
- PageCount specifies the number of memory pages to use.
- Increase the value of PageSize for maps with many type references.
- Increase the value of PageCount when dealing with large amounts of data.
Use the dtxpage command to optimize the value of PageSize and PageCount to improve performance. To run the dtxpage command:
- Compile the map to generate the .mmc file.
- Install the Command Server if not already installed.
- Run the Command Server in the installation directory.
- Invoke the command using the compiled map file as a parameter:
The command will run the map with the provided input for multiple iterations, using different page settings for each one to detect the optimal settings. The output will list the results of all iterations followed by the suggested optimum values for the PageSize and PageCount. Here is an example of running dtxpage:
Figure 2. Invoking dtxpage command
Use the Map Profiler and the dtxprof command
Map Profiler is an easy-to-use configurable tool for capturing and reporting map statistics. It can provide a precise measurement of performance for every operation, and reveal needed improvements. Map Profiler is available in the Design Studio and can also be run from the Command Server. Map Profiler can capture the following statistics:
- Processing time for components, mapping rules, and functions
- Rules and functions execution count
- Type object access count
- Object nesting level
To run Map Profiler from the Design Studio:
- Select the required Profiling Mode and Setting from Window => Preferences => Transformation Extender => Map => Profiler.
- Select the map and click Enable Profiler.
- Rebuild the map and run it.
- A report is created and placed in the map folder. You can also view the report in the Profiler window.
To run Map Profiler from the Command Server:
- Compile the map to generate the .mmc file.
- Install the Command Server if not already installed.
- Run the Command Server in the installation directory.
- Invoke the command using the compiled map file name, output file name, and the required profiling options as parameters:
dtxprof -dtx MapName.mmc -o OutputFileName [-f[x]][-t[x]][-fs][-ts][-d]
Figure 3. Invoking dtxprof command
Map settings performance tips
This section provides tips to improve map performance through map settings.
Audit log settings
The audit log provides information about map execution, map settings, card settings, and data objects. The audit log is highly configurable and has performance impacts. Here are some configuration tips:
- Customize your audit logs and store them in memory when possible.
- Generate audit logs for output only when needed, because they automatically re-run output validation.
- Avoid unneeded details and data in your audit logs, especially for large type trees with many objects.
- Avoid audit logs for non-mapped objects.
- If an audit log is needed only for analyzing error cases, then select Execution as OnError or OnWarningOrError, according to your needs.
Trace settings
Trace files can slow down your map by over 50% and produce a file many times larger than the input file.
- Do not generate a detailed trace file in a resource-critical environment.
- Use audit logs instead of trace files whenever possible.
Validation settings
When a map is executed, the input data is parsed and validated against the predefined input definition. Validation settings control validation behavior and the response to different validation errors. You can choose whether to validate Restrictions, Size, or Presentation by using the RestrictionError, SizeError, and PresentationError settings, and you can specify the action to be taken on the first validation error by using the OnValidationError setting. Validation settings tips:
- Be cautious about softening your validation requirements. However, if your data is already validated by another quality control system, then softening validation can significantly improve performance.
- Consider your business requirements when selecting the OnValidationError setting. It can be a waste of time and resources to continue processing after encountering invalid data.
Workspace location and paging settings for work files
Optimal settings for workspace location and paging can increase the map processing speed. Optimal settings depend on your map and data:
- For small data volumes, use a memory-based workspace and small paging size.
- For large data volumes, use a file-based workspace and large or default paging size.
- Optimize the value for PageSize and PageCount by using the dtxpage command. For more information, see the section Page settings and dtxpage command.
This article provided a number of detailed tips for improving the performance of WebSphere TX maps. Topics included defining data in type trees; mapping rules and map input/output cards; map tuning parameters and card settings; and interfacing with external applications or databases. The article also showed you how to use configurable tools to detect performance bottlenecks and perform successive iterations of design and development improvements.
- WebSphere Transformation Extender resources
- developerWorks resources