You can use WebSphere Adapter for Flat Files to read data from a file in the local file system, use it in an application on IBM® Business Process Manager or WebSphere Enterprise Service Bus, and send it back to the local file system. You can also use the adapter to poll a directory in the local file system for new files and send these files to an application for processing. This article shows you how to improve the performance of the adapter, which can degrade over time as it processes a large number of files (irrespective of their size). It also addresses issues that you may face while configuring the adapter and its features.
You need the following software to use the instructions in this article:
- IBM Integration Designer V8.0
- IBM WebSphere Adapter for Flat Files V188.8.131.52 or above
- IBM Business Process Manager V7.5.1 or above
Configuring WebSphere Adapter for Flat Files to improve performance
This article walks through scenarios that illustrate the features of WebSphere Adapter for Flat Files. Each scenario explains the benefits of a feature and how to configure it for maximum performance.
The following features are discussed:
- Configuring the Poll Period and Poll Quantity parameters
- Configuring WebSphere Adapter for Flat Files in a cluster environment
- Using the unordered delivery of events
- Using the SplitBySize and SplitByDelimiter features
- Using the cyclic mode to poll for modified content
Configuring the Poll Period and Poll Quantity parameters
The two most important configuration parameters for WebSphere Adapters are Poll Period and Poll Quantity. You can modify the settings for these parameters in the Activation Specification Properties of the adapter application.
Note: The settings for these parameters affect only the performance of the Inbound polling.
- Poll Period (pollPeriod): This parameter specifies the amount of time (in milliseconds) between polling actions.
- Poll Quantity (pollQuantity): This parameter specifies the maximum number of events that need to be processed during a polling action.
The Poll Period and Poll Quantity parameters control the rate and amount of work that an adapter processes. Together, these parameters regulate the number of transactions that are processed first by the adapter, and then by the broker (for example, Business Process Manager). These parameters influence the performance of the entire solution, not just the adapter.
Non-optimal values for these parameters can result in the following:
- Low system throughput (if the Poll Period is too long or the Poll Quantity is too low)
- Excessive memory usage and potential OutOfMemory exceptions (if the Poll Period is too short or the Poll Quantity is too high)
Since both these conditions dramatically impact the overall system performance, appropriate settings for the Poll Period and the Poll Quantity are critical. You need to explicitly configure these parameters to support the level of throughput that a solution is designed to handle.
If the size of the objects is small (smaller heap memory requirements for objects in-flight) and throughput is essential, then the goal is to ensure that the adapter does not become the bottleneck of the solution.
If the size of the business objects is large (bigger heap memory requirements for objects in-flight), then the goal is to limit the number of objects in-flight at any moment to avoid out-of-memory conditions.
We recommend that you configure the Poll Period and Poll Quantity parameters to enable events to be retrieved and processed at a level that matches the peak throughput of the overall solution. The following example discusses this in detail.
If the peak throughput rate of a solution is 20 events per second, and events are continuously available to the adapter for processing, set the Poll Period parameter to a small value, for example, 10 milliseconds, and set the Poll Quantity parameter to 20. This supports the required peak throughput while keeping the number of events held concurrently in the adapter relatively small. The short Poll Period ensures a minimal delay between poll cycles.
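As a rough check on the example above, the following Python sketch computes the theoretical ceiling on events per second implied by a Poll Period and Poll Quantity pair. It assumes that every poll cycle returns a full batch and that processing time is negligible, which is an idealization. The function names are hypothetical and are not part of the adapter.

```python
def max_events_per_second(poll_period_ms: int, poll_quantity: int) -> float:
    """Upper bound on events per second.

    Assumption: every poll returns a full batch and processing time is zero.
    """
    if poll_period_ms <= 0 or poll_quantity <= 0:
        # The sketch rejects non-positive values for simplicity.
        raise ValueError("poll period and poll quantity must be positive")
    polls_per_second = 1000 / poll_period_ms
    return polls_per_second * poll_quantity


def supports_peak(poll_period_ms: int, poll_quantity: int,
                  peak_events_per_second: float) -> bool:
    """True if the configured ceiling is at least the required peak rate."""
    ceiling = max_events_per_second(poll_period_ms, poll_quantity)
    return ceiling >= peak_events_per_second


if __name__ == "__main__":
    # Settings from the example: pollPeriod = 10 ms, pollQuantity = 20.
    print(max_events_per_second(10, 20))
    print(supports_peak(10, 20, peak_events_per_second=20))
```

In practice the achievable rate is lower than this ceiling, because the time spent processing each batch is not zero.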
One factor that may require you to adjust the Poll Quantity and Poll Period parameters is the size of the objects being processed. For larger objects, a good rule of thumb is to use a lower Poll Quantity and a longer Poll Period. This does not generally apply to relatively small objects (100 KB or less). However, for larger objects, it is important to minimize the number of objects held concurrently in the adapter process (in-flight) to avoid potential OutOfMemory exceptions.
In the example above, if the object size is 1 MB and the throughput rate of the solution is 2 events per second, the appropriate settings are pollPeriod = 100 milliseconds and pollQuantity = 2.
Note: You can further adjust the poll period by calculating the time it takes to process the pollQuantity events and adjusting the poll period accordingly.
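The sizing reasoning above can be sketched in Python as follows. The first function estimates the heap held by one polled batch, and the second derives a Poll Period from the measured time to process one batch. Both function names, the 10-millisecond floor, and the rule "poll no more often than a batch can be processed" are assumptions made for illustration, not documented adapter behavior.

```python
import math


def in_flight_bytes(poll_quantity: int, object_size_bytes: int) -> int:
    """Approximate heap held by one polled batch of business objects."""
    return poll_quantity * object_size_bytes


def adjusted_poll_period_ms(batch_processing_ms: float, floor_ms: int = 10) -> int:
    """Suggest a poll period no shorter than the batch processing time.

    Assumption: polling faster than a batch can be processed only piles up
    in-flight objects, so the processing time is used as a lower bound.
    """
    return max(floor_ms, math.ceil(batch_processing_ms))


if __name__ == "__main__":
    one_mb = 1024 * 1024
    # pollQuantity = 2 with 1 MB objects, as in the example above.
    print(in_flight_bytes(2, one_mb))
    # Hypothetical measurement: each event takes 40 ms, two events per batch.
    print(adjusted_poll_period_ms(2 * 40.0))
```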
Configuring WebSphere Adapter for Flat Files in a cluster environment
Clusters are groups of servers that are managed together to balance workloads and to provide high availability and scalability. You can configure WebSphere Adapter for Flat Files in a cluster environment either in "HA Active-Active" mode or "HA Active-Passive" mode.
HA Active-Active is useful when WebSphere Adapter for Flat Files needs to process a large number of business objects. Because it improves performance, HA Active-Active is also suited to processing heavy files (with the SplitBySize and SplitByDelimiter functionality). In HA Active-Active mode, multiple instances of the adapter run in parallel to process the business objects (BOs) present in the event files and to deliver them to the endpoint. Based on the poll quantity and poll frequency, the first instance picks up the first set of BOs and starts processing it. Meanwhile, the other instances simultaneously pick up the next sets of BOs to process. This increases the efficiency as well as the processing performance of the adapter.
Note: You can only configure HA Active-Active for an inbound operation.
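To picture how parallel instances share the work, the following Python sketch cuts a list of BOs into batches of Poll Quantity size and hands them to instances in turn. The round-robin assignment and the function names are purely illustrative assumptions; how the real adapter instances coordinate is not described in this article.

```python
from itertools import islice


def batches(items, poll_quantity):
    """Yield consecutive batches of at most poll_quantity items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, poll_quantity))
        if not batch:
            return
        yield batch


def assign_batches(business_objects, poll_quantity, instance_count):
    """Assign batches to adapter instances in round-robin order.

    Assumption: round-robin stands in for whatever coordination the
    adapter actually uses between instances.
    """
    assignment = {instance: [] for instance in range(instance_count)}
    for index, batch in enumerate(batches(business_objects, poll_quantity)):
        assignment[index % instance_count].append(batch)
    return assignment


if __name__ == "__main__":
    # Ten BOs, pollQuantity = 3, two active instances.
    print(assign_batches(list(range(10)), poll_quantity=3, instance_count=2))
```

With `instance_count=1`, every batch lands on the same instance, which is the situation of a single active instance.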
If you configure the adapter in HA Active-Passive mode, performance suffers when a large number of BOs need to be processed. In the HA Active-Passive mode, only a single instance of the adapter processes the BOs present in the event files at a time. Other instances start processing only after the first instance has finished. You can configure HA Active-Passive for both outbound and inbound operations.
Before you configure the adapter for HA Active-Active, you must meet some mandatory prerequisites.
You need to:
- Set the type of delivery attribute to be "Unordered". HA Active-Active does not support an Ordered type of delivery.
- Configure all the Event persistence properties.
- Specify the type of sorting as "No Sort". HA Active-Active does not support file sorting while retrieving files.
- Specify the time interval for the adapter to process the fetched events.
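The prerequisites above can be expressed as a simple pre-deployment check. In this Python sketch, the configuration is a plain dictionary and every key name (`deliveryType`, `eventPersistenceConfigured`, `sortEventFiles`, `eventProcessingIntervalMs`) is a hypothetical placeholder, not an actual adapter property name.

```python
def check_active_active_prerequisites(config: dict) -> list:
    """Return a list of unmet HA Active-Active prerequisites.

    All dictionary keys are hypothetical names chosen for this sketch.
    """
    problems = []
    if str(config.get("deliveryType", "")).upper() != "UNORDERED":
        problems.append("type of delivery must be Unordered")
    if not config.get("eventPersistenceConfigured", False):
        problems.append("all event persistence properties must be configured")
    if config.get("sortEventFiles", "") != "No Sort":
        problems.append("type of sorting must be No Sort")
    if not config.get("eventProcessingIntervalMs"):
        problems.append("time interval for processing fetched events must be set")
    return problems


if __name__ == "__main__":
    candidate = {
        "deliveryType": "UNORDERED",
        "eventPersistenceConfigured": True,
        "sortEventFiles": "No Sort",
        "eventProcessingIntervalMs": 5000,  # hypothetical value
    }
    print(check_active_active_prerequisites(candidate))
```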
For instructions to configure HA Active-Active, see WebSphere Adapter for Flat Files 184.108.40.206 QSS Quick Start Scenarios, Tutorial #9.
To deploy and configure WebSphere Adapter for Flat Files in a clustered environment, refer to Deploying and configuring WebSphere Adapters in a clustered environment.
Using the unordered delivery of events
Delivery of events specifies the order in which events are delivered by the adapter to the endpoint. The type of delivery is either ordered or unordered.
In an unordered type of delivery, multiple threads run simultaneously to deliver the events to the endpoint, which boosts the performance of WebSphere Adapter for Flat Files. However, in an ordered type of delivery, only a single thread runs to deliver the events.
Note: When files must be processed in sequence, do not use the unordered type of delivery.
To set the type of delivery to unordered, select UNORDERED from the drop-down menu, as shown in Figure 1. Here, the adapter delivers all the events to the endpoint at once.
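The difference between the two delivery types can be illustrated with a small Python sketch that delivers events through a thread pool: one worker thread for ordered delivery, several for unordered delivery. The `deliver` function, the `endpoint` callable, and the default thread count are assumptions for illustration and do not reflect the adapter's internal implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def deliver(events, endpoint, unordered: bool, max_threads: int = 4) -> None:
    """Deliver each event to the endpoint callable.

    Ordered delivery uses a single worker, so events arrive in sequence.
    Unordered delivery uses several workers, so arrival order may vary.
    """
    workers = max_threads if unordered else 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(endpoint, event) for event in events]
        for future in futures:
            future.result()  # re-raise any delivery error


if __name__ == "__main__":
    received = []
    deliver(["file1", "file2", "file3"], received.append, unordered=False)
    print(received)
```

With unordered delivery, the endpoint receives the same set of events, but the sketch makes no guarantee about the order in which they arrive.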
Figure 1. Unordered type of delivery
Note: The default type of delivery selected is "Ordered".
Using the SplitBySize and SplitByDelimiter features
WebSphere Adapter for Flat Files uses the file splitting feature to divide a large file into smaller chunks, which are then retrieved separately. The file content is split according to the SplittingFunctionClassName and SplitCriteria properties defined in the interaction specification.
The splitting feature is available for inbound processing and for specific outbound operations (Create, Retrieve, Append, and Overwrite).
Splitting a file into smaller chunks can improve performance significantly, especially when you use heavy event files. WebSphere Adapter for Flat Files splits these event files into small chunks, then processes the chunks and delivers them to the endpoint, thereby increasing efficiency.
Consider using the splitting feature when the adapter has to process heavy or a large number of files. During inbound polling, the splitting by size or delimiter feature splits the event file based on the size or delimiter specified during polling, and sends the smaller chunk as individual events to the endpoint.
During an outbound operation, the splitting by size or delimiter feature enables the adapter to split the file content based on the size or delimiter specified, so that the Create, Append, Retrieve, or Overwrite operation handles the file in smaller chunks.
Depending upon the type of content in the file, you can split the file by delimiter or by size:
- When the content of the business object has a definite structure, for example, if it contains elements such as name, address, and city, you can optionally split the file by delimiter.
- When the content of the business object contains unstructured data, such as plain text or binary files, the file is split by size.
Note: By default, the adapter splits the files by size.
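The two splitting strategies can be sketched in Python as follows. These functions only illustrate the idea of splitting by size and by delimiter; they are not the adapter's splitting classes. Treating a size of zero or less as "do not split" and dropping empty records are assumptions of this sketch.

```python
def split_by_size(content: bytes, chunk_size: int) -> list:
    """Split unstructured content into chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        # Assumption: a non-positive size means the content is not split.
        return [content]
    return [content[start:start + chunk_size]
            for start in range(0, len(content), chunk_size)]


def split_by_delimiter(content: str, delimiter: str) -> list:
    """Split structured content into records separated by a delimiter."""
    if not delimiter:
        return [content]
    # Assumption: empty records (for example, after a trailing delimiter)
    # are dropped rather than delivered as empty events.
    return [record for record in content.split(delimiter) if record]


if __name__ == "__main__":
    print(split_by_size(b"abcdefghij", 4))
    print(split_by_delimiter("name=A;city=X####name=B;city=Y####", "####"))
```

Each returned chunk or record corresponds to one event delivered to the endpoint.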
You can select the structure of the business object to use either SplitByDelimiter or SplitBySize.
Specify the SplittingFunctionClassName and SplitCriteria properties during the Enterprise Metadata Discovery (EMD), as shown in Figure 2 (for SplitBySize) and Figure 3 (for SplitByDelimiter).
Figure 2. Configuring SplitBySize
Note: By default, the value set for "Split criteria to split file content" for SplitBySize is 0.
Figure 3. Configuring SplitByDelimiter
Using the cyclic mode to poll for modified content
In the cyclic mode, the adapter continuously polls the event directory for files during an inbound operation and checks each event file for possible updates. If a file in the event directory has been updated (that is, if its last modified time differs from the last modified time recorded when the adapter last processed it), then the adapter notifies the endpoint application of the update in one of the following ways:
Sending only the appended content
When there is an update available for a file, then the adapter sends the newly appended content as a BO to the endpoint application.
Note: You can select this option if you are sure that only new content is added as part of the update to the existing event file.
Sending the complete file content
When there is an update available for a file, then the adapter sends the complete content of the file as a BO to the endpoint application.
Note: As part of the update to the file, the file content can be completely modified, or some parts of the existing content might have been deleted or modified. Since the adapter cannot track such changes, we recommend that you use the "Send the Complete File Content" option for the cyclic mode.
Previously, when customers changed an event file, they had to rename the file and place it in the event directory again before the adapter would process it. With the cyclic mode property, this is no longer necessary. The adapter itself detects all modified files in the event directory, and sends either only the appended content or the whole file content to the endpoint.
To configure the cyclic mode, select the Poll event files for modified content check box in the EMD, as shown in Figure 4.
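The behavior described for the cyclic mode can be approximated with the following Python sketch. It remembers the last modified time and size of each file, and on each poll returns either the appended bytes or the complete content when the modified time has changed. The `CyclicPoller` class is hypothetical and only models the two notification options; it is not how the adapter is implemented.

```python
import os


class CyclicPoller:
    """Track files by modification time and report changed content."""

    def __init__(self, append_only: bool):
        self.append_only = append_only
        self._seen = {}  # path -> (last modified time in ns, size in bytes)

    def poll(self, path: str):
        """Return new content for path, or None if the file is unchanged."""
        stat = os.stat(path)
        previous = self._seen.get(path)
        self._seen[path] = (stat.st_mtime_ns, stat.st_size)
        if previous is not None and previous[0] == stat.st_mtime_ns:
            return None  # last modified time unchanged: nothing to send
        with open(path, "rb") as handle:
            if self.append_only and previous is not None:
                # Send only what was added after the previously seen size.
                handle.seek(previous[1])
            return handle.read()
```

In append-only mode the sketch trusts the previously recorded size, so it returns misleading data if existing content was rewritten or deleted. This is the same limitation that makes sending the complete file content the safer option.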
Figure 4. Configuring the cyclic mode
Note: To send only the appended content, select the check box that specifies Include only the newly appended content. If you do not specify this option, the adapter sends the whole content present inside the file, including the appended content.
When the Cyclic Mode feature is enabled, the following features are disabled:
- The "File Pass by Reference" property is disabled, because the Cyclic Mode and File Pass by Reference features cannot coexist.
- The "Time interval for polling unchanged files" property is disabled.
Note: If you enable this feature, any archive directory path that you specify in the "Archive Directory" property has no effect, even if the path is valid. In the cyclic mode, the adapter does not archive or delete any files from the event directory. The adapter constantly searches for updates in the existing files.
This article described how to use the various properties and features of WebSphere Adapter for Flat Files to improve its performance, which can degrade over time as the adapter processes a large number of files.
The authors would like to thank Vrinda Vasudevan for reviewing this article.