Tuning WebSphere Adapter for Flat Files for better performance

Learn how to improve the performance of the WebSphere® Adapter for Flat Files, which decreases over time as the adapter processes a large number of files, irrespective of their size. This article presents scenarios that address issues you may encounter when configuring the adapter and its features.


Abhishek Rohira (arohira1@in.ibm.com), Software Engineer, IBM

Abhishek Rohira is a Software Engineer working on the development and support of WebSphere Adapter at the IBM India Software Lab. He has more than 3 years of experience working with various Java technologies, including JCA. He has a Bachelor's degree in Computer Science and Engineering from the Vellore Institute of Technology, India.



Mohan Siripi (rasiripi@in.ibm.com), Software Engineer, IBM

Mohan R. Siripi is a Software Engineer at the IBM India Software Labs. He is currently working on development and customer support for JCA-based resource adapters. He has been working for the past 7 years with various Java technologies in WebSphere Application Server and WebSphere Adapters.



17 October 2012

Introduction


You can use WebSphere Adapter for Flat Files to read data from a file in the local file system, use it in an application on IBM® Business Process Manager or WebSphere Enterprise Service Bus, and send it back to the local file system. You can also use the adapter to poll a directory in the local file system for new files and send these files to an application for processing. This article shows you how to improve the performance of the adapter, which decreases over time as it processes a large number of files (irrespective of their size). It also addresses issues that you may face while configuring the adapter and its features.

Prerequisites

You need the following software to use the instructions in this article:

  • IBM Integration Designer V8.0
  • IBM WebSphere Adapter for Flat Files V7.5.0.1 or above
  • IBM Business Process Manager V7.5.1 or above

Configuring WebSphere Adapter for Flat Files to improve performance

This article presents scenarios that illustrate the features of WebSphere Adapter for Flat Files. Each scenario describes the benefits of a feature and how to configure it to obtain maximum performance.

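The following features are discussed:

  • Configuring the Poll Period and Poll Quantity parameters
  • Configuring WebSphere Adapter for Flat Files in a cluster environment
  • Using the unordered delivery of events
  • Using the SplitBySize and SplitByDelimiter features
  • Using the cyclic mode to poll for modified content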

Configuring the Poll Period and Poll Quantity parameters

The two most important configuration parameters for WebSphere Adapters are Poll Period and Poll Quantity. You can modify the settings for these parameters in the Activation Specification Properties of the adapter application.

Note: The settings for these parameters affect only the performance of the Inbound polling.

  • Poll Period (pollPeriod): This parameter specifies the amount of time (in milliseconds) between polling actions.
  • Poll Quantity (pollQuantity): This parameter specifies the maximum number of events that the adapter processes during a polling action.
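
As an illustration of how the two parameters interact, the following minimal Java sketch models a polling action that takes at most pollQuantity events from a pending queue. It is not the adapter's implementation: the class name, the method names, and the in-memory queue are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative model only; the real adapter reads events from the event directory.
class PollCycleSketch {

    // One polling action: takes at most pollQuantity events from the pending queue.
    static List<String> pollOnce(Deque<String> pending, int pollQuantity) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < pollQuantity && !pending.isEmpty()) {
            batch.add(pending.removeFirst());
        }
        return batch;
    }

    // Counts how many polling actions are needed to drain the queue.
    static int cyclesToDrain(Deque<String> pending, int pollQuantity) {
        if (pollQuantity <= 0) {
            throw new IllegalArgumentException("pollQuantity must be positive");
        }
        int cycles = 0;
        while (!pending.isEmpty()) {
            pollOnce(pending, pollQuantity);
            cycles++;
            // A real adapter would wait for pollPeriod milliseconds here before polling again.
        }
        return cycles;
    }
}
```

With five pending events and a pollQuantity of 2, the first polling action returns two events and leaves three in the queue, which then take two more polling actions to drain.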

Performance benefits

The Poll Period and Poll Quantity parameters control the rate and amount of work that an adapter processes. The combination of these parameters regulates the number of transactions that are processed first by the adapter, and then by the broker (for example, Business Process Manager). These parameters influence the performance of the entire solution, not just the adapter.

Non-optimal values for these parameters can result in the following:

  • Low system throughput (if the Poll Period is too long or the Poll Quantity is too low)
  • Excessive memory usage and potential OutOfMemory exceptions (if the Poll Period is too short or the Poll Quantity is too high)

Since both these conditions dramatically impact the overall system performance, appropriate settings for the Poll Period and the Poll Quantity are critical. You need to explicitly configure these parameters to support the level of throughput that a solution is designed to handle.

Scenarios

If the size of the objects is small (smaller heap memory requirements for objects in-flight) and throughput is essential, then the criterion is to ensure that the adapter does not become a bottleneck.

If the size of the business objects is large (bigger heap memory requirements for objects in-flight), then the criterion is to limit the number of objects in-flight at any moment to avoid out-of-memory conditions.

We recommend that you configure the Poll Period and Poll Quantity parameters to enable events to be retrieved and processed at a level that matches the peak throughput of the overall solution. The following example discusses this in detail.

If the peak throughput rate of a solution is 20 events per second, and the events are continuously available to the adapter for processing, set the Poll Period parameter to a small value, for example, 10 milliseconds, and set the Poll Quantity parameter to 20. This supports the required peak throughput while keeping the number of events held concurrently in the adapter relatively small. The short Poll Period ensures a minimal delay between poll cycles.

One factor that may require you to adjust the Poll Quantity and Poll Period parameters is the size of the objects being processed. For larger objects, a good rule of thumb is to use a lower Poll Quantity and a longer Poll Period. This is generally not a concern for relatively small objects (100 KB or less). However, for larger objects, it is important to minimize the number of objects held concurrently in the adapter process (in-flight) to avoid potential OutOfMemory exceptions.
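
The effect of object size can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes that at most one poll batch is in flight at a time and that each object occupies roughly its file size on the heap. The class name, the method names, and the heap budget parameter are hypothetical.

```java
// Rough estimate only; real heap usage also depends on parsing overhead and copies.
class InFlightEstimate {

    // Approximate bytes held at once if one full poll batch is in flight.
    static long inFlightBytes(int pollQuantity, long objectSizeBytes) {
        return Math.multiplyExact((long) pollQuantity, objectSizeBytes);
    }

    // Largest pollQuantity that keeps one batch within the heap budget (never less than 1).
    static int maxPollQuantity(long heapBudgetBytes, long objectSizeBytes) {
        if (objectSizeBytes <= 0) {
            throw new IllegalArgumentException("objectSizeBytes must be positive");
        }
        long quantity = heapBudgetBytes / objectSizeBytes;
        return (int) Math.max(1, Math.min(Integer.MAX_VALUE, quantity));
    }
}
```

Under these assumptions, a pollQuantity of 20 with 1 MB objects keeps about 20 MB in flight (20,971,520 bytes), whereas a pollQuantity of 2 keeps about 2 MB in flight.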

In the example above, if the object size is 1 MB and the throughput rate of the solution is 2 events per second, the appropriate settings are pollPeriod = 100 milliseconds and pollQuantity = 2.

Note: You can further adjust the poll period by calculating the time it takes to process the pollQuantity events and adjusting the poll period accordingly.
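
One way to carry out this calculation is sketched below. It assumes a simplified model in which each poll cycle consists of the time needed to process one batch followed by a pause of pollPeriod milliseconds. Your adapter version may schedule polls differently, so treat the result as a starting point for testing. The class and method names are hypothetical, and the processing time is a value you would measure yourself.

```java
// Simplified model: cycle time = processing time for one batch + pollPeriod pause.
class PollPeriodSketch {

    // Returns the pause (in milliseconds) that lets the cycle match the target rate,
    // or 0 when processing alone already takes as long as the target cycle allows.
    static long pollPeriodMs(double targetEventsPerSecond, int pollQuantity, long processingMsPerBatch) {
        if (targetEventsPerSecond <= 0 || pollQuantity <= 0) {
            throw new IllegalArgumentException("target rate and pollQuantity must be positive");
        }
        // Time available for one full cycle if pollQuantity events must be handled at the target rate.
        long cycleMs = Math.round(pollQuantity * 1000.0 / targetEventsPerSecond);
        return Math.max(0, cycleMs - processingMsPerBatch);
    }
}
```

For example, with a target of 2 events per second, a pollQuantity of 2, and a hypothetical measured processing time of 900 milliseconds per batch, the sketch returns a poll period of 100 milliseconds. A result of 0 indicates that, under this model, the batch cannot be processed fast enough to meet the target rate.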


Configuring WebSphere Adapter for Flat Files in a cluster environment

Clusters are groups of servers that are managed together to balance workloads and to provide high availability and scalability. You can configure WebSphere Adapter for Flat Files in a cluster environment either in "HA Active-Active" mode or "HA Active-Passive" mode.

Performance benefits

HA Active-Active is useful when WebSphere Adapter for Flat Files needs to process a large number of business objects. Because of its performance benefit, HA Active-Active also lets you process heavy files (with the SplitBySize and SplitByDelimiter functionality). In HA Active-Active mode, multiple instances of the adapter run in parallel to process the business objects (BOs) present in the event files and deliver them to the endpoint. Based on the poll quantity and poll period, the first instance picks up the first set of BOs and starts processing it. Meanwhile, the other instances pick up the next sets of BOs to process. This increases both the efficiency and the processing performance of the adapter.
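
The idea of several instances claiming consecutive sets of BOs can be pictured with the following Java sketch. It is only an analogy: worker threads stand in for adapter instances, and a shared in-memory queue stands in for the persisted event store that coordinates the real instances. All class, method, and variable names are hypothetical.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Analogy only: threads play the role of adapter instances in a cluster.
class ActiveActiveSketch {

    // Runs the given number of workers; each repeatedly claims up to pollQuantity BOs.
    static Queue<String> process(List<String> businessObjects, int instances, int pollQuantity) {
        Queue<String> pending = new ConcurrentLinkedQueue<>(businessObjects);
        Queue<String> delivered = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(instances);
        for (int i = 0; i < instances; i++) {
            final String name = "instance-" + i;
            pool.execute(() -> {
                while (true) {
                    int claimed = 0;
                    String bo;
                    // poll() removes the head atomically, so no BO is claimed twice.
                    while (claimed < pollQuantity && (bo = pending.poll()) != null) {
                        delivered.add(name + ":" + bo);
                        claimed++;
                    }
                    if (claimed == 0) {
                        return; // nothing left to claim
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return delivered;
    }
}
```

Whichever instance claims a given BO, each BO is delivered exactly once, but the order of delivery across instances is not defined.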

Note: You can only configure HA Active-Active for an inbound operation.

Configuring the adapter in HA Active-Active mode therefore has the greatest effect on performance when a large number of BOs need to be processed. In contrast, in HA Active-Passive mode, only a single instance of the adapter processes the BOs present in the event files at a time. The other instances begin processing only after the first instance has finished. You can configure HA Active-Passive for both outbound and inbound operations.

Configuration

To configure the adapter for HA Active-Active, you must meet the following prerequisites:

  • Set the type of delivery attribute to be "Unordered". HA Active-Active does not support an Ordered type of delivery.
  • Configure all the Event persistence properties.
  • Specify the type of sorting as "No Sort". HA Active-Active does not support file sorting while retrieving files.
  • Specify the time interval for the adapter to process the fetched events.
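
The prerequisites above can be expressed as a simple checklist. The following sketch is not part of the adapter or its tooling: the class name, the method, and its parameter names are hypothetical and only mirror the four items in the list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-deployment checklist mirroring the HA Active-Active prerequisites.
class ActiveActiveChecklist {

    // Returns a list of problems; an empty list means all four prerequisites are met.
    static List<String> problems(String deliveryType, String sortOrder,
                                 boolean eventPersistenceConfigured, long processingIntervalMs) {
        List<String> problems = new ArrayList<>();
        if (!"UNORDERED".equalsIgnoreCase(deliveryType)) {
            problems.add("Type of delivery must be Unordered");
        }
        if (!eventPersistenceConfigured) {
            problems.add("All event persistence properties must be configured");
        }
        if (!"NO SORT".equalsIgnoreCase(sortOrder)) {
            problems.add("Type of sorting must be No Sort");
        }
        if (processingIntervalMs <= 0) {
            problems.add("A time interval for processing fetched events must be specified");
        }
        return problems;
    }
}
```

A configuration with ordered delivery, for example, produces exactly one problem, which makes the mismatch visible before you deploy to the cluster.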

For instructions to configure HA Active-Active, see WebSphere Adapter for Flat Files 7.5.0.0 QSS Quick Start Scenarios, Tutorial #9.

To deploy and configure WebSphere Adapter for Flat Files in a clustered environment, refer to Deploying and configuring WebSphere Adapters in a clustered environment.


Using the unordered delivery of events

The type of delivery specifies the order in which the adapter delivers events to the endpoint. It is either ordered or unordered.

Performance benefits

In an unordered type of delivery, multiple threads run simultaneously to deliver the events to the endpoint, which boosts the performance of WebSphere Adapter for Flat Files. However, in an ordered type of delivery, only a single thread runs to deliver the events.
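
The difference between the two delivery types can be modeled with a thread pool. This is a minimal sketch, not the adapter's delivery mechanism: the class name is hypothetical, and the choice of four threads for unordered delivery is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Model only: one delivery thread for ordered delivery, several for unordered delivery.
class DeliverySketch {

    static List<String> deliver(List<String> events, boolean ordered) {
        int threads = ordered ? 1 : 4; // 4 is an arbitrary value for illustration
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> received = Collections.synchronizedList(new ArrayList<>());
        for (String event : events) {
            // With a single thread, tasks run in submission order; with several, they may interleave.
            pool.execute(() -> received.add(event));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("delivery did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for delivery", e);
        }
        return new ArrayList<>(received);
    }
}
```

With ordered set to true, the endpoint receives the events in the order they were submitted. With ordered set to false, all events arrive, but their order is not guaranteed.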

Note: When files must be processed in sequence, do not use the unordered type of delivery.

Configuration

To set the type of delivery to unordered, select UNORDERED from the drop-down menu, as shown in Figure 1. Here, the adapter delivers all the events to the endpoint at once.

Figure 1. Unordered type of delivery

Note: The default type of delivery selected is "Ordered".


Using the SplitBySize and SplitByDelimiter features

WebSphere Adapter for Flat Files uses the file splitting feature to divide a large file into smaller chunks, which are then retrieved separately. The file content is split according to the SplittingFunctionClassName and SplitCriteria properties defined in the interaction specification.

The splitting feature is available for inbound processing and for specific outbound operations (Create, Retrieve, Append, and Overwrite).

Performance benefits

Splitting the file into smaller chunks can significantly improve performance, especially when you use heavy event files. WebSphere Adapter for Flat Files splits these event files into small chunks, then processes and delivers them to the endpoint, thereby increasing efficiency.

Consider using the splitting feature when the adapter has to process heavy files or a large number of files. During inbound polling, the splitting by size or delimiter feature splits the event file based on the size or delimiter specified, and sends the smaller chunks as individual events to the endpoint.
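
Conceptually, splitting by size cuts the file content into fixed-size chunks, with a shorter final chunk when the content does not divide evenly. The following sketch shows the idea on an in-memory byte array. It is not the adapter's splitting class: the class and method names are hypothetical, and the sketch rejects non-positive chunk sizes rather than guessing how the adapter treats them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch of size-based splitting on in-memory content.
class SplitBySizeSketch {

    // Cuts content into chunks of chunkSize bytes; the last chunk may be shorter.
    static List<byte[]> split(byte[] content, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive in this sketch");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < content.length; start += chunkSize) {
            int length = Math.min(chunkSize, content.length - start);
            chunks.add(Arrays.copyOfRange(content, start, start + length));
        }
        return chunks;
    }
}
```

Splitting 10 bytes with a chunk size of 4 yields three chunks of 4, 4, and 2 bytes, each of which would be sent to the endpoint as a separate event.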

During an outbound operation, the splitting by size or delimiter feature enables the adapter to split the file based on the size or delimiter specified, so that the Create, Append, Overwrite, or Retrieve operation handles the file in smaller chunks.

Depending upon the type of content in the file, you can split the file by delimiter or by size:

  • When the content of the business object has a definite structure, for example, if it contains elements such as name, address, and city, you can optionally split the file by delimiter.
  • When the content of the business object contains unstructured data, such as plain text or binary files, the file is split by size.

Note: By default, the adapter splits the files by size.
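
For structured content, splitting by delimiter can be pictured as follows. This is a conceptual sketch on an in-memory string, not the adapter's splitting class. The class and method names, the "####" delimiter, and the decision to skip empty records are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Conceptual sketch of delimiter-based splitting on in-memory content.
class SplitByDelimiterSketch {

    // Splits content at each occurrence of the delimiter and skips empty records.
    static List<String> split(String content, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        // Pattern.quote treats the delimiter as literal text rather than as a regular expression.
        for (String part : content.split(Pattern.quote(delimiter))) {
            if (!part.isEmpty()) {
                records.add(part);
            }
        }
        return records;
    }
}
```

Content such as "name=A;city=X####name=B;city=Y####" split on "####" yields two records, one per business object.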

Configuration

Depending on the structure of the business object, select either SplitByDelimiter or SplitBySize.

Specify the SplittingFunctionClassName and SplitCriteria properties during the Enterprise Metadata Discovery (EMD), as shown in Figure 2 (for SplitBySize) and Figure 3 (for SplitByDelimiter).

Figure 2. Configuring SplitBySize

Note: By default, the value set for "Split criteria to split file content" for SplitBySize is 0.

Figure 3. Configuring SplitByDelimiter

Using the cyclic mode to poll for modified content

In the cyclic mode, the adapter polls the event directory for files on a continuous basis for an inbound operation and checks each event file for possible updates. If a file in the event directory has been updated (that is, if the last modified time of the file is different from the last modified time that the adapter recorded previously), then the adapter notifies the endpoint application of the update in one of the following ways:

Sending only the appended content

When there is an update available for a file, then the adapter sends the newly appended content as a BO to the endpoint application.

Note: You can select this option if you are sure that only new content is added as part of the update to the existing event file.

Sending the complete file content

When there is an update available for a file, then the adapter sends the complete content of the file as a BO to the endpoint application.

Note: As part of the update to the file, the file content can be completely modified, or some parts of the existing content might have been deleted or modified. Since the adapter cannot track such changes, we recommend that you use the "Send the Complete File Content" option for the cyclic mode.
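
The two options can be illustrated with a small in-memory model. The sketch below is not the adapter's implementation: the class and method names are hypothetical, the first sighting of a file is assumed to send the whole content, and falling back to the complete content when the file has shrunk is a choice made for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// In-memory model of change detection based on last modified time.
class CyclicModeSketch {

    // File name -> {last modified time, content length} recorded at the previous poll.
    private final Map<String, long[]> seen = new HashMap<>();

    // Returns the content to send to the endpoint, or null when the file is unchanged.
    byte[] contentToSend(String fileName, long lastModified, byte[] content, boolean appendedOnly) {
        long[] previous = seen.put(fileName, new long[] {lastModified, content.length});
        if (previous == null) {
            return content; // assumption: a newly seen file is sent in full
        }
        if (previous[0] == lastModified) {
            return null; // same timestamp as the last poll: nothing to send
        }
        int oldLength = (int) previous[1];
        if (appendedOnly && content.length >= oldLength) {
            // Only correct if the update really was a pure append.
            return Arrays.copyOfRange(content, oldLength, content.length);
        }
        return content; // send the complete file content
    }
}
```

If a file containing "abc" grows to "abcdef", the appended-only option sends "def", while the complete-content option sends "abcdef". The sketch also shows why the appended-only option is risky when existing content is edited: it only looks at the bytes beyond the previous length.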

Performance benefits

Previously, when customers changed an event file, they had to rename the file and place it in the event directory again before the adapter would process it. With the cyclic mode property, this is no longer necessary. The adapter itself tracks all files in the event directory that are modified, and sends either only the appended content or the whole file content to the endpoint.

Configuration

To configure the cyclic mode, select the Poll event files for modified content check box in the EMD, as shown in Figure 4.

Figure 4. Configuring the cyclic mode

Note: To send only the appended content, select the check box that specifies Include only the newly appended content. If you do not specify this option, the adapter sends the whole content present inside the file, including the appended content.

When the Cyclic Mode feature is enabled, the following features are disabled:

  • The "File Pass by Reference" property is disabled, because the Cyclic Mode and File Pass by Reference features cannot co-exist.
  • The "Time interval for polling unchanged files" property is disabled.

Note: If you enable this feature, any path specified in the "Archive Directory" property has no effect, even if the path is valid. In the cyclic mode, the adapter does not archive or delete any files from the event directory. The adapter constantly searches for updates in the existing files.


Conclusion

This article described how to use the various properties and features of WebSphere Adapter for Flat Files to improve performance, which decreases over time as the adapter processes a large number of files.

Acknowledgments

The authors would like to thank Vrinda Vasudevan for reviewing this article.
