Application Integration: Select Application pattern

Overview

The various designs in the Application patterns that follow allow for solution flexibility in Application Integration, and are categorized as either Process Integration or Data Integration. These two categories enable different types of integration functionality.

Process Integration application patterns

Process Integration application patterns are observed where multiple automated business processes are combined to yield a new business offering or to provide a consolidated view of some business entity with many representations in the corporate business systems. An often-quoted example is the consolidated view of the state of all relationships of the business with a particular customer.
This mode of integration is highly flexible. In its more sophisticated form it enables "late binding" of the targets of integration and is particularly useful in tying together different platforms and technologies. However, it is a more difficult design and development task than data integration and often requires complex middleware.

Explanation for re-engineering of Process Integration application patterns.


Relationship of Process Integration patterns to Extended Enterprise and SOA profile patterns

The Process Integration (PI) patterns introduce the following patterns: Direct Connection, Broker, Router, Serial Process, Serial Workflow, Parallel Process, Parallel Workflow, Hub, Zone, and so on.

The EE profile adds the Exposed qualifier and Partner applications and infrastructure.

The SOA profile builds off both of these and adds ESB, ESB Gateway, and BSC patterns.


Common Services for the Process Integration application patterns

Process Integration application patterns contain a well-defined set of services, combinations of which are used in the patterns observed in practice. These services include:

  1. Protocol adapters
  2. Message handlers
  3. Data transformation
  4. Decomposition/Recomposition
  5. Routing/Navigation
  6. State management
  7. Security
  8. Local business logic
  9. (Business) unit-of-work management

More descriptive information on these services can be found in the "Application Integration Services" section of the general guidelines page. Then, select from the following Process Integration application patterns the design that best addresses the specific requirements of your solution.

Documentation of the most frequently observed QoS concerns that you must consider when implementing integration solutions.


Legend for Process Integration application patterns

SOA

The original set of PI patterns is intended to satisfy a wide generic set of integration requirements, not just SOA. The SOA profile specialises these more general patterns for the SOA environment.

Process Integration and Support of Serial and Parallel Interaction

| | No Parallel Interaction | Parallel Interaction |
| Serial Interaction | Serial Process (Variation: Serial Workflow) | Parallel Process (Variation: Parallel Workflow) |
| No Serial Interaction | Direct Connection (Variations: Message Connection, Call Connection) | Broker (Variation: Router) |

The Business drivers for the Process Integration patterns

| Business Drivers | Direct Connection=Message Connection | Direct Connection=Call Connection | Router Variation | Broker | Serial Process | Serial Workflow variation | Parallel Process | Parallel Workflow variation |
| Improve organizational efficiency | yes | yes | yes | yes | yes | yes | yes | yes |
| Reduce the latency of business events | yes | yes | yes | yes | yes | yes | yes | yes |
| Support a structured exchange within the organization | yes | yes | yes | yes | yes | yes | yes | yes |
| Support real-time one-way "message" flows | yes | no | yes | yes | yes | yes | yes | yes |
| Support real-time request/reply "message" flows | no | yes | yes | yes | yes | yes | yes | yes |
| Support dynamic routing of "messages" to one of many target applications | no | no | yes | yes | yes | yes | yes | yes |
| Support dynamic distribution of "messages" to multiple target applications | no | no | no | yes | yes | yes | yes | yes |
| Support automated coordination of business process flow | no | no | no | no | yes | yes | yes | yes |
| Reduce cycle time through parallel execution of portions of a process flow | no | no | no | no | no | no | yes | yes |
| Support human interaction and intervention within the process flow | no | no | no | no | no | yes | no | yes |
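
As an illustration only, the business drivers matrix above can be encoded as a small lookup that lists which patterns satisfy a given set of drivers. The following Python sketch is not part of the original pattern material. The short driver keys and the function name candidates are assumptions made for this example, and only the rows that differ between patterns are included.

```python
# Patterns in the column order of the business drivers table.
PATTERNS = [
    "Message Connection", "Call Connection", "Router", "Broker",
    "Serial Process", "Serial Workflow", "Parallel Process", "Parallel Workflow",
]

# One y/n flag per pattern, copied from the table rows that are not all "yes".
# The short keys are hypothetical labels used only in this sketch.
DRIVERS = {
    "one-way messages":       "ynyyyyyy",
    "request/reply messages": "nyyyyyyy",
    "dynamic routing":        "nnyyyyyy",
    "dynamic distribution":   "nnnyyyyy",
    "process coordination":   "nnnnyyyy",
    "parallel execution":     "nnnnnnyy",
    "human interaction":      "nnnnnyny",
}

def candidates(required):
    """Return the patterns that support every driver in `required`."""
    return [
        name for i, name in enumerate(PATTERNS)
        if all(DRIVERS[d][i] == "y" for d in required)
    ]
```

For example, candidates(["dynamic distribution"]) starts with Broker, the first pattern in the table's column order that supports that driver.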

The IT drivers for the Process Integration patterns

| IT Drivers | Direct Connection=Message Connection | Direct Connection=Call Connection | Router Variation | Broker | Serial Process | Serial Workflow variation | Parallel Process | Parallel Workflow variation |
| Minimize total cost of ownership (TCO) | no | no | yes | yes | yes | yes | yes | yes |
| Leverage existing skills | yes | yes | yes | yes | yes | yes | yes | yes |
| Leverage the legacy investment | yes | yes | yes | yes | yes | yes | yes | yes |
| Enable back-end application integration | yes | yes | yes | yes | yes | yes | yes | yes |
| Minimize application complexity | yes | yes | yes | yes | yes | yes | yes | yes |
| Minimize enterprise complexity | no | no | yes | yes | yes | yes | yes | yes |
| Improve maintainability | no | no | yes | yes | yes | yes | yes | yes |
| Improve flexibility by externalizing process logic from application logic | no | no | no | no | yes | yes | yes | yes |
| Support long running transaction | no | no | no | no | no | yes | no | yes |

Direct Connection

The Stand-Alone Single Channel application pattern provides a structure for applications that have no current need for integration with other systems and need only focus on one delivery channel. While this Application pattern can be used to implement any one of the delivery channels, the following discussion focuses primarily on the Web delivery channel.

The Direct Connection application pattern represents the simplest interaction type and is based on a 1-to-1 topology. It allows a pair of applications within the organization to directly communicate with each other. Interactions between a source and a target application can be arbitrarily complex. Generally, complexity can be addressed by breaking down interactions into more elementary interactions.

More complex point to point connections will have modeled connection rules such as business rules associated with them, as shown above. Connection rules are generally used to control the mode of operation of a connector depending on external factors. Examples of connection rules are:

The Direct Connection application pattern has two variations: the Message Connection variation and the Call Connection variation.

All applications of the Direct Connection application pattern will be one variation or the other. The variation required depends on whether the initiating source application needs an immediate response from the target application in order to continue with execution.

Both variations may be used either with synchronous or asynchronous communication protocols. However, there are preferences for a specific protocol type depending on the variation. For example, the Call Connection variation has a more natural fit with synchronous protocols while the Message Connection variation favors asynchronous protocols.
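
The difference between the two variations can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a product implementation: the function names, the part number, and the in-process queue (standing in for message-oriented middleware) are all hypothetical.

```python
import queue

def wholesale_delivery_date(part_no):
    # Hypothetical target application logic.
    return {"part": part_no, "delivery_days": 5}

def call_connection(part_no):
    # Call Connection: the source waits for the target's response
    # before it continues.
    return wholesale_delivery_date(part_no)

# In-process queue standing in for asynchronous messaging middleware.
inventory_queue = queue.Queue()

def message_connection(part_no, quantity):
    # Message Connection: the source hands over a message and continues;
    # no response is expected within the scope of the interaction.
    inventory_queue.put({"part": part_no, "quantity": quantity})

def wholesale_consumer():
    """Target side: drain the pending messages and return them."""
    processed = []
    while not inventory_queue.empty():
        processed.append(inventory_queue.get())
    return processed

reply = call_connection("P-100")   # source needs the answer to proceed
message_connection("P-100", 20)    # source does not wait for any result
updates = wholesale_consumer()     # target processes messages later
```

The call returns a value that the source uses immediately, while the message is only observed later on the target side, which is why the Call Connection variation tends to pair with synchronous protocols and the Message Connection variation with asynchronous ones.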


Business and IT Drivers

The business and IT drivers for choosing the Direct Connection application pattern are to:

The primary goal is to allow one application to gain direct and real-time access to another in order to reduce the latency of business events.


Solution

Select this Application pattern
Direct Connection application pattern

For a legend, please see above

This Application pattern, as shown in the figure above, is divided into a number of logical components:


Guidelines for use

Direct integration between applications can be inflexible, in that any changes to one application may have knock-on effects on other applications. Changes to the target application may also require changes to the source application. Such changes can become both expensive and time consuming, especially when the target application is being accessed by a number of different source applications.

Different IT departments may also be responsible for developing and maintaining the source and target applications. Under such a scenario, development might be difficult to coordinate, especially if the interfaces between the applications being integrated are not properly defined and documented. Because of this, it is important to clearly define such interfaces in advance.


Benefits

The Direct Connection application pattern offers the following benefits:

Limitations

Although this is a reasonable starting Application pattern for integrating applications in a one-to-one relationship with one another, repeated use of this pattern results in a many-to-many "spaghetti" configuration with point-to-point integration mappings for each application pair. Also, the expansion of this implementation into a multi-point configuration requires additional application logic to handle the coordination.

This pattern cannot be used for intelligent routing of requests, for decomposition and recomposition of requests, or for invoking a complex business process workflow as a result of a request from another application. Under such circumstances, you should consider a more advanced Application pattern, such as Broker or Serial/Parallel Process.


Putting the Application pattern to Use

ITSO Electronics, an electronics retailer/wholesaler, wants to integrate their retail and wholesale departments. Currently, both organizations have proven IT infrastructures but have no interconnectivity. The first process ITSO Electronics wants to focus on is the inventory and order replenishment process. Currently, the items sold are tallied at the end of the month by the retail ordering process and delivered to the wholesale organization by internal mail. This creates a lag in the inventory replenishment process and causes many out of stock situations. A primary business goal is to minimize the loss of sales due to out of stock situations. To meet these requirements ITSO Electronics chooses the Direct Connection application pattern.

Message Connection variation

Message Connection variation
For a legend, please see above.

The Message Connection variation, shown in the figure above, applies to solutions where the business process does not require a response from the target application within the scope of the interaction.


Business and IT Drivers

The business and IT driver for choosing the Message Connection variation of the Direct Connection application pattern is to:

The main driver for selecting this variation is that the business process has no interest in the result of the operation. This variation is also the most natural fit when message-oriented middleware, such as IBM WebSphere MQ, is used.


Putting the Application pattern to Use

In our scenario the retail department of the ITSO Electronics organization needs to notify the wholesale department to update their inventory records when a part needs to be ordered. The retail department does not require any acknowledgement of this request. To meet these requirements ITSO Electronics chooses the Message Connection variation of the Direct Connection application pattern.

Call Connection variation

Call Connection variation
For a legend, please see above.

The Call Connection variation, shown in the figure above, applies to solutions where the business process depends on the target application to process a request and return a response within the scope of the interaction.


Business and IT Drivers

The business and IT driver for choosing the Call Connection variation of the Direct Connection application pattern is to:

The main driver for selecting this variation is that the business process requires a result message from the interaction.


Putting the Application pattern to Use

In our scenario the retail department of the ITSO Electronics organization needs to be advised by the wholesale department of the expected delivery date of a part on order that is out of stock with the retail department. To meet these requirements ITSO Electronics chooses the Call Connection variation of the Direct Connection application pattern.

Broker

The Broker application pattern, shown in the figure above, is based on a 1-to-N topology that separates distribution rules from the applications. It allows a single interaction from the source application to be distributed to multiple target applications concurrently. This application pattern reduces the proliferation of point-to-point connections.

The Broker application pattern applies to solutions where the source application starts an interaction that is distributed to multiple target applications that are within the organization. It separates the application logic from the distribution logic based on broker rules. The decomposition/recomposition of the interaction is managed by the broker rules tier.

The Broker pattern reuses the Direct Connection pattern to provide connectivity between the tiers. The Broker Rules may support Message variation or Call variation (or both variations) of the Direct Connection pattern.

The Broker application pattern was previously known as the Aggregator application pattern for read intent calls and the Broker application pattern for Messages and update intent calls. However, this distinction was found to be of insufficient value to warrant a separate pattern - and so it has been dropped from the revised PI patterns.

The Broker application pattern is also used as the Application pattern for the Pub/Sub Runtime variation.


Business and IT Drivers

The primary business driver for selecting this Application pattern is to allow one application to interact with one or more of multiple target applications. Using a hub-and-spoke architecture instead of a point-to-point architecture allows for the seamless integration of applications while minimizing the complexity. A request for information can be routed to one of many targets or simultaneously to multiple targets. The resulting request message can be decomposed into multiple request messages, and the reply messages then recomposed into a single reply message using appropriate recomposition rules.

This externalization of routing, decomposition, and recomposition rules from individual source and target applications increases the maintainability and flexibility and reduces the enterprise wide integration complexity.

This Application pattern is particularly important when a processing request requires execution of multiple interactions concurrently, or where the source application should be relieved of the need to know anything about its targets.

The primary IT driver for selecting this Application pattern is to allow loose coupling of clients and services with minimum modification to each. The solution should allow for multiple transmission protocols to be used and for transformation of protocols between client and service.


Solution

Select this Application pattern
Broker application pattern

For a legend, please see above.

This Application pattern, as shown above, is divided into a number of logical components:


Guidelines for use

To increase the flexibility of the solution and responsiveness to changing business requirements, it is recommended that particular attention is paid to definition of reusable messages/services that pass through the Broker tier.

Robust transaction processing systems should be used to implement the back-end applications to ensure availability, scalability, and performance.

A decomposition implementation (one source call to multiple target calls) requires state persistence and recomposition of the response messages. Standards should be used where possible to minimize future changes required to the source and target applications.


Benefits

The benefits of this Application pattern are:

Limitations

Logic must be implemented at the broker for routing and decomposition/recomposition tasks.


Putting the Application pattern to Use

ITSO Electronics consists of multiple Retail stores and Wholesale departments. The Retail stores get their supplies from the Wholesale departments and need to request the delivery dates of those supplies before ordering. Currently there is no integration of the Retail and Wholesale applications. All interaction between the two is done over the phone or by mail. A solution must be found to allow Retail stores to request delivery dates from the Wholesale departments. To eliminate the need for the Retail stores to know which Wholesale department carries which supplies, a Broker is needed to take incoming requests and direct them, based on part numbers, to the Wholesale departments that carry them. If a part is carried by multiple Wholesale departments, the broker must get delivery dates from each and return to the Retail store the best date and the Wholesale department that can supply it.
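
The broker rules in this scenario can be sketched as follows. This is a minimal Python illustration under stated assumptions: the catalog contents, the function names, and the use of a thread pool to model concurrent distribution are all hypothetical and do not represent any particular broker product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical catalog: which Wholesale departments carry which parts,
# with the delivery lead time in days that each would quote.
WHOLESALERS = {
    "Wholesale A": {"P-100": 9, "P-200": 3},
    "Wholesale B": {"P-100": 4},
}

def query_delivery(dept, part_no):
    """Stand-in for a call to one target application."""
    return dept, WHOLESALERS[dept][part_no]

def broker(part_no):
    # Decomposition: one request per department that carries the part.
    targets = [d for d, parts in WHOLESALERS.items() if part_no in parts]
    if not targets:
        return None
    # Distribution: the requests are sent to the targets concurrently.
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        replies = list(pool.map(lambda d: query_delivery(d, part_no), targets))
    # Recomposition: collapse the replies into a single best reply.
    dept, days = min(replies, key=lambda reply: reply[1])
    return {"part": part_no, "department": dept, "delivery_days": days}
```

The source application calls broker("P-100") and never learns which departments were consulted; the routing, decomposition, and recomposition rules live entirely in the broker tier.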

Broker=Router variation

The Router variation of the Broker application pattern, shown in the figure above, applies to solutions where the source application initiates an interaction that is forwarded to at most one of multiple target applications.

Where the Broker application pattern enables 1:N connectivity, the Router application pattern enables 1:1 connectivity where the Router Rules tier selects the target.

The Router variation of the Broker application pattern was previously known as the Router variation of the Aggregator application pattern. [The Aggregator application pattern facilitates multi-point request for information integration between applications.]


Business and IT Drivers

The primary business driver for selecting this Application pattern is similar to that of the Broker application pattern. The difference lies in the fact that the Router tier routes the request to only one of multiple target applications. The requirement for transformation of message and interface format still applies. Externalizing the routing from individual source and target applications increases the maintainability and flexibility and reduces the enterprise wide integration complexity.

This Application pattern is particularly important when a processing request requires the source application to be relieved of the need to know anything about its targets.

The primary IT driver for selecting this Application pattern is to allow loose coupling of clients and services with minimum modification to each. The solution should allow for multiple transmission protocols to be used and for transformation of protocols between client and service.


Solution

Select this Application pattern
Router variation

For a legend, please see above.

This Application pattern provides a routing function to allow any attached (initiating) application using a single router link to connect to one of multiple target applications. While access to multiple applications is supported, at any given time an application is connected to only one other application. This Application pattern, as shown in the figure above, is divided into a number of logical components:


Guidelines for use

The guidelines for this application pattern are the same as those for the Broker application pattern.


Benefits

The benefits of this Application pattern are:

Limitations

With the Router variation, the router has limited ability to manipulate requests. It performs intelligent routing and protocol transformation, but it cannot send simultaneous requests to multiple target applications based on one incoming request, nor does it have decomposition/recomposition ability.


Putting the Application pattern to Use

ITSO Electronics consists of multiple Retail stores and Wholesale departments. The Retail stores get their supplies from the Wholesale departments and need to request the delivery dates of those supplies before ordering. Currently there is no integration of the Retail and Wholesale applications. All interaction between the two is done over the phone or by mail. A solution must be found to allow the Retail stores to request delivery dates from the Wholesale departments. To eliminate the need for the Retail stores to know which Wholesale department carries which supplies, a Router is needed to take incoming requests and direct them, based on part numbers, to the Wholesale department that carries them. This differs from the example outlined in the Broker pattern in that only one Wholesale department carries a given part. There is no need to distribute one request to multiple Wholesale departments simultaneously to see who can supply the part at the earliest date.
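
A router rule for this scenario reduces to a lookup followed by a single forwarded call. The Python sketch below is illustrative only; the routing table, the handler functions, and the returned delivery figures are hypothetical.

```python
# Hypothetical routing rules: each part is carried by exactly one department.
ROUTING_TABLE = {"P-100": "Wholesale A", "P-200": "Wholesale B"}

def wholesale_a(part_no):
    # Stand-in for the Wholesale A application.
    return {"department": "Wholesale A", "part": part_no, "delivery_days": 6}

def wholesale_b(part_no):
    # Stand-in for the Wholesale B application.
    return {"department": "Wholesale B", "part": part_no, "delivery_days": 2}

TARGETS = {"Wholesale A": wholesale_a, "Wholesale B": wholesale_b}

def route(part_no):
    """Forward the request to at most one target, chosen by the router rules."""
    dept = ROUTING_TABLE.get(part_no)
    if dept is None:
        return None  # no target carries this part
    return TARGETS[dept](part_no)
```

Unlike a broker, the router issues one call per request and passes the reply back unchanged, so no decomposition, recomposition, or state persistence is involved.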

Serial Process

The Serial Process Application pattern, shown in the figure above, extends the 1:N topology provided by the Broker Application pattern. It facilitates the sequential execution of business services hosted by several target applications. Therefore, it enables the orchestration of a serial business process in response to an interaction initiated by the source application.


Business and IT Drivers

The primary business driver for selecting this Application pattern is to support the composition of end-to-end business process flows by leveraging business services implemented by several target applications. From an IT perspective, the key driver for selecting this Application pattern is improving the flexibility and responsiveness of IT by externalizing the process flow logic from individual applications.


Solution

Select this Application pattern
Serial Process application pattern

For a legend, please see above.

The Serial Process Application pattern is broken down into three logical tiers:


Guidelines for use

The flexibility and responsiveness provided by this Application pattern heavily depend on the externalization of process execution logic from individual applications. Applications with designs based on a service-oriented architecture (SOA) approach, which have well-defined and coarse-grained business services that represent a unit of work, are better suited for participation in this Application pattern. You must be able to compose these business services into an end-to-end process flow. A given service may need to participate in more than one end-to-end process.

Typically, legacy applications are not designed with this thinking in mind. Similarly, many legacy applications have significant amounts of process logic embedded within them. These constraints in existing environments may pose challenges to fully implementing the vision promised by this Application pattern. Careful refactoring of legacy and packaged applications by wrapping them as business services is a good starting point for the eventual widespread implementation of this Application pattern within an enterprise.

Composition of process flows by tying together different applications may introduce the need for compensating transaction support. This is especially the case when certain participating target applications do not leverage XA-compliant transaction processing engines. In such cases, it may be necessary to design compensating transaction pairs for every affected transaction and execute them if there is a need to reverse a particular portion of the process flow. You may need to modify participating legacy and packaged target applications to introduce compensating transactions if they do not already implement such mechanisms.
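
The idea of compensating transaction pairs can be sketched as follows. This is a simplified Python illustration, not a transaction manager: the helper run_with_compensation and the step names are hypothetical, and a real process engine would also persist its progress.

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs in order.

    If an action fails, run the compensations of the steps that already
    completed, in reverse order, and report failure.
    """
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()
        return False
    return True

log = []

def reserve_stock():
    log.append("reserve stock")

def release_stock():
    log.append("release stock")

def place_order():
    log.append("place order")

def cancel_order():
    log.append("cancel order")

def confirm_delivery():
    # Simulated failure in the last step of the flow.
    raise RuntimeError("confirmation failed")

def nothing_to_undo():
    pass

succeeded = run_with_compensation([
    (reserve_stock, release_stock),
    (place_order, cancel_order),
    (confirm_delivery, nothing_to_undo),
])
```

Here the third step fails, so the order is cancelled and the stock released, in that order. Each compensation reverses the business effect of its paired step rather than rolling back a database transaction.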

Finally, pay particular attention to the Business Process Management capabilities supported by the business process design tools and the process execution engines when you select middleware products that facilitate automation of business processes. The eventual goal is to enable business users to compose business processes and make necessary changes with minimal involvement from IT professionals. The business processes that are defined must be easily exported into a process execution engine. More sophisticated business process management tools allow for the definition of metrics during the process design to measure the effectiveness of process implementation and support monitoring of the metrics in the process execution engine.


Benefits

The Serial Process Application pattern improves the flexibility and responsiveness of an organization by implementing end-to-end process flows and by externalizing process logic from individual applications. In addition, it provides a foundation for automated support for Business Process Management that enables the monitoring and measurement of the effectiveness of business processes.

Limitations

This Application pattern is ideally suited for straight-through processing where human interactions are not necessary to complete an end-to-end process. If support for human interactions is needed to complete certain process steps, consider the Workflow variation of this Application pattern.


Putting the Application pattern to Use

ITSO Electronics wants to integrate its retail department with its two inventory wholesale departments, namely Wholesale A and Wholesale B. Currently, these three departments have proven IT infrastructures but no interconnectivity. ITSO Electronics wants to focus on automating the inventory replenishment process.

Typically, the retail department places orders with Wholesale A. However, when Wholesale A is unable to guarantee delivery within seven days, Wholesale B is contacted to check the anticipated delivery date. Then, the order is placed with the department that guarantees the shortest delivery date.

To meet these business process automation requirements, ITSO Electronics chooses the Serial Process Application pattern. The primary driver for this selection is the need to externalize process logic from individual applications. This promotes flexibility and responsiveness to changing business needs.
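
The process rules for this scenario can be sketched as a sequence of steps held outside the wholesale applications. The Python below is a minimal illustration; the function name and the quote callables passed in are hypothetical stand-ins for calls to the target applications.

```python
MAX_DAYS = 7  # delivery guarantee threshold from the scenario

def serial_replenishment(part_no, quote_a, quote_b):
    """Ask Wholesale A first; consult Wholesale B only if A is too slow.

    quote_a and quote_b return the anticipated delivery time in days.
    """
    days_a = quote_a(part_no)
    if days_a <= MAX_DAYS:
        return "Wholesale A", days_a
    # Step two runs only after step one has completed.
    days_b = quote_b(part_no)
    if days_b < days_a:
        return "Wholesale B", days_b
    return "Wholesale A", days_a
```

Because the sequence lives in one place, a change such as a different threshold or an additional wholesale department does not require changes to the retail or wholesale applications.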

Serial Workflow variation

The Serial Workflow variation of the Serial Process Application pattern, shown in the figure above, extends the basic serial process orchestration capability by supporting human interaction for completing certain process steps.


Business and IT Drivers

All the business and IT drivers listed under the Serial Process Application pattern apply to this variation as well. The additional business driver for selecting this variation is the need to support human interaction and intervention within the process flow. Support for long-running transactions is another IT driver, which is often a prerequisite for the automation of complex process flows involving human interaction.


Solution

Select this Application pattern
Workflow variation

For a legend, please see above.

The Serial Workflow variation is broken down into three logical tiers:


Guidelines for use

These guidelines apply to this variation in addition to the guidelines that are documented in "Serial Process Application pattern" above. We recommend that you implement people-based exception handling for the majority of the automated tasks within the process. If an automated task reaches certain error conditions, a person must be able to intervene and handle exceptions.
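
People-based exception handling can be sketched as a wrapper that hands a failed automated task to a person. This Python example is illustrative only; the function names are hypothetical, and a real workflow engine would create a work item for the person rather than call a function directly.

```python
def run_automated_task(task, escalate):
    """Run `task`; if it fails, hand the error to a person via `escalate`."""
    try:
        return {"handled_by": "system", "result": task()}
    except Exception as exc:
        # Error condition reached: a person decides how to proceed.
        return {"handled_by": "person", "result": escalate(exc)}

def check_stock():
    # Simulated automated task that hits an error condition.
    raise TimeoutError("wholesale system did not respond")

def manager_decision(exc):
    # Stand-in for a human work item.
    return f"retry later ({exc})"

outcome = run_automated_task(check_stock, manager_decision)
```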


Benefits

The Serial Workflow Application pattern improves the flexibility and responsiveness of an organization. It does this by implementing end-to-end process flows that externalize process logic from the individual application. Further flexibility is introduced by the externalization of task-resource resolution rules. In addition, it provides a foundation for automated support for Business Process Management that enables monitoring and measurements of the effectiveness of business processes.

Limitations

This variation does not support the parallel execution of multiple tasks. Under such circumstances, consider the more advanced Parallel Process Application pattern and Parallel Workflow variation.


Putting the Application pattern to Use

ITSO Electronics wants to integrate its retail department with its two wholesale departments, namely Wholesale A and Wholesale B. Currently, these three departments have proven IT infrastructures but have no interconnectivity. ITSO Electronics wants to focus on automating the inventory replenishment process. Typically, the retail department places orders with Wholesale A. However, when Wholesale A is unable to guarantee delivery within seven days, Wholesale B is contacted to check the anticipated delivery date.

The main change from the scenario used in "Serial Process Application pattern" is as follows. If neither Wholesale A nor Wholesale B can offer delivery within seven days, a retail department manager must review the shortest anticipated delivery date proposed by the wholesale department systems and approve the order before it is placed. The intent of this review is to determine whether other sourcing options must be considered.

To meet these business process automation requirements, ITSO Electronics chooses the Serial Workflow variation of the Serial Process Application pattern. The primary drivers for this selection are the need to externalize process logic from the individual application, which promotes flexibility and responsiveness to changing business needs, and the need to support human interaction.
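
The manager review step can be sketched as a process that parks itself until a person completes a task. This Python example is a simplified illustration: the function names and the in-memory dictionary (standing in for the persisted state of a long-running process) are hypothetical, and the quotes are assumed to have been gathered already by the earlier serial steps.

```python
SEVEN_DAYS = 7

# order_id -> (department, days); stands in for persisted process state.
pending_approvals = {}

def submit_order(order_id, quotes):
    """quotes: department -> anticipated delivery days, as gathered so far."""
    dept = min(quotes, key=quotes.get)
    days = quotes[dept]
    if days <= SEVEN_DAYS:
        return {"order": order_id, "status": "placed", "department": dept}
    # No department can deliver within seven days: wait for a manager.
    pending_approvals[order_id] = (dept, days)
    return {"order": order_id, "status": "awaiting approval", "department": dept}

def complete_review(order_id, approved):
    """Called when the manager finishes the review, possibly much later."""
    dept, _days = pending_approvals.pop(order_id)
    status = "placed" if approved else "rejected"
    return {"order": order_id, "status": status, "department": dept}
```

The gap between submit_order and complete_review may be hours or days, which is why support for long-running transactions accompanies this variation.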

Parallel Process

The Parallel Process application pattern, shown above, extends the basic serial process orchestration capability provided by the Serial Process application pattern by supporting parallel (concurrent) execution of the sub-processes.


Business and IT Drivers

All the business and IT drivers listed under the Serial Process application pattern apply to this Application pattern as well. The additional business driver for selecting this pattern is the need to reduce cycle time through the parallel execution of certain portions of the process flow.


Solution

Select this Application pattern
Parallel Process application pattern

For a legend, please see above.

The Parallel Process application pattern is broken down into three logical tiers:


Guidelines for use

The following guidelines apply to this Application pattern in addition to the guidelines that are documented under the Serial Process application pattern.

The implementation of parallel processes without sufficient support from the selected runtime engine would require the development of excessive custom code. The need for parallel process execution must be analyzed before middleware selection decisions are finalized.

Judicious use of parallelism is a powerful tool for reducing the cycle time of a process in the right circumstances. However, in practice, it is critical to ensure that all of the error scenarios are carefully analyzed and that the impact of these scenarios upon the end-user experience is thoroughly understood. The number of error scenarios and the processing complexity increase exponentially with the degree of parallelism. Hence, the best practice is to start with a serial process and introduce limited parallelism only where there is a clear and worthwhile benefit.


Benefits

In addition to providing all the benefits provided by the Serial Process application pattern, this pattern provides a foundation for the reduction of cycle times by implementing parallel processes.

Limitations

Parallel processes are more complex to design, test, and operate than serial processes.

In addition, this Application pattern is ideally suited for straight-through processing where human interactions are not necessary to complete an end-to-end process. If support for human interactions is needed to complete certain process steps, consider the Workflow variation of this Application pattern.


Putting the Application pattern to Use

ITSO Electronics, an electronics retailer/wholesaler, wants to integrate its retail department with its two wholesale departments, namely Wholesale A and Wholesale B. Currently, these three departments have proven IT infrastructures but have no interconnectivity. ITSO Electronics wants to focus on automating the inventory replenishment process.

The main difference from the scenario used in the Serial Process and Serial Workflow application patterns sections is that here both wholesalers are queried in parallel to find who offers the shortest delivery time. In other words, Wholesale Dept. A is not considered as the defacto supplier of parts in this scenario. The order is then automatically placed with the wholesale department that offers the shortest delivery date.

To meet these business process automation requirements, ITSO Electronics chooses the Parallel Process application pattern. The primary drivers for this selection include the need for externalization of process logic from the individual application, thus promoting flexibility and responsiveness to changing business needs and addressing the need for reducing cycle time of queries by simultaneously sending enquiries to the two departments for the best delivery date.
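
The parallel query step in this scenario can be sketched as follows. This is a minimal illustration rather than ITSO Electronics' implementation: the functions `query_wholesale_a`, `query_wholesale_b`, and `choose_supplier`, the part number, and the quoted delivery times are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the two wholesale department systems.
# Each returns (department name, quoted delivery time in business days).
def query_wholesale_a(part_id):
    return ("Wholesale A", 7)

def query_wholesale_b(part_id):
    return ("Wholesale B", 4)

def choose_supplier(part_id, suppliers):
    """Query all suppliers in parallel and return the quote with the shortest delivery time."""
    with ThreadPoolExecutor(max_workers=len(suppliers)) as pool:
        futures = [pool.submit(query, part_id) for query in suppliers]
        quotes = [future.result() for future in futures]  # waits for every branch
    return min(quotes, key=lambda quote: quote[1])

best = choose_supplier("PART-001", [query_wholesale_a, query_wholesale_b])
print(best)
```

The sketch ignores failures and timeouts. A real process must decide what to do when one branch does not answer, which is where the error analysis described in the guidelines becomes important.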

Parallel Workflow variation

The Parallel Workflow variation of the Parallel Process application pattern, shown above, extends the basic parallel process orchestration capability by supporting human interaction for completing certain process steps. This is the most sophisticated Process Integration Application pattern in the domain of Application Integration patterns.


Business and IT Drivers

All of the business and IT drivers listed under the Parallel Process application pattern apply to this variation as well. The additional business driver for selecting this variation is the need to support human interaction and intervention within the process flow. Support for long running transactions is another IT driver, which is often a prerequisite for the automation of complex process flows that involve human interaction.


Solution

Workflow variation

For a legend, please see above.

The Parallel Workflow variation is broken down into three logical tiers:


Guidelines for use

The following guidelines apply to this variation in addition to the guidelines that are documented under the Parallel Process application pattern.

It is recommended that people-based exception handling be implemented for all automated tasks within the process. In other words, if an automated task reaches certain error conditions, human actors must be able to intervene and handle the exceptions.
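
One way to picture people-based exception handling is a wrapper that hands a failed automated task to a human work list. The sketch below is an assumption-laden illustration: `human_work_queue`, `run_with_human_fallback`, and `check_stock` are hypothetical names, and a workflow product would normally provide its own work-list mechanism.

```python
from collections import deque

human_work_queue = deque()  # hypothetical work list for human actors

def run_with_human_fallback(task_name, task, payload):
    """Run an automated task; on error, hand the work item to a human actor."""
    try:
        return {"status": "completed", "result": task(payload)}
    except Exception as error:
        human_work_queue.append(
            {"task": task_name, "payload": payload, "error": str(error)}
        )
        return {"status": "awaiting_human", "result": None}

# Hypothetical automated task that rejects invalid input.
def check_stock(payload):
    if payload["quantity"] <= 0:
        raise ValueError("quantity must be positive")
    return payload["quantity"]

ok = run_with_human_fallback("check_stock", check_stock, {"quantity": 5})
failed = run_with_human_fallback("check_stock", check_stock, {"quantity": 0})
```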


Benefits

The Parallel Workflow application pattern improves the flexibility and responsiveness of an organization by implementing end-to-end process flows that externalize process logic from individual applications. Further flexibility is introduced by the externalization of task-resource resolution rules.

It supports the reduction of cycle time by supporting parallel execution of portions of a process flow.

In addition, it provides a foundation for automated support for Business Process Management that enables monitoring and measurement of the effectiveness of business processes.

Limitations

Only a few middleware products are capable of supporting all the capabilities needed to realize this Application pattern. If this Application pattern is implemented using middleware products that do not support the necessary capabilities, the implementation could be very complex.


Putting the Application pattern to Use

ITSO Electronics, an electronics retailer/wholesaler, wants to integrate its retail department with its two wholesale departments, namely Wholesale A and Wholesale B. Currently, these three departments have proven IT infrastructures but have no interconnectivity. ITSO Electronics wants to focus on automating the inventory replenishment process.

This scenario differs from the one used in the Parallel Process application pattern section in one respect. As before, both wholesalers are queried in parallel to find who offers the shortest delivery time, and the order is then automatically placed with the wholesale department that offers the shortest delivery time, unless that time exceeds 10 business days. In that case, the Retail Department Manager must intervene to review the anticipated delivery date and determine whether other sourcing options should be considered.

To meet these business process automation requirements, ITSO Electronics chooses the Parallel Workflow variation of the Parallel Process application pattern. The primary drivers for this selection include the need for the externalization of process logic from the individual application, thus promoting flexibility and responsiveness to changing business requirements, the need for reducing cycle time of queries by simultaneously sending enquiries to the two departments for the best delivery date, and the need for supporting human interaction during the execution of the process flow.
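
The routing decision in this scenario can be sketched as a single function. The 10 business day threshold comes from the scenario; the function name `route_order`, the action names, and the shape of the returned dictionary are hypothetical.

```python
MAX_AUTOMATIC_DELIVERY_DAYS = 10  # threshold taken from the scenario text

def route_order(quotes):
    """Return the next process step for a list of (department, delivery_days) quotes."""
    department, days = min(quotes, key=lambda quote: quote[1])
    if days > MAX_AUTOMATIC_DELIVERY_DAYS:
        # Even the best quote is too slow: create a human task instead of ordering.
        return {
            "action": "manager_review",
            "assignee": "Retail Department Manager",
            "best_quote": (department, days),
        }
    return {"action": "place_order", "department": department, "delivery_days": days}

automatic = route_order([("Wholesale A", 7), ("Wholesale B", 4)])
escalated = route_order([("Wholesale A", 12), ("Wholesale B", 15)])
```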

Data Integration application patterns

When applications need to share information rather than coordinate processing, data integration is more appropriate than a process integration approach. Note, however, that when the frequency of data update is extremely high (for example, when integrating an order entry system with a back-end ERP system), process integration is the best solution. When this is not the case, however, integration of (application) data repositories is handled outside of any specific application request.

Explanation for re-engineering of Data Integration application patterns.


Data Integration application patterns and variations


Legend for Data Integration application patterns

Data sources are represented by disks in four different colors:

Read/write and read-only refer only to the interaction between the overall pattern and that data source as also indicated in most cases by annotation on the linkages. In general we may assume that the application associated with a particular data source has read/write access.

A dotted box around an application and source data indicates that the source data may need to be accessed through the owning application via its API, or may be accessed directly via a database API. In general, a dotted box around a number of components indicates that we are not specifying which of those components we are interacting with.

A dashed line, arrow, or component indicates an optional component.

The Business drivers for the Data Integration patterns

| Business Drivers | Population: Basic | Population: Multi Step | Population: Multi Step Gather | Population: Multi Step Process | Population: Multi Step Federated Gather | Two-way Synchronization: Basic | Two-way Synchronization: Multi Step | Federation: Basic | Federation: Cache |
|---|---|---|---|---|---|---|---|---|---|
| Require specialized derived data (e.g. subset, point in time, correlated data, targeted to user group etc.) | yes | yes | yes | yes | yes | no | no | no | no |
| Distil meaningful information from vast amount of data | yes | yes | yes | yes | yes | yes | yes | no | no |
| Require R/O access to derived or aggregated data | yes | yes | yes | yes | yes | no | no | no | no |
| Extensive reconciliation, transformation and restructuring of structured data | no | yes | yes | yes | yes | no | yes | no | no |
| Provide easier access to vast amount of unstructured data through indexing and categorization | no | no | yes | yes | yes | no | no | no | no |
| Need to apply changed data to avoid the latency of a total rebuild | no | no | yes | no | no | no | no | no | no |
| Heavy data cleansing requirement | no | no | no | yes | no | no | no | no | no |
| Enable transparent access to remote structured and unstructured data with low latency | no | no | no | no | yes | no | no | yes | yes |
| Independently updated data sources need to be synchronized | no | no | no | no | no | yes | yes | no | no |
| Require access to diverse structured data sources and/or diverse locations | no | no | no | no | yes | no | no | yes | yes |
| Real-time access is needed to rapidly changing data | no | no | no | no | no | no | no | yes | yes |
| Business restrictions on copying source data (e.g. legal, privacy) | no | no | no | no | no | no | no | yes | yes |
| Require R/O and optionally R/W access to derived or aggregated data | no | no | no | no | no | yes | yes | yes | yes |

The IT drivers for the Data Integration patterns

| IT Drivers | Population: Basic | Population: Multi Step | Population: Multi Step Gather | Population: Multi Step Process | Population: Multi Step Federated Gather | Two-way Synchronization: Basic | Two-way Synchronization: Multi Step | Federation: Basic | Federation: Cache |
|---|---|---|---|---|---|---|---|---|---|
| Network performance or availability issues | yes | yes | yes | yes | yes | yes | yes | no | yes |
| Require protection of operational system performance | yes | yes | yes | yes | yes | no | no | no | no |
| Require reliable, extended availability of the data | yes | yes | yes | yes | yes | no | no | no | no |
| Optimized for future access performance | yes | yes | yes | yes | yes | no | no | no | no |
| Capture changes to the source data | no | no | yes | no | no | no | no | no | no |
| Need advanced structured data constructs (e.g. multidimensional cube, snowflake schema etc.) | no | no | no | yes | no | no | no | no | no |
| Enable faster, more powerful searches of unstructured data by building indexes | no | no | yes | yes | yes | no | no | no | no |
| Require access to heterogeneous data types | no | no | yes | no | yes | no | no | yes | yes |
| Technical limitations on copying source data (e.g. volume, performance, number of copies, TCO etc.) | no | no | no | no | no | no | no | yes | yes |

Federation

The Federation application pattern is a basic Data Integration application pattern that provides access to many diverse data sources and provides the appearance that these sources are a single logical data store. This appearance is delivered as follows:
1. Exposing a single consistent interface to the user (or application) that invokes the function
2. Translating that interface to whatever interface is needed for the underlying data
3. Compensating for any differences in function between the different sources
4. Allowing data from different sources to be combined into a single result set that is returned to the user


Business and IT Drivers

Federation may be required in any business process where the data needed exists in a number of different locations. Such diversity may be the result of historical, technical or organizational factors. Federation is preferred over other data integration methods, such as Population, when the access required meets one or more of the following criteria:

The Federation application pattern's connector/adapter design allows for improved maintainability, minimized TCO, leveraging of existing technology investments, and reduced deployment and implementation costs.

Federation application pattern

Federation application pattern
For a legend please see above.

When called by an application, Federation uses its metadata store to determine where and in what format the required data is stored. Metadata mapping also enables the decomposition of the unified query into requests to each individual repository. The information model thus appears as one unified virtual repository to users. Using adapters for each target repository, data is accessed and retrieved. Based on its knowledge of functionality, performance, and other factors, Federation determines the optimal plan for performing the incoming query, pushing down function to the remote data stores or compensating for missing function locally, and storing intermediate results in the local temporary store. Federation then returns a single result to the calling application, thus integrating the multiple disjoint formats into a common federated schema.
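
The metadata-driven flow described above can be sketched in a few lines. This is a toy model under stated assumptions: the `METADATA` and `SOURCES` structures, the `crm` and `erp` source names, the field names, and the sample rows are all hypothetical, and the sketch performs no query optimization or function pushdown.

```python
# Hypothetical metadata store: which source holds each logical table, and how
# source-specific field names map to the federated schema.
METADATA = {
    "customers": {"source": "crm", "fields": {"cust_id": "id", "cust_name": "name"}},
    "orders": {
        "source": "erp",
        "fields": {"ORDER_NO": "order_id", "CUSTOMER": "id", "TOTAL": "total"},
    },
}

# Hypothetical adapters: one callable per remote data store.
SOURCES = {
    "crm": lambda: [{"cust_id": 1, "cust_name": "Ada"}, {"cust_id": 2, "cust_name": "Lin"}],
    "erp": lambda: [
        {"ORDER_NO": 10, "CUSTOMER": 1, "TOTAL": 250},
        {"ORDER_NO": 11, "CUSTOMER": 2, "TOTAL": 90},
    ],
}

def fetch(table):
    """Use the metadata to locate a table, call its adapter, and rename the fields."""
    meta = METADATA[table]
    rows = SOURCES[meta["source"]]()
    return [{meta["fields"][name]: value for name, value in row.items()} for row in rows]

def federated_orders_with_names():
    """Split one logical query into per-source requests and join the results locally."""
    names = {customer["id"]: customer["name"] for customer in fetch("customers")}
    return [{**order, "name": names[order["id"]]} for order in fetch("orders")]

result = federated_orders_with_names()
```

The caller sees one result set in the federated schema and never refers to the source-specific field names.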

Federation supports both structured and unstructured data, as well as read-only and read/write access to the underlying data stores. Read-write access is best limited to single remote sources, in part because of fundamental theoretical limitations in support for two-phase commit in a fully distributed environment.

Federation=Cache variation pattern

Federation=Cache variation pattern
For a legend please see above.

Local temporary storage can be used to cache data returned from read-only queries to remote data sources. Under defined circumstances, this cache can be used to speed up query response time or to compensate for a data source that is temporarily off line. Such function must be used carefully, however, as the cached data and its underlying source may no longer be in sync (there may be a latency involved).
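
A minimal sketch of such a cache follows. It assumes a simple age limit as the "defined circumstances" for using cached data, which the text does not specify. The class `CachedSource`, the function `fetch_remote`, and the returned origin labels are hypothetical.

```python
import time

class CachedSource:
    """Read-only access to a remote source with a local, age-limited cache."""

    def __init__(self, fetch, max_age_seconds):
        self.fetch = fetch
        self.max_age_seconds = max_age_seconds
        self.cache = {}  # key -> (value, time stored)

    def read(self, key, now=None):
        now = time.time() if now is None else now
        cached = self.cache.get(key)
        if cached is not None and now - cached[1] <= self.max_age_seconds:
            return cached[0], "cache"  # fresh enough to skip the remote call
        try:
            value = self.fetch(key)
        except ConnectionError:
            if cached is not None:
                return cached[0], "stale-cache"  # source off line: may be out of sync
            raise
        self.cache[key] = (value, now)
        return value, "source"

# Demo with a hypothetical remote source that can be switched off line.
remote = {"online": True, "calls": 0}

def fetch_remote(key):
    if not remote["online"]:
        raise ConnectionError("remote source is off line")
    remote["calls"] += 1
    return key.upper()

source = CachedSource(fetch_remote, max_age_seconds=60)
first = source.read("tv-01", now=0)     # fetched from the remote source
second = source.read("tv-01", now=30)   # served from the cache
remote["online"] = False
third = source.read("tv-01", now=100)   # cache expired, but the source is off line
```

Returning the origin alongside the value lets the caller decide whether possibly stale data is acceptable for its purpose.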

It is also possible and often necessary to maintain the contents of the cache. This involves the use of the Population application pattern described below, "Data Integration::Population".

Population

The Population application pattern has a very simple model. It gathers data from one or more sources, processes that data in an appropriate way, and applies it to some data target. The primary business driver for population is to gather and reconcile data from multiple data sources in advance of a user's need to use this information.

In some cases, the reconciliation is sufficiently simple that it can be conceived as a single (integrated) function. In many cases, however, the transformation and restructuring is rather complex or the gathering phase has unique characteristics.
This leads to four variations on the basic Application pattern as follows:

These population patterns are often applied towards business intelligence-related business problems. They can also be utilized to provide content feeds into an e-business portal of more unstructured data. This "content" can then be accessed via the portal, or even searched via basic portal search capabilities.


Business and IT Drivers

Any business need that requires a specialized copy of data (derived data) from a pre-existing source may indicate the use of the Population application pattern or one of its variation patterns. These needs are most often seen in business intelligence and content search and related applications. However, some cases are also seen in a pure operational environment, where a dedicated copy of data is needed. A key indicator is that the use of the derived data is read-only or a close approximation to it. If there are significant amounts of read/write usage of the derived data, the Two-way Synchronization pattern is indicated.

Such specialized, derived data copies may be:

The business objective can often be summarized as providing the user with quick access to useful information instead of bombarding the user with too much, irrelevant, incorrect, or otherwise useless misinformation.

In many cases, it is the IT drivers rather than the business drivers that dictate the use of the Population set of patterns, because in many cases one can envisage that the business need can be equally well satisfied either by direct access to the original sources or to a copy of those sources. These IT drivers include, among others:

Population application pattern

Population application pattern
For a legend please see above.

The figure above represents the basic population functionality as a "read dataset - process - write dataset" model.

Population=Multi Step variation application pattern

Population=Multi Step variation application pattern
For a legend please see above.

Note: We have deliberately avoided using the traditional extract, transform, and load terminology in order to accommodate the emerging functionality requirements and variations of population patterns.

In the Multi Step variation of the Population application pattern, the basic population function of the Population application pattern is decomposed into its three primary constituents or steps:

The intermediate target data created by one step acts as the source data for the subsequent step. In some cases, the temporary stores may be physically instantiated files; in more modern implementations, the data may be "piped" from one step of the population process to the next.

The figure above shows the three logical steps: Gather, Process and Apply. In most best practice implementations, these functional steps contain additional sub tasks.
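
The three steps, with data "piped" from one step to the next rather than written to intermediate files, can be sketched with Python generators. The function names, field names, and sample rows are hypothetical, and the processing rule (normalize the SKU, drop rows without one) is only an example.

```python
def gather(sources):
    """Gather: read rows from every source data store."""
    for source in sources:
        yield from source

def process(rows):
    """Process: normalize formats and drop rows that have no SKU."""
    for row in rows:
        if row.get("sku"):
            yield {"sku": row["sku"].strip().upper(), "qty": int(row["qty"])}

def apply_rows(rows, target):
    """Apply: write the processed rows to the target data store."""
    for row in rows:
        target[row["sku"]] = target.get(row["sku"], 0) + row["qty"]
    return target

retail = [{"sku": " tv-01", "qty": "3"}, {"sku": "", "qty": "1"}]
wholesale = [{"sku": "TV-01", "qty": "5"}, {"sku": "hd-02 ", "qty": "12"}]

# Each step consumes the output of the previous one without a temporary store.
target = apply_rows(process(gather([retail, wholesale])), {})
```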

When Population consists of multiple steps, there clearly must exist an entity that controls and orchestrates the entire set of functions. This is not shown explicitly in the diagram simply because this controlling function seldom exists as a separate entity. It may be considered to be a function of the Process step in this case.

The actual implementation of Population=Multi Step can involve fewer or more steps than the three shown here. In such cases, the steps in the figure above must be adjusted accordingly, and consideration must be given to the placement of any additional tiers. A number of special cases are treated in the variations below. It is also important to note that this application pattern has been generalized to cover any source data store and target data store.

Population=Multi Step Gather variation application pattern

Population=Multi Step Gather variation application pattern
For a legend please see above.

The Multi Step Gather shown here is an extension of the Population=Multi Step variation shown above that recognizes that the Gather function itself may need to occur in multiple steps.

In the figure above an independent Gather step (Gather 1) extracts a specialized subset of the data and stores it in a temporary or persistent store. This data store is read, perhaps in conjunction with the original data store, by the Gather step (Gather 2) of the Population=Multi Step variation that completes the overall population process.

There are a number of circumstances where Multi Step Gather is found for structured and unstructured data, as follows:

Population=Multi Step Process variation application pattern

Population=Multi Step Process variation application pattern
For a legend please see above.

Like the Multi Step Gather variation described above, "Population=Multi Step Gather variation pattern", the Multi Step Process variation is also an extension of the Multi Step Population variation described above, "Population pattern". In this case, the focus is on supporting population instances where the processing of the received data is rather complex and cannot be performed in a single pass as shown in the figure above.

In this Multi Step Process variation, the Process step is replaced by a more powerful Multi Step Process approach. Within this, the individual Process stages are more likely to be linked directly, as shown by the line connecting them, rather than through intermediate temporary stores, although this possibility is also depicted. Clearly, there may also be more than two stages.

As mentioned earlier, this Multi Step Process approach may be required when building summaries or categorizations of unstructured data, often in conjunction with the Multi Step Gather variation. The Multi Step Process variation may also be required with structured data, for example, when populating a multidimensional cube or snowflake schema from an enterprise data warehouse.

Another use of this variation is in data cleansing implementations. Data cleansing often requires multiple passes of the data to gather statistics, perform analyses, propose changes, obtain human approval, and so on. In many cases, the cleansed data may be partially written back directly to the source, as shown in the figure above.
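
A multi-pass cleansing flow of this kind might look like the sketch below. It is illustrative only: the rule (prefer the most common spelling of a supplier name), the function names, and the `approve` callback that stands in for human approval are all assumptions.

```python
from collections import Counter, defaultdict

def gather_statistics(rows):
    """Pass 1: count how often each spelling of a supplier name occurs."""
    stats = defaultdict(Counter)
    for row in rows:
        stats[row["supplier"].lower()][row["supplier"]] += 1
    return stats

def propose_changes(rows, stats):
    """Pass 2: propose replacing each name with its most common spelling."""
    proposals = []
    for index, row in enumerate(rows):
        preferred = stats[row["supplier"].lower()].most_common(1)[0][0]
        if row["supplier"] != preferred:
            proposals.append((index, row["supplier"], preferred))
    return proposals

def apply_approved(rows, proposals, approve):
    """Pass 3: apply only the proposals that the approver accepts."""
    cleansed = [dict(row) for row in rows]
    for index, old, new in proposals:
        if approve(old, new):
            cleansed[index]["supplier"] = new
    return cleansed

rows = [
    {"supplier": "Wholesale A"},
    {"supplier": "wholesale a"},
    {"supplier": "Wholesale A"},
    {"supplier": "Wholesale B"},
]
stats = gather_statistics(rows)
proposals = propose_changes(rows, stats)
# The lambda stands in for a human reviewer who accepts every proposal.
cleansed = apply_approved(rows, proposals, approve=lambda old, new: True)
```

No single pass could do this work, because the preferred spelling is only known after all rows have been read once.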

Population=Multi Step Federated Gather variation application pattern

Population=Multi Step Federated Gather variation application pattern
For a legend please see above.

The figure above shows how the Population application pattern can be composed with the Federation application pattern as a means to gather data from one or more sources, by providing a unified query that accesses data in separated or remote structured and unstructured repositories in real-time.

Use of this variation pattern is indicated by a number of key requirements, such as reduced latency of population, reuse or extension of existing population investments, and reduced implementation or maintenance costs.

The figure above shows how the Gather step in Population=Multi Step variation described above, "Population=Multi Step variation pattern", is replaced by a potentially synchronous "Federated Gather" step that directly accesses remote data stores, structured or unstructured. This access is mediated through wrappers (aka adapters) that contain the logic to access the data, either directly or through an application API, and send the results back to the requestor tier. These requests may simultaneously access multiple data stores.

Metadata mapping enables the decomposition of a unified query into requests to each individual data store. The multiple data sources thus appear as one unified virtual data store to the requestor. In some cases, there may be separate metadata stores for the Population and Federation components, although this may lead to data consistency issues.

The Multi Step Federated Gather variation application pattern contains its own temporary/persistent store. This store can be used to cache results data obtained from remote sources, allowing continued access to remote data when the actual source is unavailable. Clearly, use of such a cache may have implications for data currency in the target.

Two-way Synchronization

This Two-way Synchronization application pattern was previously known as the Replication pattern. It enables a coordinated bidirectional update flow of data in a multi-copy database environment. It is important to highlight the "two-way" synchronization aspect of this Application pattern, as it is what distinguishes it from the "one-way" capabilities provided by the Population application patterns discussed above, "Data Integration::Population".

We focus on the two-way case here because it is of more interest in business intelligence and similar applications where the relationship between replicas is usually limited to pairs of replicas operating as true master/slaves or where the distributed read/write function is limited to a small percentage of the shared data.


Business and IT Drivers

As in the case of Population, any business need that requires a specialized copy of data (derived data) from a pre-existing source may indicate the need for the Two-way Synchronization application pattern. These needs are most often seen in business intelligence and content search and related applications. However, some cases are also seen in a pure operational environment, where a dedicated copy of data is needed. The key indicator for Synchronization is that the use of the derived data has some strong read-write characteristics.

The business and IT drivers for Two-way Synchronization are partially the same as those listed for Population, "Business and IT drivers" above. However, modern and more sophisticated business intelligence and combined operational/informational needs such as customer relationship management (CRM), call centers, customer portals, etc. place added requirements for updating the derived data. These modern business processes often require that the source and derived data are more closely synchronized than "pure" business intelligence applications, and thus need Two-way Synchronization.

As the need for synchronization increases, the differences between the source and derived data that can be handled decrease, because some transformations are fundamentally unidirectional or time-dependent. In the limit, the IT drivers for creating and managing a copy of the source have to be traded off against those for having a single copy of data and accessing that distributed data through the Federation application pattern.

Two-way Synchronization application pattern

Two-way Synchronization application pattern
For a legend please see above.

The figure above shows a basic two-way synchronization of data between two separate data stores. At a simplistic level, it can be compared to the basic Population application pattern described above, "Population pattern", with the only difference being that data now flows in both directions. Depending on the relationship between the data flowing in either direction, this similarity with Population may be more apparent than real. If the data elements flowing in both directions are fully independent, then Two-way Synchronization is no more than two separate instances of Population. However, it is more common to find some overlap between the data sets flowing in either direction. In this case, the need to reconcile data updates on both source/target systems means that the Two-way Synchronization pattern is rather more than two separate Population instances. A significant issue in this case is conflict detection and resolution when updates occur independently in the different data stores.
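
Conflict detection can be sketched by comparing both data stores with the state recorded at the last synchronization. This is one common approach, not necessarily the one a given product uses. The function name `detect_changes` and the sample data are hypothetical, and the sketch treats a missing key as the value `None` without handling deletions specially.

```python
def detect_changes(baseline, store_a, store_b):
    """Classify each key by comparing both stores with the last synchronized state."""
    to_a, to_b, conflicts = {}, {}, {}
    for key in set(baseline) | set(store_a) | set(store_b):
        base, a, b = baseline.get(key), store_a.get(key), store_b.get(key)
        if a == b:
            continue                  # both stores already agree
        if a == base:
            to_a[key] = b             # only B changed: copy B's value to A
        elif b == base:
            to_b[key] = a             # only A changed: copy A's value to B
        else:
            conflicts[key] = (a, b)   # both changed independently
    return to_a, to_b, conflicts

baseline = {"tv-01": 5, "hd-02": 12, "cam-03": 2}
store_a = {"tv-01": 6, "hd-02": 12, "cam-03": 1}
store_b = {"tv-01": 5, "hd-02": 10, "cam-03": 4}
to_a, to_b, conflicts = detect_changes(baseline, store_a, store_b)
```

When the data flowing in each direction never overlaps, `conflicts` stays empty and the result is equivalent to two independent population runs.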

As indicated by the dotted boxes enclosing the source/target data stores and their controlling applications in the figure above, the Two-way Synchronization pattern may act directly at the data level or at the application level. However, from the viewpoint of Data Integration, the interactions are more likely to be at the data level, while in Process Integration the interactions are more often at the application level.

Applications in this solution design do not necessarily have to be identical.

Two-way Synchronization=Multi Step variation application pattern

Two-way Synchronization=Multi Step variation application pattern
For a legend please see above.

The figure above shows how the Population application pattern can be composed to implement both directions of the synchronization data flow. An additional function "Reconcile" appears between the two data flows, and it is here that the complex process of ensuring that data updates do not conflict, cancel out, or get otherwise corrupted is handled. If the opportunities for conflict are minimal (when there are few overlaps between data flowing in either direction), this pattern can be effectively constructed from existing Population components. However, for more complex situations, a specialized product solution will be more appropriate.
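
As an illustration of what a Reconcile step might do in the simple case, the sketch below merges two sets of changes and lets the most recent update win. The text does not prescribe a resolution policy, so "last writer wins" is an assumption here, as are the function name `reconcile` and the timestamped change format.

```python
def reconcile(changes_a, changes_b):
    """Merge two change sets of {key: (value, timestamp)}; the newer update wins."""
    merged = dict(changes_a)
    for key, (value, stamp) in changes_b.items():
        # On equal timestamps the change from store A is kept.
        if key not in merged or stamp > merged[key][1]:
            merged[key] = (value, stamp)
    return {key: value for key, (value, _) in merged.items()}

changes_a = {"tv-01": (6, 100), "cam-03": (1, 120)}
changes_b = {"cam-03": (4, 130), "hd-02": (10, 90)}
agreed = reconcile(changes_a, changes_b)
```

A policy this simple silently discards the older of two conflicting updates, which is one reason the more complex situations mentioned above tend to call for a specialized product.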
