Population=Multi Step Federated Gather variation::Runtime patterns

[Figure: Runtime pattern for the Population=Multi Step Federated Gather variation. Nodes shown: Population, Data Integration, Data Server/Services.]
Design Last Updated: 10-20-2004

In the figure above, the Process tier queries the "federated" data source, for example with a simple SQL SELECT request. The Data Integration node processes the request by using the metadata, which defines the data sources, to pass the request on to the appropriate data sources. Usually, the Process node and the Data Integration node are collocated and tightly integrated.
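The metadata lookup described above can be sketched as follows. This is a minimal illustration only: the catalog layout, the source names, and the function `route_query` are hypothetical and do not come from the pattern itself.

```python
# Hypothetical metadata catalog: maps each federated table to the
# data sources that hold part of it. All names are illustrative.
CATALOG = {
    "customers": ["crm_db", "billing_db"],
    "orders": ["orders_db"],
}


def route_query(table, catalog):
    """Return the data sources that must receive a request for `table`."""
    try:
        return list(catalog[table])
    except KeyError:
        raise LookupError(f"no data source registered for table {table!r}") from None


targets = route_query("customers", CATALOG)
print(targets)
```

In this sketch the caller names only the federated table; which sources are contacted is decided entirely by the catalog.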

In many cases, the data integration/federation logic within the Data Integration node is logically separate from the data connector logic. The data connector logic spreads the overhead of querying multiple data sources, allowing the queries to run in parallel against each database. When performance is a major concern, multiple logical data connectors may exist to process queries against a single data source. The idea is to keep any single node in the process from becoming a bottleneck when too many requests run against one data source.

In all cases, the results returned from each individual data source must be aggregated and normalized by the Data Integration node so that they appear to come from one "virtual" data source. The results are then sent back to the Population (Process) node, which is unaware that multiple data sources were involved.
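The fan-out, normalize, and aggregate steps might look like the sketch below. It assumes in-memory stand-ins for the data sources and uses a thread pool in place of real data connectors; `SOURCES`, `COLUMN_MAP`, `fetch`, `normalize`, and `federated_select` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "data sources"; each uses its own column names.
SOURCES = {
    "crm_db": [{"cust_id": 1, "cust_name": "Ada"}],
    "billing_db": [{"id": 2, "name": "Grace"}],
}

# Hypothetical per-source mappings onto the virtual schema (id, name).
COLUMN_MAP = {
    "crm_db": {"cust_id": "id", "cust_name": "name"},
    "billing_db": {"id": "id", "name": "name"},
}


def fetch(source):
    """Stand-in for a data connector: return the raw rows of one source."""
    return list(SOURCES[source])


def normalize(source, row):
    """Rename source-specific columns to the virtual schema."""
    mapping = COLUMN_MAP[source]
    return {mapping[col]: value for col, value in row.items()}


def federated_select(sources):
    """Query all sources in parallel and return one merged, normalized result."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        raw = list(pool.map(fetch, sources))  # results keep the order of `sources`
    merged = [normalize(src, row) for src, rows in zip(sources, raw) for row in rows]
    return sorted(merged, key=lambda r: r["id"])


rows = federated_select(["crm_db", "billing_db"])
print(rows)
```

The caller of `federated_select` receives a single list of rows in one schema, which mirrors the way the Population (Process) node sees only one "virtual" data source.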

Data Server/Services

A Data Server/Services node is a generic data storage node that provides managed, persistent storage of any type of data and a means to directly access and manipulate that data. The data may be stored in files and accessed through file I/O routines or may be stored in a database with more structured and managed access methods.

Data Integration

A data integration node is a specialized application or data server that is optimized for real-time access (read-only or sometimes read/write) to remote data sources. It understands how to access diverse data structures and stores and how to manipulate and process the resulting data. A data integration node typically operates in an "on-line" mode, allowing real-time access to remote data sources either for users, as part of a business process, or for applications, as part of a business or population process.

Population

A population node is a specialized application or data server that is optimized for record-oriented processing, where the records must be gathered from one or more source data sets, processed singly and multiply, and finally applied to one or more target data sets. A population node typically operates in an "off-line" mode to prepare data in advance of its business usage, based on rules that have been previously defined through a separate user interface module and stored in a metadata repository.

A population server may be further specialized for one or two of its inherent sub-functions--gather, process or apply (also known as extract, transform and load respectively). Such specialization may be for reasons of performance or physical placement.
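The three sub-functions can be illustrated with a small pipeline. This is a sketch under assumptions: the record layout and the transformation (trimming and upper-casing a `name` field) are invented for the example, and the function names simply reuse the terms gather, process, and apply.

```python
def gather(sources):
    """Gather (extract): collect records from every source data set."""
    for rows in sources:
        yield from rows


def process(records):
    """Process (transform): handle records one at a time."""
    for rec in records:
        # Illustrative rule only: trim and upper-case the name field.
        yield {**rec, "name": rec["name"].strip().upper()}


def apply(records, target):
    """Apply (load): write the processed records to a target data set."""
    target.extend(records)
    return target


source_a = [{"name": " ada "}]
source_b = [{"name": "grace"}]
target = apply(process(gather([source_a, source_b])), [])
print(target)
```

Because each stage only consumes the output of the previous one, any single stage could be moved to its own specialized server, which is the kind of specialization the paragraph above describes.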