The preferred data source pattern provides the ability for a client to retrieve information from a set of information sources without needing to know that there are multiple underlying sources. One of the sources is identified as the preferred source; the others are considered alternate sources, used only when the preferred source cannot provide the desired information. This pattern specification will be published in full on dW (May 2007), but here is a preview of it until then.
Consider the following situations where multiple sources of data must be made to appear as one.
- a company has multiple sources of information, some of which are 'more expensive' to access, e.g., a local and remote parts database
- a company upgrades its IT systems, and in doing so, introduces new sources of information that must be used in conjunction with old sources, e.g., customers
- one or more similar businesses merge, and each has somewhat dissimilar data representing the same entities, e.g., customers
- any individual entity is assumed to have some "enterprise unique" identifier that is part of the record, for example, a customer number or SKU
There is also an assumption that the integration in the above scenarios is done in the context of an information management SOA web services environment.
How can a client retrieve information from a set of disparate information sources, without the need to understand (at least at a high level) that there are multiple sources?
The preferred data source pattern provides the ability for a client to retrieve information from a set of information sources, without the need to understand (at least at a high level) that there are multiple sources. One of the sources is identified as the preferred source, and the others are considered alternate sources, used only when the preferred source cannot provide the desired information. Figure 0 shows the relationship between the facade and the adapters.
The information obtained from any source is assumed to be in the form of "records" that describe entities such as customers or parts. Further, any individual entity is assumed to have some "enterprise unique" identifier that is part of the record, for example, a customer number or SKU.
The heart of the solution is a facade; the client interacts only with the facade, which hides the fact that there are multiple data sources. The facade interface matches that of the preferred source (more on that below). The preferred interface contains one or more operations that allow the client to find (read) information matching various criteria. A find operation returns 0..n records that match the criteria. It is important to understand that no matter which source provides the information, it is possible that none of the returned records is the desired record. Consider a scenario where a store clerk searches a nation-wide company database for customers with the name "John Smith"; the find operation could return 20 John Smiths, but none are the John Smith in front of the clerk. The client must depend on additional interaction with the user to determine whether any of the returned records is the desired record.
The preferred source pattern assumes that an information source has one or more 'find' operations that return zero or more instances of the entity record, or perhaps a subset of the entity record. The information source may also have one or more 'write' operations that allow a client to create and update entity records.
Figure 1 shows a sequence diagram for a find operation in the pattern. The client invokes the facade. The facade invokes the preferred information source. If there are no matches from that source, the facade invokes the alternate information sources in a pre-defined order until matches are found, or until all the alternate sources are exhausted. Once a match is found, or all sources are exhausted, the facade returns to the client. Note that for clarity, I've not shown the synchronous returns.
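The find sequence described above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own, not the pattern's actual implementation: each source is modelled as a plain function rather than a web service, and the names `PreferredSourceFacade`, `make_source`, `local_parts`, and `remote_parts` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, str]
# Assumption: a source is modelled as a function from criteria to matching records.
FindFn = Callable[[Dict[str, str]], List[Record]]


class PreferredSourceFacade:
    """Hides multiple sources behind a single find operation."""

    def __init__(self, preferred: FindFn, alternates: Sequence[FindFn]):
        # The preferred source is always tried first, then the alternates
        # in the pre-defined order given by the caller.
        self._sources = [preferred, *alternates]

    def find(self, criteria: Dict[str, str]) -> List[Record]:
        for source in self._sources:
            matches = source(criteria)
            if matches:
                # Stop at the first source that returns any records.
                return matches
        # All sources exhausted without a match.
        return []


def make_source(records: List[Record]) -> FindFn:
    """Hypothetical in-memory stand-in for a real information source."""
    def find(criteria: Dict[str, str]) -> List[Record]:
        return [r for r in records
                if all(r.get(k) == v for k, v in criteria.items())]
    return find


local_parts = make_source([{"sku": "A-100", "name": "bolt"}])
remote_parts = make_source([
    {"sku": "A-100", "name": "bolt (remote copy)"},
    {"sku": "B-200", "name": "nut"},
])
facade = PreferredSourceFacade(local_parts, [remote_parts])
```

With this setup, a search for SKU `A-100` is answered by the local source and the remote source is never consulted, while a search for `B-200` falls through to the remote source. Note that the sketch returns whatever the first answering source provides; as discussed earlier, that still does not guarantee the desired record is among the results.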
In its simplest form, the preferred source, and thus the pattern, supports only find (read) operations. A virtual catalog capability might leverage such a 'read-only' pattern, as there is no need (or perhaps no ability) to update the preferred source. The description for the simplest form must include:
- The WSDL document describing the preferred source and all alternate sources. The interface (port type) of the preferred source is used by the facade and all alternate sources. If an alternate source does not natively expose the same interface, a transform pattern is applied to the source, but the transform is out of scope. The WSDL for the alternate sources must differ from the preferred source at least in the endpoint address; it may differ in the binding(s) as well, with a bit more work.
- The schema describing the entity record and any other parameters used in the interface. Note that the schema will be defined by or imported by the WSDL document. As indicated above, it is assumed that an entity record includes a unique ID.
- Identification of the 'find' operations to which the pattern will be applied. All other operations will be treated as pass-through.
- A list showing the order in which the alternate sources are invoked. It is of course possible to have a single list of WSDL documents for the services and the first in the list is assumed to be the preferred source.
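To make the four items above concrete, here is one way the description for the simplest form might be captured as data. This is a sketch under assumptions of my own: the class `PatternDescription`, its field names, the operation names, and the `example.com` locations are all hypothetical, and it uses the single-ordered-list convention in which the first WSDL is the preferred source.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class PatternDescription:
    # Ordered WSDL locations; the first entry is taken as the preferred source.
    wsdl_locations: Tuple[str, ...]
    # Schema for the entity record (defined or imported by the WSDL).
    schema_location: str
    # Names of the 'find' operations to which the pattern is applied.
    find_operations: FrozenSet[str]

    def __post_init__(self) -> None:
        if not self.wsdl_locations:
            raise ValueError("a preferred source is required")

    @property
    def preferred(self) -> str:
        return self.wsdl_locations[0]

    @property
    def alternates(self) -> Tuple[str, ...]:
        # Alternates keep the order in which they should be invoked.
        return self.wsdl_locations[1:]

    def handling(self, operation: str) -> str:
        # Identified find operations get the fall-through behaviour;
        # every other operation is treated as pass-through.
        if operation in self.find_operations:
            return "fall-through"
        return "pass-through"


# Hypothetical locations and operation names, for illustration only.
description = PatternDescription(
    wsdl_locations=(
        "http://example.com/local/PartsService?wsdl",
        "http://example.com/remote/PartsService?wsdl",
    ),
    schema_location="http://example.com/schemas/part.xsd",
    find_operations=frozenset({"findPartBySku", "findPartsByName"}),
)
```

Keeping the sources in one ordered tuple means the invocation order and the choice of preferred source are expressed in a single place, at the cost of making the first position significant.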