An important goal of SOA design is the identification of services and their specifications. In other words: Which functions and data should I expose as a service and how do I define and model those identified services? The IBM methodology for defining the SOA analysis and design process is the Service Oriented Modeling and Architecture (SOMA) (see Resources).
SOMA (and many other SOA methodologies) relies heavily on business process analysis and use case design to resolve service interface design at the appropriate level of granularity, establish reuse, and so on. Often, the information perspective of SOA is limited to implementing a small number of services as database queries exposed as Web services. This narrow view completely misses the value that established information architecture concepts and patterns can bring to the SOA solution. To fully support scalable, consistent and reusable access to information, the SOA solution needs to include a broader set of design concerns, reflecting information architecture best practices.
Information as a Service applies a set of structured techniques to address the information aspects of SOA design. The goal is that by understanding what business information exists in the solution, informed decisions can be made to ensure that information is leveraged in ways that best support the technical and business objectives of the SOA solution:
- That services are reusable across the entire enterprise.
- That the business data exposed to consumers is accurate, complete and timely.
- That data shared across business domains and technology layers has a commonly understood structure and meaning for all parties.
- That the core data entities linking together the business domains of an enterprise are consistent and trusted across all lines of business.
- That an enterprise gains maximum business value from its data and data systems.
These objectives are valid for all parts of an SOA solution regardless of technology and implementation choices. Exposing an existing application programming interface (API) as a service, for example, requires an understanding of the data being exposed: Is it reliable and accurate? How does it relate to other data in the enterprise? Is it being presented in an understandable format for the consumers? Applying a structured approach to data analysis, modeling and design in an SOA project leads to a solution implementation that is better at meeting existing business requirements as well as being better prepared to adapt to new ones.
Most of the patterns discussed in the information perspective of SOA design apply to any service. They are independent of how the service is realized and are not limited to information services. These patterns are described in a later section.
However, information architecture concepts -- and in particular IBM's Information on Demand approach to information architecture -- can also provide the best implementation choice for some SOA components. For example, the Data Federation pattern is often the best option to implement an SOA component that aggregates data from disparate systems in real time and then exposes it through a common service interface (see Resources). This article includes considerations related to the realization of information services.
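As a rough illustration of the federation idea, the following sketch queries two sources at request time and returns one merged view through a single function. It is a minimal sketch, not IBM's Data Federation implementation: the source adapters `fetch_from_crm` and `fetch_from_billing`, their field names, and the sample data are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical source adapters. Each one returns data in its own native shape,
# as two independent back-end systems would.
def fetch_from_crm(customer_id: str) -> Dict[str, str]:
    data = {"C1": {"cust_name": "Acme Corp", "phone": "555-0100"}}
    return data.get(customer_id, {})


def fetch_from_billing(customer_id: str) -> Dict[str, str]:
    data = {"C1": {"balance": "1250.00", "currency": "USD"}}
    return data.get(customer_id, {})


def get_customer_view(customer_id: str) -> Dict[str, Optional[str]]:
    """Common service interface: read every source at request time and merge.

    Nothing is copied into a separate store; the consumer sees one structure
    and does not need to know which system supplied which attribute.
    """
    crm = fetch_from_crm(customer_id)
    billing = fetch_from_billing(customer_id)
    return {
        "customer_id": customer_id,
        "name": crm.get("cust_name"),
        "phone": crm.get("phone"),
        "balance": billing.get("balance"),
        "currency": billing.get("currency"),
    }
```

Attributes that a source cannot supply come back as `None`, which keeps the service contract stable even when one system has no record for the requested key.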
General information-related SOA design patterns
Figure 1 shows the three pillars on which the information perspective of SOA design is based. These pillars are to:
- Define the data semantics through a business glossary
- Define the structure of the data through canonical modeling
- Analyze the data quality
Figure 1. Overview
Subsequent articles in this series describe the role and value of the pattern for each pillar and introduce the corresponding IBM technology that supports it.
The Business Glossary
A foundation for any successful SOA is the establishment of a common, easily accessible business glossary that defines the terms related to processes, services, and data. Often, practitioners discover inconsistencies in terminology while trying to learn the accepted business language and abbreviations within an organization. Without an agreement on the definition of key terms such as customer, channel, revenue and so on, it becomes impossible to implement services related to those terms. If stakeholders differ in their interpretation of the meaning of the parameters of a service, or indeed the data set it retrieves, it is unlikely that a service implementation can be successful.
It is critical that business analysts and the technical community have a common understanding of the terminology used across all aspects of the SOA domain, including processes, services and data. The business glossary eliminates ambiguity of language around core business concepts that could otherwise lead to misunderstandings of data requirements.
A business glossary eliminates misinterpretations by establishing a common vocabulary which controls the definition of terms. Each term is defined with a description and other metadata and is positioned in a taxonomy. Stewards are responsible for their assigned terms: they help to define and to support the governance of those terms. Details for the business glossary pattern are discussed in a future article in this series.
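The structure described above (a term with a description, other metadata, a position in a taxonomy, and a responsible steward) can be sketched as a small data structure. This is only an illustrative model under assumed names (`Term`, `Glossary`, `path`); it does not reflect the data model or API of InfoSphere Business Glossary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    name: str
    description: str
    steward: str                      # person responsible for the term
    parent: Optional[str] = None      # position in the taxonomy
    metadata: Dict[str, str] = field(default_factory=dict)


class Glossary:
    """Controlled vocabulary: one definition per term, case-insensitive."""

    def __init__(self) -> None:
        self._terms: Dict[str, Term] = {}

    def add(self, term: Term) -> None:
        key = term.name.lower()
        if key in self._terms:
            raise ValueError(f"term already defined: {term.name}")
        if term.parent is not None and term.parent.lower() not in self._terms:
            raise ValueError(f"unknown parent term: {term.parent}")
        self._terms[key] = term

    def lookup(self, name: str) -> Term:
        return self._terms[name.lower()]

    def path(self, name: str) -> List[str]:
        """Return the taxonomy path from the root category down to the term."""
        chain: List[str] = []
        current: Optional[Term] = self.lookup(name)
        while current is not None:
            chain.append(current.name)
            current = self.lookup(current.parent) if current.parent else None
        return chain[::-1]
```

Rejecting a second definition of an existing term is the design choice that matters here: it forces stakeholders to reconcile competing meanings of a word such as "customer" instead of silently keeping both.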
A key success factor of a business glossary is to make it easily accessible, to link it to other important modeling artifacts, and to require that it be actively used in the design phase of the project. This pattern is supported by InfoSphere® Business Glossary, which is part of IBM Information Server. This product is described in more detail in a future article in this series.
As well as a tool to manage and share a glossary, IBM also delivers industry-specific intellectual property, in the form of models. These models contain thousands of business terms, clearly defined, to enable data requirements and analysis discussions with stakeholders.
The canonical data model
Consistent terminology is a good starting point when designing services, but this in itself is not sufficient. You must also have a clear understanding of the way business information is structured. The input and output parameters of services, that is, the messages, are often far more complex than single data types. They represent complex definitions of entities and the relationships between them. The development time and quality of SOA projects can be greatly improved if SOA architects leverage a canonical model when designing the exposed data formats of service models. The resulting alignment of process, service/message, and data models accelerates the design, leverages normative guidance for data modeling and avoids unnecessary transformations. Equally important is surfacing the detailed service data model to stakeholders early in the SOA lifecycle. This facilitates identification of the most reusable data sets across multiple business domains, resulting in service definitions that meet the needs of a wide range of service consumers, thus reducing service duplication.
The key problem addressed in this and subsequent articles is how to best ensure a consistent format for information horizontally across the services and vertically between the process, the service, and the data layers in the SOA context. A canonical data model provides a consistent definition of key entities, their attributes and relationships across the various systems that hold relevant data for the SOA project. The canonical data model establishes this common format on the data layer while the canonical message model defines this uniform format on the services layer. The pattern of a canonical data and message model is presented in a future article in this series.
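To make the distinction concrete, the sketch below maps records from two source systems onto one canonical entity and then wraps that entity in a message for the services layer. The entity `CanonicalSupplier`, the system names, and every field name in `FIELD_MAPS` are hypothetical; a real canonical model would be derived from the project's own data models.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class CanonicalSupplier:
    """Canonical data model: one agreed definition of the entity."""
    supplier_id: str
    name: str
    country: str


# Hypothetical mappings from each system's field names to canonical names.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "procurement": {"supp_no": "supplier_id", "supp_name": "name", "ctry": "country"},
    "logistics": {"vendorId": "supplier_id", "vendorName": "name", "countryCode": "country"},
}


def to_canonical(system: str, record: Dict[str, Any]) -> CanonicalSupplier:
    """Translate a source record into the canonical structure."""
    mapping = FIELD_MAPS[system]
    values = {
        canonical: str(record[source]).strip()
        for source, canonical in mapping.items()
    }
    values["country"] = values["country"].upper()  # one agreed representation
    return CanonicalSupplier(**values)


def to_message(supplier: CanonicalSupplier) -> Dict[str, Any]:
    """Canonical message model: the same attribute names on the services layer."""
    return {"Supplier": asdict(supplier)}
```

Because both layers share attribute names, a record needs to be transformed only once, at the boundary of the source system, rather than at every hop between the data, service, and process layers.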
Industry Models provide an integrated set of process, service and data models that can be used to drive analysis and design of service architectures, ensuring a tight alignment of data definitions across modeling domains. They define best practices for modeling a particular industry domain and provide an extensible framework so that you don't have to constantly redesign your SOA as you add more and more services.
A future article discusses the related data modeling tool, Rational Data Architect, and relevant structures from the models in greater detail.
Data quality analysis
Practitioners who have considered the concepts described above can deliver service designs with a high degree of consistency across models and metadata artifacts. However, this is no guarantee that the quality of the data that is being returned by services is acceptable. Data which meets the rules and constraints of its original repository and application may not satisfy requirements on an enterprise level. For example, an identifier might be unique within a single system but is it really unique across the enterprise? Quality issues which are insignificant within the original single application may cause significant problems when exposed more broadly through an SOA on an enterprise level. For example, missing values, redundant entries, and inconsistent data formats are sometimes hidden within the original scope of the application and become problematic when exposed to new consumers in an SOA.
The questions, therefore, are whether the quality of the data to be exposed meets the requirements of the SOA project and how to make that determination effectively. The proposed solution is to conduct a data quality assessment during service analysis and design. After you catalog the source systems that support a service, you can start to investigate them for data quality issues. For example, you should verify that data conforms to the integrity rules that define it. You should also verify whether data duplication exists and how it can be resolved during data matching and aggregation. On the basis of these types of analysis, you can take appropriate actions to ensure that service implementation choices meet the required levels of data accuracy and meaning within the context of the potential service consumers. A future article in this series describes this pattern.
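The kinds of checks mentioned in this section (missing values, duplicate entries, and identifiers that are unique within one system but not across the enterprise) can be sketched in a few functions. This is a simplified illustration with hypothetical function and field names; it is not how InfoSphere Information Analyzer works, and real matching rules are usually far more elaborate than the lowercase-and-trim normalization assumed here.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List, Set, Tuple

Record = Dict[str, Any]


def missing_values(records: List[Record], required: List[str]) -> Dict[str, int]:
    """Count, per required field, the records where the value is absent or blank."""
    counts: Counter = Counter()
    for record in records:
        for name in required:
            value = record.get(name)
            if value is None or str(value).strip() == "":
                counts[name] += 1
    return dict(counts)


def duplicate_keys(records: List[Record], key_fields: List[str]) -> List[Tuple[str, ...]]:
    """Return normalized key tuples that occur more than once."""
    seen = Counter(
        tuple(str(record.get(name, "")).strip().lower() for name in key_fields)
        for record in records
    )
    return [key for key, count in seen.items() if count > 1]


def colliding_ids(systems: Dict[str, List[Record]], id_field: str) -> Dict[Any, Set[str]]:
    """Find identifiers that appear in more than one system."""
    owners: Dict[Any, Set[str]] = defaultdict(set)
    for system, records in systems.items():
        for record in records:
            owners[record[id_field]].add(system)
    return {ident: names for ident, names in owners.items() if len(names) > 1}
```

Running such checks against each cataloged source before the service contract is fixed shows whether cleansing or matching has to be part of the service implementation.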
The effectiveness of the data quality assessment can be greatly enhanced with the right tooling decision. InfoSphere Information Analyzer, which is part of IBM Information Server, supports the data quality analysis pattern and is described in a separate article in this series.
The issues and concepts described so far apply to any service in an SOA. Canonical modeling and data quality analysis can add value to the consistency of services and of their output data, regardless of the type of service.
Information services specific patterns
Information services are services whose realization depends on information architecture, or Information on Demand, where a separation of information from applications and processes provides benefits.
Most SOA projects do not start on a green field but are based on an existing IT environment. Some of the challenges are unique to SOA, but, more often than not, well-known problems in traditional information architecture fall within the scope of SOA as well. A typical organization's information environment is often not in an ideal state to enable an effective SOA transformation. From an enterprise perspective, there's often a lack of authoritative data sources offering a complete and accurate view of the organization's core information. Instead, a wide variety of technologies is used for storing and processing data differently across lines of business, channels or product types. Many large organizations have their core enterprise information spread out and replicated across multiple vertical systems, each maintaining information within its specific context rather than the context of the enterprise. This fragmentation further drives inconsistencies within the business processes -- which themselves are usually dramatically different within different parts of the enterprise. Information On Demand -- in particular data, content, information integration, master data, and analytic services -- can be leveraged to realize information services that provide accurate, consistent, integrated information in the right context.
Consider the lack of an authoritative, trusted source or single system of record as an illustrative example. Suppose that an organization's supply chain systems portfolio contains five systems that hold supplier information internally. Each of these can be considered a legitimate source of supplier data within the owning department. When building a service to share supplier data, what should be the source of supplier data?
- Is it one of the five current systems that have their own copy of the supplier data? If so, which one?
- Is it a new database that's created for this specific purpose? How does this data source relate to the existing sources?
- Does data have to come concurrently from all of the five systems? If so, is it the responsibility of the data architect, the service designer, the business process designer, or the business analyst to understand the rules for combining and transforming the data to a format required by the consumer?
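One way to start answering these questions is to compare what each system holds against a shared reference model. The sketch below reports, per reference attribute, which systems provide it, and lists system attributes that the reference model does not cover. The reference attributes, system names, and attribute sets are hypothetical, and the sketch assumes that each system's fields have already been mapped to reference names.

```python
from typing import Dict, List, Set

# Hypothetical reference model for the supplier entity.
REFERENCE_MODEL: Set[str] = {"supplier_id", "name", "country", "tax_id", "duns_number"}

# Hypothetical attribute inventories, already expressed in reference names.
SYSTEM_ATTRIBUTES: Dict[str, Set[str]] = {
    "procurement": {"supplier_id", "name", "country"},
    "logistics": {"supplier_id", "name"},
    "finance": {"supplier_id", "tax_id", "payment_terms"},
}


def coverage_report(reference: Set[str], systems: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each reference attribute, list the systems that hold it.

    Several systems means an overlap to reconcile; an empty list means a gap.
    """
    return {
        attribute: sorted(name for name, held in systems.items() if attribute in held)
        for attribute in sorted(reference)
    }


def unmapped_attributes(reference: Set[str], systems: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """List attributes each system holds that the reference model lacks."""
    return {
        name: sorted(held - reference)
        for name, held in systems.items()
        if held - reference
    }
```

With the sample data, `supplier_id` is held by all three systems (an overlap to reconcile), `duns_number` by none (a gap), and `payment_terms` in the finance system has no counterpart in the reference model.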
Often an understanding of these disparate data definitions can only be obtained by mapping back to a reference model (often a logical data model), allowing overlaps, gaps and inconsistencies in data definitions to be identified. Reusable, strategic enterprise information should be viewed as sets of business entities, standardized for re-use across the entire organization and made compliant with industry standard structures, semantics and service contracts. The goal is to create a set of information services that becomes the authoritative, unique, and consistent way to access the enterprise information. Allowing access to any information only through an application limits the scope of the information to the context of the application rather than that of the enterprise as required in an SOA. In this target service-oriented environment, an organization's business functionality and data can be leveraged as enterprise assets that are reusable across multiple departments and lines of business. This enables the following principles of information services:
- Single, logical sources from which to get a consistent and complete view of information through service interfaces. This is often referred to as delivering trusted information.
- The heterogeneity that may exist underneath the information service layer, along with its related complexity, is hidden when required (for example, during runtime). However, the lineage of the information -- the mapping of logical business entities to actual data stores -- is available when appropriate (for example, for data stewards to support data governance, impact analysis, and so on).
- The authoritative data sources of the information service are clearly identified and are effectively used throughout the enterprise.
- Valuable metadata about the information service is available:
- The quality of the information exposed through the service is known and meets the expectations of the business. The information services are compliant with data standards that have been defined.
- The currency of the information (how "old" the data is) is known. Effective mechanisms are available to deliver the information with the required latency.
- The structure and the semantics of the information are known and commonly represented on different architecture layers (data persistence layer, application layer, service/message layer, and process layer)
- The information service may be governed based on appropriate processes, policies and organizational structures:
- The security of the information is guaranteed and incorporated into the solution rather than implemented as an afterthought and follows security and privacy policies.
- The change of the service may be audited.
- The information service is easily discoverable by potential consumers across the organization.
- A holistic governance approach is in place that addresses both the service and the information layer.
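Several of these principles concern metadata that travels with the information: its quality, its currency, and its lineage. As a rough sketch, a service response could carry that metadata next to the payload so that a consumer can check it against its own expectations. The type `InformationResponse`, the 0.0 to 1.0 quality score, and the helper `meets_expectations` are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class InformationResponse:
    payload: Dict[str, Any]    # the business data itself
    as_of: datetime            # currency: when the data was last known to be valid
    quality_score: float       # hypothetical measure between 0.0 and 1.0
    lineage: List[str]         # data stores the payload was derived from


def meets_expectations(
    response: InformationResponse,
    max_age: timedelta,
    min_quality: float,
    now: datetime,
) -> bool:
    """Check the response against a consumer's latency and quality requirements."""
    is_current = (now - response.as_of) <= max_age
    is_trusted = response.quality_score >= min_quality
    return is_current and is_trusted


# Usage with a fixed clock so the result is reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
response = InformationResponse(
    payload={"supplier_id": "17", "name": "Acme"},
    as_of=now - timedelta(minutes=5),
    quality_score=0.97,
    lineage=["procurement.SUPPLIER", "finance.VENDOR"],
)
acceptable = meets_expectations(response, timedelta(minutes=15), 0.9, now)
```

Passing `now` explicitly rather than reading the system clock inside the function keeps the check deterministic and easy to audit.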
Information as a Service is about leveraging information architecture concepts and capabilities -- as defined through Information On Demand -- in the context of SOA. There are important capabilities and concepts in SOA that are not included in Information On Demand and vice versa. But there is also a substantial overlap between them -- such as content, information integration, and master data services -- and exploiting that overlap significantly improves the delivery of an SOA project. The following diagram illustrates the alignment between the SOA reference architecture shown on the left (see also Resources) and the Information On Demand reference architecture on the right.
Figure 2. Information services in SOA
As part of the SOA design phase, architects may need to make architecture decisions regarding which patterns to use based on the requirements in the project. Table 1 describes some of the key, but high-level, patterns that may apply.
Table 1. High-level categorization of information service patterns
|Pattern|Problem|Solution|
|Data services|How do I expose structured data as a service?|Implement a query to gather the relevant data in the desired format and then expose it as a service.|
|Content services|How do I best manage (possibly distributed and heterogeneous) unstructured information so that a service consumer can access the content effectively?|Provide a consistent service interface to content no matter where it resides, maintaining the relationship between content and master data.|
|Information integration services|How do I provide a service consumer access to consistent and integrated data that resides in heterogeneous sources?|Understand your legacy data and its quality, cleanse it, transform it, and deliver it as a service.|
|Master data services|How can consumers access consistent, complete, contextual and accurate master data even though the data resides in heterogeneous inconsistent systems?|Establish and maintain an authoritative source of master data as a system of record for enterprise master data.|
|Analytic services|How do I access analytic data out of raw heterogeneous structured and unstructured data?|Consolidate, aggregate and summarize structured and unstructured data and calculate analytic insight such as scores, trends, and predictions.|
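The data services pattern in Table 1 is the simplest to illustrate: a query gathers the relevant data in the desired format, and a function with a stable signature exposes it. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `supplier` table, its columns, and the sample rows are hypothetical, and a real service would add a transport layer such as a Web service endpoint on top.

```python
import sqlite3
from typing import Dict, List


def create_demo_db() -> sqlite3.Connection:
    """Build a small in-memory database that stands in for an existing source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO supplier (id, name, country) VALUES (?, ?, ?)",
        [(1, "Acme", "DE"), (2, "Globex", "US"), (3, "Initech", "US")],
    )
    conn.commit()
    return conn


def suppliers_by_country(conn: sqlite3.Connection, country: str) -> List[Dict[str, object]]:
    """Service operation: run a parameterized query and return plain records.

    The consumer sees only this signature and the returned structure, not the
    table layout or the SQL behind it.
    """
    cursor = conn.execute(
        "SELECT id, name FROM supplier WHERE country = ? ORDER BY name",
        (country,),
    )
    return [{"id": row[0], "name": row[1]} for row in cursor]
```

Using a parameterized query rather than string concatenation matters once the operation is exposed to consumers outside the owning application, because the input can no longer be assumed to be trusted.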
IBM Information Server plays an important role in the SOA design phase by providing a unified metadata management platform. This platform consists of a repository and a framework that allows various design tools to access, maintain, and share their artifacts with other IBM Information Server components and third-party tools. The value of this shared metadata platform is that metadata artifacts can be easily shared between the tools and are kept consistent.
The purpose of this article is to give you an introduction to the information perspective of SOA design and some of the key patterns -- the business glossary, canonical models, data quality analysis, and information services. You should see the role of leveraging industry models in those design activities. If any of these topics has sparked your interest, be sure to read the coming articles in this series.
Resources
- Check out the rest of this series to learn more about topics that were introduced in this article.
- The "Information service patterns" series discusses the information service patterns addressed in this article (developerWorks, 2006-2007).
- Read "Design an SOA solution using a reference architecture" to get more information on the SOA reference architecture.
Get products and technologies
- Create, manage & share an enterprise vocabulary and classification system with IBM Information Server and in particular InfoSphere Business Glossary.
- Simplify data modeling and integration design with Rational Data Architect.
- Accelerate projects and reduce risk with IBM Industry Models.
- Understand the structure, content and quality of your data sources with InfoSphere Information Analyzer.