Level: Intermediate
Brian Byrne (byrneb@us.ibm.com), Industry Models and Integration Architect, IBM
David McCarty (davidmccarty@fr.ibm.com), IT Architect, IBM Japan
Dr. Guenter Sauter (gsauter@us.ibm.com), Information Architect, IBM
Peter Worcester (pworcest@us.ibm.com), Services Solution Marketing Manager, IBM Japan
John Kling (jkling@us.ibm.com), Consulting and Services Architect, IBM Japan
24 Jan 2008

This article is written for architects and practitioners designing a
Service Oriented Architecture (SOA). It introduces a set of patterns and
capabilities representing the information perspective in the design of an
SOA. The key patterns addressed are the business glossary, the canonical
model and data quality analysis. See how these patterns are positioned in
SOA and discover the contributions they make to an SOA solution. Get an introduction to
the related IBM® products: IBM Information Server, Rational Data Architect, and
IBM Industry Models. This article is the first in a series: subsequent articles explore each
of the patterns in more detail and then show how IBM products may be used
to implement each pattern.
Introduction
An important goal of SOA design is the identification of services and their
specifications. In other words: Which functions and data should I expose as a
service and how do I define and model those identified services? The IBM methodology
for defining the SOA analysis and design process is the Service Oriented Modeling and
Architecture (SOMA) method (see Resources).
SOMA (and many other SOA
methodologies) relies heavily on business process analysis and use case
design to resolve service interface design at the appropriate level of
granularity, establish reuse, and so on. Often, the information
perspective of SOA is limited to implementing a small number of
services as database queries exposed as Web services. This narrow view
completely misses the value that established information architecture
concepts and patterns can bring to the SOA solution. To fully support
scalable, consistent and reusable access to information, the SOA
solution needs to include a broader set of design concerns, reflecting
information architecture best practices.
Information as a Service applies a set of structured techniques
to address the information aspects of SOA design. The goal is to
understand what business information exists in the solution so that
informed decisions can be made about how to leverage that information
in the ways that best support the technical and business objectives of
the SOA solution. These decisions should ensure:
- That services are reusable across the entire enterprise.
- That the business data exposed to consumers is accurate, complete
and timely.
- That data shared across business domains and technology layers has a
commonly understood structure and meaning for all parties.
- That the core data entities linking together the business domains of
an enterprise are consistent and trusted across all lines of
business.
- That an enterprise gains maximum business value from its data and
data systems.
These objectives are valid for all parts of an SOA solution regardless
of technology and implementation choices. Exposing an existing
application programming interface (API) as a service, for example,
requires an understanding of the data being exposed: Is it reliable and
accurate? How does it relate to other data in the enterprise? Is it
being presented in an understandable format for the consumers? Applying
a structured approach to data analysis, modeling and design in an SOA
project leads to a solution implementation that better meets existing
business requirements and is better prepared to adapt to new ones.
Most of the patterns discussed in the information perspective of SOA
design apply to any service. They are independent of how the service is
realized and are not limited to information services. These patterns are described in
a later section.
However, information architecture concepts -- and in particular IBM's
Information On Demand approach to information architecture -- can also
provide the best implementation choice for some SOA components. For
example, the Data Federation pattern is often the best option for implementing an SOA component that aggregates
data from disparate systems in real time and exposes it through a
common service interface (see Resources). This article also covers considerations related
to the realization of information services.
General information-related SOA design patterns
Figure 1 shows the three pillars on which the information perspective of SOA design is
based:
- Define the data semantics through a business glossary
- Define the structure of the data through canonical modeling
- Analyze the data quality
Figure 1. Overview
Subsequent articles in this series describe the role and value of the pattern behind each pillar and introduce the
IBM technology that corresponds to it.
The Business Glossary
A foundation for any successful SOA is the establishment of a common,
easily accessible business glossary that defines the terms related to
processes, services, and data. Often, practitioners
discover inconsistencies in terminology while trying to learn the accepted
business language and abbreviations within an organization. Without an agreement on
the definition of key terms such as customer, channel, revenue and so on, it becomes impossible to implement
services related to those terms. If stakeholders differ in their interpretation of the meaning of the parameters of a
service, or indeed the data set it retrieves, it is unlikely that a service
implementation can be successful.
It is critical that business analysts and the technical community have a common
understanding of the terminology used across all aspects of
the SOA domain, including processes, services and data. The business glossary eliminates
ambiguity of language around core business concepts that could otherwise
lead to misunderstandings of data requirements.
A business glossary eliminates misinterpretations by establishing a common vocabulary that
controls the definition of terms. Each term is defined with a description
and other metadata and is positioned in a taxonomy. Stewards are responsible for
their assigned terms: they help to define those terms and support their governance. Details of the business
glossary pattern are discussed in a future article in this series.
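To make the structure of a glossary entry more concrete, the following Python sketch models a term with a description, a steward, a parent term in the taxonomy, and synonyms that resolve to the one controlled definition. It is a hypothetical illustration under stated assumptions: the class names, fields, and sample definitions are invented for this example and do not reflect the data model or API of any IBM product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GlossaryTerm:
    """A controlled business term (hypothetical model, not a product API)."""
    name: str
    description: str
    steward: str                    # person accountable for the definition
    parent: Optional[str] = None    # broader term in the taxonomy, if any
    synonyms: list = field(default_factory=list)  # alternative names or abbreviations


class BusinessGlossary:
    """Keeps one authoritative definition per term and resolves synonyms to it."""

    def __init__(self) -> None:
        self._terms = {}    # lower-cased term name -> GlossaryTerm
        self._aliases = {}  # lower-cased synonym -> lower-cased term name

    def add(self, term: GlossaryTerm) -> None:
        key = term.name.lower()
        aliases = [s.lower() for s in term.synonyms]
        # Reject any name or synonym that is already defined, so that every
        # word in the vocabulary resolves to exactly one definition.
        for word in [key, *aliases]:
            if word in self._terms or word in self._aliases:
                raise ValueError(f"already defined in glossary: {word}")
        if term.parent is not None and term.parent.lower() not in self._terms:
            raise ValueError(f"unknown parent term: {term.parent}")
        self._terms[key] = term
        for alias in aliases:
            self._aliases[alias] = key

    def lookup(self, word: str) -> GlossaryTerm:
        key = word.lower()
        return self._terms[self._aliases.get(key, key)]

    def path(self, word: str) -> list:
        """Taxonomy path from the root term down to the given term."""
        term = self.lookup(word)
        chain = [term.name]
        while term.parent is not None:
            term = self._terms[term.parent.lower()]
            chain.append(term.name)
        return list(reversed(chain))


if __name__ == "__main__":
    glossary = BusinessGlossary()
    glossary.add(GlossaryTerm("Party", "A person or organization of interest.", "steward-a"))
    glossary.add(GlossaryTerm("Customer", "A party that buys or may buy a product.",
                              "steward-b", parent="Party", synonyms=["Client"]))
    print(glossary.path("client"))  # ['Party', 'Customer']
```

Refusing duplicate names and synonyms is the code-level counterpart of "a common vocabulary which controls the definition of terms": a word can never silently acquire a second meaning.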
A key success factor for a business glossary is to make it easily
accessible, to link it to other important modeling artifacts, and
to require that it be actively used in the design phase of the project.
This pattern is supported by WebSphere® Business Glossary, which is part
of IBM Information Server. This product is described in more detail in
a future article in this series.
In addition to a tool for managing and sharing a glossary, IBM also delivers
industry-specific intellectual property in the form of models. These
models contain thousands of clearly defined business terms that support discussions of data requirements and analysis with
stakeholders.
The canonical data model
Consistent terminology is a good starting point when designing services,
but this in itself is not sufficient. You must also have a clear
understanding of the way business information is structured. The input
and output parameters of services, that is, the messages, are often far
more complex than single data types. They represent complex definitions
of entities and the relationships between them. SOA projects can be
delivered faster and with higher quality if SOA architects leverage a
canonical model when designing the data formats exposed by service
models. The resulting alignment of process, service/message,
and data models accelerates the design, leverages normative guidance
for data modeling and avoids unnecessary transformations. Equally
important is surfacing the detailed service data model to stakeholders
early in the SOA lifecycle. This facilitates identification of the most
reusable data sets across multiple business domains, resulting in service
definitions that meet the needs of a wide range of service consumers,
thus reducing service duplication.
The key problem addressed in this and subsequent articles is how to best ensure a consistent
format for information horizontally across the services and vertically
between the process, the service, and the data layers in the SOA context.
A canonical data model provides a consistent definition of key entities,
their attributes and relationships across the various systems that
hold relevant data for the SOA project. The canonical data model
establishes this common format on the data layer while the canonical
message model defines this uniform format on the services layer. The
pattern of a canonical data and message model is presented in a future
article in this series.
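As a minimal illustration of a canonical data model, the following Python sketch defines a hypothetical canonical `CanonicalCustomer` entity and one mapping function per source system. Both source formats (a billing system with split name fields and US-style dates, and a CRM system with ISO dates and free-text country names) and all field names are invented for the example. Each source is mapped once, to the canonical form, rather than being transformed separately for every consumer.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CanonicalCustomer:
    """Hypothetical canonical Customer entity shared by all services."""
    customer_id: str   # enterprise-wide key: source-system prefix + local key
    full_name: str
    birth_date: date
    country_code: str  # ISO 3166-1 alpha-2 code, e.g. "FR"


def from_billing(rec: dict) -> CanonicalCustomer:
    """Map a record from a hypothetical billing system (MM/DD/YYYY dates)."""
    month, day, year = (int(part) for part in rec["DOB"].split("/"))
    return CanonicalCustomer(
        customer_id=f"BILL-{rec['CUST_NO']}",
        full_name=f"{rec['FNAME'].strip()} {rec['LNAME'].strip()}",
        birth_date=date(year, month, day),
        country_code=rec["CTRY"].strip().upper(),
    )


# Lookup table for the CRM system's country names (illustrative subset only).
CRM_COUNTRIES = {"France": "FR", "Japan": "JP", "United States": "US"}


def from_crm(rec: dict) -> CanonicalCustomer:
    """Map a record from a hypothetical CRM system (ISO dates, country names)."""
    return CanonicalCustomer(
        customer_id=f"CRM-{rec['id']}",
        full_name=rec["name"].strip(),
        birth_date=date.fromisoformat(rec["birthDate"]),
        country_code=CRM_COUNTRIES[rec["country"]],
    )


if __name__ == "__main__":
    print(from_billing({"CUST_NO": "42", "FNAME": "Ann ", "LNAME": "Lee",
                        "DOB": "03/07/1980", "CTRY": "us"}))
    print(from_crm({"id": "7", "name": " Kenji Sato",
                    "birthDate": "1975-11-02", "country": "Japan"}))
```

Prefixing each local key with its source system is only one way to keep identifiers unique enterprise-wide; recognizing that two records describe the same customer is a separate concern, which the data quality analysis pattern addresses.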
Industry Models provide an integrated set of process, service and data
models that can be used to drive analysis and design of service
architectures, ensuring a tight alignment of data definitions across
modeling domains. They define best practices for modeling a particular
industry domain and provide an extensible framework so that you
don't have to constantly redesign your SOA as you add more and more
services.
A future article discusses the related data modeling tool, Rational Data Architect,
and the relevant structures from the Industry Models in greater detail.
Data quality analysis
Practitioners who have considered the concepts described above can
deliver service designs with a high degree of consistency across models
and metadata artifacts. However, this is no guarantee that the quality
of the data returned by services is acceptable. Data that meets the
rules and constraints of its original repository and application may
not satisfy requirements at the enterprise level. For example, an
identifier might be unique within a single system, but is it really
unique across the enterprise? Quality issues that are insignificant
within the original application may cause significant problems when
exposed more broadly through an SOA. Missing values, redundant entries,
and inconsistent data formats, for instance, are sometimes hidden within
the original scope of the application and become problematic when
exposed to new consumers in an SOA.
The problem, therefore, is to determine whether the quality of the data to be exposed meets the
requirements of the SOA project, and how to make that determination effectively. The proposed solution is to conduct
a data quality assessment during service analysis and design. After you catalog the
source systems that support a service, you can start
to investigate them for data quality issues. For example, you should
verify that data conforms to the integrity rules that define it. You should also verify whether data duplication exists and how it can be resolved
during data matching and aggregation. On the basis of these types of
analysis, you can take appropriate actions to ensure that service
implementation choices meet the required levels of data accuracy and
meaning within the context of the potential service consumers. A future article in
this series describes this pattern.
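The checks described above can be prototyped before any tooling decision is made. The following Python sketch is a simplified, hypothetical profiler, not a description of WebSphere Information Analyzer: it takes records from several source systems and reports keys duplicated within one system, keys that occur in more than one system, missing required values, and values that break a format rule expressed as a regular expression. A key found in several systems is not automatically an error; it may be the same real-world entity held twice or two different entities that happen to share an identifier, and deciding which is the job of the matching step.

```python
import re
from collections import Counter


def assess_quality(sources, key, required, formats):
    """Profile records from several source systems that feed one service.

    sources:  {system_name: [record_dict, ...]}
    key:      field expected to identify a record across the enterprise
    required: fields that must not be empty
    formats:  {field: regex} integrity rules a non-empty value must match
    """
    findings = {
        "duplicate_in_system": [],      # key repeated inside one system
        "key_in_multiple_systems": [],  # locally unique key also used elsewhere
        "missing_values": [],
        "format_violations": [],
    }
    holders = {}  # key value -> set of systems that hold it
    for system, rows in sources.items():
        counts = Counter(row.get(key) for row in rows)
        findings["duplicate_in_system"] += [(system, k) for k, n in counts.items() if n > 1]
        for k in counts:
            holders.setdefault(k, set()).add(system)
        for row in rows:
            for name in required:
                if row.get(name) in (None, ""):
                    findings["missing_values"].append((system, row.get(key), name))
            for name, pattern in formats.items():
                value = row.get(name)
                if value not in (None, "") and not re.fullmatch(pattern, str(value)):
                    findings["format_violations"].append((system, row.get(key), name))
    findings["key_in_multiple_systems"] = sorted(
        (k for k, systems in holders.items() if len(systems) > 1), key=str)
    return findings


if __name__ == "__main__":
    report = assess_quality(
        {
            "erp": [{"id": "S1", "zip": "12345"}, {"id": "S2", "zip": ""}],
            "crm": [{"id": "S1", "zip": "1234A"}, {"id": "S3", "zip": "54321"},
                    {"id": "S3", "zip": "54321"}],
        },
        key="id", required=["zip"], formats={"zip": r"\d{5}"},
    )
    for check, issues in report.items():
        print(check, issues)
```

The system names, fields, and the five-digit postal-code rule are assumptions for the example; real integrity rules come from the data definitions cataloged for each source.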
The effectiveness of the data quality assessment can be greatly
enhanced by choosing the right tooling. WebSphere Information Analyzer,
which is part of IBM Information Server, supports the data quality
analysis pattern and is described in a separate article in this series.
The issues and concepts described so far apply to any service in an SOA.
Canonical modeling and data quality analysis add value to the
consistency of services and of their output data, regardless of the type of service.
Information services specific patterns
Information services are services whose realization depends on information
architecture, or Information On Demand, in situations where separating information from
applications and processes provides benefits.
Most SOA projects do not start on a green field but are based on an
existing IT environment. Some of the challenges are unique to SOA, but,
more often than not, well-known problems in traditional information
architecture fall within the scope of SOA as well. A typical
organization's information environment is often not in an ideal state
to enable an effective SOA transformation. From an enterprise
perspective, there's often a lack of authoritative data sources
offering a complete and accurate view of the organization's core
information. Instead, a wide variety of technologies is used to store
and process data differently across lines of business, channels, or
product types. Many large organizations
have their core enterprise information spread out and replicated across
multiple vertical systems, each maintaining information within its
specific context rather than the context of the enterprise. These
systems further drive inconsistencies within the business processes -- which
themselves are usually dramatically different in different parts
of the enterprise. Information On Demand -- in particular data, content,
information integration, master data, and analytic services -- can be
leveraged to realize information services that provide accurate,
consistent, integrated information in the right context.
Consider the lack of an authoritative, trusted source or single system
of record as an illustrative example. Suppose that in an organization's
supply chain systems portfolio, there are five systems that hold
supplier information internally. Each of these can be considered a
legitimate source of supplier data within the owning department. When
building a service to share supplier data, what should be the source of
supplier data?
- Is it one of the five current systems that have their own copy of
the supplier data? If so, which one?
- Is it a new database that's created for this specific purpose? How
does this data source relate to the existing sources?
- Does data have to come concurrently from all of the five systems?
If so, is it the responsibility of the data architect, the service
designer, the business process designer, or the business analyst to
understand the rules for combining and transforming the data to a
format required by the consumer?
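Whoever owns the rules for combining the data, those rules eventually have to be written down precisely. The following Python sketch applies one hypothetical survivorship rule, "the most recently updated non-empty value wins," to one supplier's records from several systems, and records which system supplied each surviving value. The system names, fields, and the rule itself are assumptions for illustration; a production master data solution would also need record matching and standardization, which this sketch leaves out.

```python
from datetime import date


def build_golden_record(candidates):
    """Merge one supplier's records from several systems into a single view.

    candidates: list of (system_name, last_updated, record_dict).
    Hypothetical survivorship rule: for each attribute, the most recently
    updated non-empty value wins. Returns (golden_record, lineage), where
    lineage records which system supplied each surviving value.
    """
    golden, lineage = {}, {}
    # Newest first; sorted() is stable, so records with equal dates keep input order.
    for system, _updated, record in sorted(candidates, key=lambda c: c[1], reverse=True):
        for attr, value in record.items():
            if attr not in golden and value not in (None, ""):
                golden[attr] = value
                lineage[attr] = system
    return golden, lineage


if __name__ == "__main__":
    golden, lineage = build_golden_record([
        ("purchasing", date(2007, 5, 1),
         {"name": "Acme Corp", "phone": "555-0100", "tax_id": ""}),
        ("payables", date(2007, 11, 3),
         {"name": "ACME Corporation", "phone": None, "tax_id": "123456789"}),
        ("quality", date(2006, 1, 9),
         {"name": "Acme", "phone": "555-0199", "tax_id": "987654321"}),
    ])
    print(golden)
    print(lineage)
```

Returning the lineage alongside the merged record keeps the answer to "where did this value come from?" available to data stewards.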
Often an understanding of these disparate data definitions can only be
obtained by mapping back to a reference model (often a logical data
model), allowing overlaps, gaps and inconsistencies in data definitions
to be identified. Reusable, strategic enterprise information should be
viewed as sets of business entities, standardized for reuse across the
entire organization and made compliant with industry standard
structures, semantics and service contracts. The goal is to create a
set of information services that becomes the authoritative, unique, and
consistent way to access the enterprise information. Allowing access to
any information only through an application limits the scope of the
information to the context of the application rather than that of the
enterprise as required in an SOA. In this target service-oriented
environment, an organization's business functionality and data can be
leveraged as enterprise assets that are reusable across multiple
departments and lines of business. This enables the following principles
of information services:
- Single, logical sources from which to get a consistent and complete
view of information through service interfaces. This is often referred to as delivering trusted information.
- The heterogeneity that may exist underneath this information service
layer, and its related complexity, is hidden when required (for
example, at run time). However, the lineage of the information -- the
mapping of logical business entities to actual data stores -- is
available when appropriate (for example, to data stewards supporting
data governance and impact analysis).
- The authoritative data sources of the information service are
clearly identified and are used effectively throughout the enterprise.
- Valuable metadata about the information service is available:
  - The quality of the information exposed through the service is
  known and meets the expectations of the business. The
  information services comply with the data standards that
  have been defined.
  - The currency of the information (how "old" the data is)
  is known. Effective mechanisms are available to deliver the
  information with the required latency.
  - The structure and the semantics of the information are known
  and commonly represented on the different architecture layers
  (data persistence layer, application layer, service/message
  layer, and process layer).
- The information service can be governed through appropriate
processes, policies and organizational structures:
  - The security of the information is guaranteed: it is built into
  the solution rather than added as an afterthought, and it follows
  security and privacy policies.
  - Changes to the service can be audited.
  - The information service is easily discoverable by potential
  consumers across the organization.
  - A holistic governance approach is in place that addresses both
  the service and the information layer.
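Mapping each system's data back to a logical reference model and exposing lineage to data stewards both rest on the same metadata: which physical column in which system holds each logical attribute. The following Python sketch assumes a hypothetical supplier reference model and two hypothetical source systems; from the mappings it reports gaps (logical attributes that no system holds), overlaps (attributes held in several systems, and therefore candidates for inconsistency), and a lineage lookup from each logical attribute to its physical locations.

```python
def analyze_coverage(logical_attrs, mappings):
    """Compare source-system mappings against a logical reference model.

    logical_attrs: attribute names of the logical (reference) entity.
    mappings:      {system: {logical_attr: physical_column}}.
    Returns (gaps, overlaps, lineage).
    """
    lineage = {}  # logical attribute -> ["system.COLUMN", ...]
    for system, columns in mappings.items():
        for attr, column in columns.items():
            # Reject mappings to attributes the reference model does not define.
            if attr not in logical_attrs:
                raise ValueError(f"{system}.{column} maps to unknown attribute {attr!r}")
            lineage.setdefault(attr, []).append(f"{system}.{column}")
    gaps = sorted(set(logical_attrs) - set(lineage))
    overlaps = {attr: locs for attr, locs in lineage.items() if len(locs) > 1}
    return gaps, overlaps, lineage


if __name__ == "__main__":
    gaps, overlaps, lineage = analyze_coverage(
        ["supplier_id", "legal_name", "tax_id", "payment_terms"],
        {
            "purchasing": {"supplier_id": "SUPP_NO", "legal_name": "SUPP_NAME"},
            "payables": {"supplier_id": "VENDOR_ID", "payment_terms": "TERMS"},
        },
    )
    print("gaps:", gaps)
    print("overlaps:", overlaps)
    print("lineage of legal_name:", lineage["legal_name"])
```

In practice this mapping metadata would live in a shared metadata repository rather than in code; the sketch only shows what questions it can answer.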
Information as a Service is about leveraging information
architecture concepts and capabilities -- as defined through
Information On Demand -- in the context of SOA. There are important
capabilities and concepts in SOA that are not included in Information
On Demand and vice versa. But there is also a substantial overlap between them -- such as leveraging content, information integration, and
master data services -- that significantly improves the delivery of an
SOA project. Figure 2 illustrates the alignment between
the SOA reference architecture, shown on the left (see also Resources), and the Information On
Demand reference architecture, shown on the right.
Figure 2. Information services in SOA
As part of the SOA design phase, architects may need to make
architecture decisions regarding which patterns to use based on the
requirements in the project. Table 1 describes some of the key,
but high-level, patterns that may apply.
Table 1. High-level categorization of information service patterns

| Name | Problem | Solution |
|---|---|---|
| Data services | How do I expose structured data as a service? | Implement a query to gather the relevant data in the desired format and then expose it as a service. |
| Content services | How do I best manage (possibly distributed and heterogeneous) unstructured information so that a service consumer can access the content effectively? | Provide a consistent service interface to content no matter where it resides, maintaining the relationship between content and master data. |
| Information integration services | How do I provide a service consumer access to consistent and integrated data that resides in heterogeneous sources? | Understand your legacy data and its quality, cleanse it, transform it, and deliver it as a service. |
| Master data services | How can consumers access consistent, complete, contextual and accurate master data even though the data resides in heterogeneous, inconsistent systems? | Establish and maintain an authoritative source of master data as a system of record for enterprise master data. |
| Analytic services | How do I derive analytic data from raw, heterogeneous structured and unstructured data? | Consolidate, aggregate and summarize structured and unstructured data and calculate analytic insight such as scores, trends, and predictions. |
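The Data services row is the simplest pattern in Table 1: a query plus a consumer-facing format. The following Python sketch illustrates it under stated assumptions: an in-memory SQLite database with hypothetical `supplier` and `address` tables stands in for a source system, and the operation returns a JSON document. Publishing the function through a Web service interface is deliberately left out, and nothing here describes how IBM products realize data services.

```python
import json
import sqlite3


def get_supplier(conn: sqlite3.Connection, supplier_id: str) -> str:
    """Hypothetical data service operation: run a query, then format the result.

    Returns a JSON document; exposing this function as a Web service
    is outside the scope of this sketch.
    """
    row = conn.execute(
        "SELECT s.id, s.name, a.city, a.country "
        "FROM supplier AS s LEFT JOIN address AS a ON a.supplier_id = s.id "
        "WHERE s.id = ?",
        (supplier_id,),
    ).fetchone()  # keeps only one address row if a supplier has several
    if row is None:
        return json.dumps({"error": "not found", "supplierId": supplier_id})
    sid, name, city, country = row
    return json.dumps({"supplierId": sid, "name": name,
                       "address": {"city": city, "country": country}})


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory database with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE supplier (id TEXT PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE address (supplier_id TEXT REFERENCES supplier(id),
                              city TEXT, country TEXT);
        INSERT INTO supplier VALUES ('S1', 'Acme');
        INSERT INTO address VALUES ('S1', 'Lyon', 'FR');
        """
    )
    return conn


if __name__ == "__main__":
    print(get_supplier(demo_connection(), "S1"))
```

The consumer sees a nested, business-oriented structure rather than the two physical tables, which is where a canonical message model would normally dictate the output format.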
IBM Information Server plays an important role in the SOA design phase
by providing a unified metadata management platform. This platform
consists of a repository and a framework that allows various design
tools to access, maintain, and share their artifacts with other
IBM Information Server components and third party tools. The value of
this shared metadata platform is that metadata artifacts can be easily
shared between the tools and are kept consistent.
Conclusion
The purpose of this article is to give you an introduction to the information
perspective of SOA design and some of its key patterns -- the business glossary,
canonical models, data quality analysis, and information services -- and to show
the role that industry models play in those design activities. If any of these
topics has sparked your interest, be sure to read the coming articles in this series.
Resources
Learn
Get products and technologies
Discuss
About the authors

Brian Byrne has over 10 years' experience in the design and development of distributed systems, having spent 7 years driving the architecture of Industry Models across a range of industries. Brian is currently an architect within IBM's Information Management organization.

David McCarty is based at IBM's European Business Solution Center in La Gaude, France and has 20 years' experience designing and developing IT systems with IBM customers. He is currently a member of the Information as a Service Competency Center, developing techniques and best practices for leveraging data systems in SOA solutions.

Guenter Sauter is an architect in the Information Platform & Solutions segment within IBM's software group. He is driving architectural patterns and usage scenarios across IBM's master data management and information platform technologies. Until recently, he was the head of an architect team developing the architecture approach, patterns and best practices for Information as a Service. He is the technical co-lead for IBM's SOA Scenario on Information as a Service.

Peter Worcester joined IBM three years ago after almost 25 years at institutions such as the US Department of Defense, GE Corporate and Morgan Stanley, where he held technical leadership positions and gained valuable experience in Enterprise Architecture and Enterprise Data Integration. He initially joined IBM as a Senior IT Architect as part of the architect team for Information as a Service. Currently he is a Solutions Marketing Manager for the IPS Global Services organization, specializing in MDM solutions.

John Kling is an architect in the Information Services Practice within IBM's Global Business Services. He is responsible for leading large client engagements that focus on data quality, data integration and master data management. He is currently the data team lead for the SAP implementation of a Fortune 500 industrial company.