
Achieve semantic interoperability in a SOA

Patterns and best practices

Mei Y. Selvage (meis@us.ibm.com), SOA Data Architect, Enterprise Integration Solutions, IBM
Mei Selvage is a SOA data architect with extensive hands-on experience in various information management areas and Service-Oriented Architecture (SOA). Her mission is to bridge the gap between SOA and information management. Her research interests include information management and integration patterns (both structured and unstructured data), data modeling, metadata, faceted search, human collaboration and SOA.
Dan Wolfson (dwolfson@us.ibm.com), CTO, Business Integration, IBM
Dan Wolfson, IBM Distinguished Engineer, is the Chief Technical Officer for Business Integration Software in the IBM Software Group. He is responsible for architecture and technical strategy leadership for IBM's integration software (run time and tools), working with the architecture and development teams across the IBM Software Group, in particular on the IBM WebSphere Business Integration and Information Integration products. Dan has over 20 years of experience in research and commercial distributed computing, ranging over transaction and object-oriented systems, programming languages, messaging, and database systems.
Bob Zurek (bzurek@us.ibm.com), Director, Advanced Technologies and Product Strategy Information Integration Solutions, IBM
Bob Zurek is Director of Advanced Technologies with IBM Information Integration Solutions. In this role he has the responsibility for driving and executing the technical strategy for information integration solutions as they relate to software, hardware, services, vertical industries, and emerging markets. Zurek is a frequent speaker on topics of middleware technologies and was VP of Product Management and Advanced Technologies at Ascential Software prior to the acquisition of Ascential by IBM.
Ed Kahan (ekahan@us.ibm.com), IBM Fellow, CTO, Enterprise Integration Solutions, IBM
Ed Kahan is an IBM Fellow and CTO of the IBM Software Group, Enterprise Integration. Ed is also a member of the IBM Academy of Technology. His current responsibilities include strategy development, design and development of advanced technologies for Web services, Service-Oriented Architectures, and enterprise integration products, tools and solutions for IBM's clients. Ed is recognized inside and outside IBM as an expert in complex systems integration, architectural design and technology implementation.

Summary:  Semantic interoperability is often overlooked or an afterthought in the development of a SOA. Application and data architects may have difficulty making informed architectural decisions about it. This article unveils the mysteries of semantic interoperability in a SOA context. We will first walk through the semantic spectrum, and then discuss the anti-patterns, patterns and best practices of semantic interoperability.

Date:  23 Jun 2006
Level:  Intermediate

Introduction

Semantics is the study of meaning. Semantic interoperability means that the meaning of data can be understood unambiguously by both humans and computer programs, and that information can be processed in a meaningful way. Semantic integration is the means to achieve semantic interoperability and can be considered a subset of information integration, which includes data access, aggregation, correlation, and transformation.

In a Service-Oriented Architecture (SOA), semantic interoperability ensures that service consumers and providers exchange data in a consistent, flexible way that fulfills non-functional requirements (NFRs) such as performance and scalability, regardless of how diverse the underlying information is. For example, a service requestor from a billing application needs the customer balance, which it calls "BALANCE". Meanwhile, a service provider from an accounting application supplies the customer balance, which it calls "REMAINDER". Achieving semantic interoperability here means mapping "BALANCE" in the billing application to "REMAINDER" in the accounting application.

Semantic interoperability is an important architectural quality in a SOA because it enables service consumers and providers to exchange information that makes sense and can be acted upon. It is the foundation of a SOA. Without semantics, data is just strings of bits without any meaning. Without semantic interoperability, service consumers and providers could misinterpret and corrupt data, ultimately harming both the SOA and the business.

In a broader sense, most information integration deals with semantic interoperability. The problem is that people take semantic interoperability for granted and seldom make conscious, informed architectural decisions about it, because semantic interpretation, mapping and transformation are so ingrained in home-grown applications, Enterprise Application Integration (EAI) and Enterprise Information Integration (EII). As a result, it is commonly overlooked in the development of a SOA.

The goal of this article is to make application and data architects aware of semantics and semantic interoperability, and to enable them to make educated decisions when building new SOA-based solutions or migrating existing systems to a SOA. To understand semantic interoperability, we first have to know the various technologies and methodologies behind it, which are collectively called the semantic spectrum. Furthermore, anti-patterns caution us to avoid traps, while patterns and best practices point us in the right direction. We will walk through the semantic spectrum and then discuss the anti-patterns, patterns and best practices of semantic interoperability.

Semantic spectrum

The semantic spectrum describes a series of technologies and methodologies for creating increasingly precise definitions of data. There is a balance between precision and vagueness (more precise is not always better), and many factors need to be considered, such as time, cost and effort.

To define a data element, we need to care about both the things themselves (data instances) and the definitions and descriptions of things (metadata). Thus, the semantic spectrum covers both data and metadata. Per Wikipedia, it includes glossaries, controlled vocabularies, data dictionaries, data models, taxonomies and ontologies. For example, data dictionaries and data models relate to metadata, whereas glossaries, controlled vocabularies and taxonomies concern data instances. Ontologies cover both sides. However, some people consider glossaries and taxonomies to be metadata as well. A precise distinction between data and metadata is outside the scope of this article.

A glossary is a list of terms with definitions. Many documents and books include a glossary at the end to make reading easier. A controlled vocabulary is a list of standardized terms agreed on by groups and communities for specific purposes. Its use can be voluntary or mandatory. A list of regional codes is an example of a controlled vocabulary. Glossaries and controlled vocabularies are as old as written human languages and are often used as part of mass communication and linguistic infrastructure.

After data was digitized in the 20th century, relational databases became the dominant mechanism for data persistence. A data dictionary is used to capture and communicate the meaning and representation of various data elements, most commonly for relational databases. The data dictionary is an important artifact that enables meaningful communication between the business and IT communities.

A data model describes the structures of data elements. Starting in the 1970s, long before the invention of the Unified Modeling Language (UML), the relational database community has used Entity-Relationship (ER) diagrams to improve communication and simplify development.

To deal with increasingly complex and heterogeneous database environments, people realized the need to create an Enterprise Data Model (EDM). Contrary to the popular myth that an EDM requires a mega-database storing all the data an organization deals with, it is only a common logical data model, typically in second or third normal form. Other logical or physical data models, such as those used by an ESB, by applications and by data warehouses, can all be mapped to this common logical data model. An EDM is often used as a reference model for information integration or as a basis for persistent databases and data warehousing. For example, the enterprise model in the IBM Insurance Information Warehouse (IIW) is an implementation of an EDM. An EDM provides an enterprise view of data, which helps to reduce data redundancy, improve data quality, and speed up integration and green-field projects. It can also ease the mapping from business requirements to data models.

Why has EDM not been popular in the past?

Up to this point in time, EDM has not been widely adopted by enterprises. At best, it has been used only in data warehousing. The primary reasons are:

  • The marketplace has offered limited tooling support for logical data models, and some key features have been missing, such as round-trip synchronization of derived physical data models with the original logical data model when changes occur. With the recent availability of IBM Rational® Data Architect, data architects and data modelers finally have a tool to create logical models, compare and synchronize logical and physical models, create glossaries and automate information integration in complex environments, all within one tooling environment.
  • Software vendors have specialized in either ER or UML. Business analysts and data modelers are familiar with ER and are frequently reluctant to move to UML, whereas application designers and developers much prefer UML class diagrams. The lack of end-to-end tooling support, from an EDM to specific model implementations in either ER or UML, has been a major obstacle to the success of EDMs.
  • People have often tried to boil the ocean when building an EDM and have been burdened by details. When an EDM project takes too long, stakeholders eventually lose interest. One of the best practices described below, "Think big; act incrementally towards strategic vision," is a good antidote to this problem.

An enterprise taxonomy organizes a set of standardized terms, concepts, categories and keywords into a hierarchical structure that conveys the parent-child relationships among them. Taxonomies are often associated with content management, knowledge management and search technologies.
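To make the parent-child structure concrete, here is a minimal sketch of a taxonomy stored as a term-to-parent map, with lookups for broader and narrower terms. The product terms are hypothetical examples, not taken from any real enterprise taxonomy.

```python
from typing import Dict, List, Optional

# Hypothetical product taxonomy: each term points to its broader (parent) term.
TAXONOMY: Dict[str, Optional[str]] = {
    "Product": None,
    "Insurance": "Product",
    "Banking": "Product",
    "Auto Policy": "Insurance",
    "Home Policy": "Insurance",
    "Savings Account": "Banking",
}


def ancestors(term: str) -> List[str]:
    """Return the chain of broader terms, nearest parent first."""
    chain: List[str] = []
    parent = TAXONOMY[term]
    while parent is not None:
        chain.append(parent)
        parent = TAXONOMY[parent]
    return chain


def narrower(term: str) -> List[str]:
    """Return the direct children of a term, sorted alphabetically."""
    return sorted(t for t, p in TAXONOMY.items() if p == term)


if __name__ == "__main__":
    print(ancestors("Auto Policy"))  # ['Insurance', 'Product']
    print(narrower("Insurance"))     # ['Auto Policy', 'Home Policy']
```

A search tool could use `ancestors` to broaden a query (a search for "Auto Policy" also matches documents tagged "Insurance") and `narrower` to offer drill-down facets.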

Finally, according to the World Wide Web Consortium (W3C), an ontology defines the terms used to describe and represent an area of knowledge. It specifies descriptions of classes (general things) in the domain of interest, the relationships among things, and the properties (or attributes) of things. The Standard Upper Ontology Working Group in the IEEE is working to specify an upper ontology to support computer applications such as data interoperability, information search and retrieval, automated inference, and natural language processing.
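The three ingredients named in the W3C description (classes, relationships and properties) can be illustrated with a small plain-Python sketch. This is not OWL or any W3C syntax; it only assumes a single-inheritance class hierarchy, and the class and property names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass(eq=False)  # compare classes by identity, not by field values
class OntologyClass:
    """A class (general thing) in the domain, with its own properties."""
    name: str
    superclass: Optional["OntologyClass"] = None
    properties: Set[str] = field(default_factory=set)

    def all_properties(self) -> Set[str]:
        """Own properties plus those inherited from every superclass."""
        inherited = self.superclass.all_properties() if self.superclass else set()
        return inherited | self.properties

    def is_a(self, other: "OntologyClass") -> bool:
        """True if self is `other` or a (transitive) subclass of it."""
        cls: Optional[OntologyClass] = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.superclass
        return False


# Hypothetical classes for a small customer domain.
party = OntologyClass("Party", properties={"name"})
customer = OntologyClass("Customer", party, {"customerSince"})
account = OntologyClass("Account", properties={"balance"})

# Relationships among classes, as (subject, relation, object) triples.
RELATIONSHIPS: Set[Tuple[str, str, str]] = {("Customer", "holds", "Account")}
```

Even this toy model supports a simple inference: because a `Customer` is a `Party`, it inherits the `name` property without declaring it. A production ontology would typically be expressed in a standard language such as OWL and processed by a reasoner.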

Anti-patterns

Anti-pattern one: Uncontrolled semantic chaos

Semantic interoperability is very difficult to achieve for various reasons. The following are examples of why this happens:

  • We cannot simply rip and replace legacy assets. Whenever an organization creates a new application or integrates existing applications, integrating legacy assets is inevitable. Many enterprises still run COBOL or CICS as their mission-critical applications, and it is not an easy task to read through COBOL copybooks to figure out the semantics.
  • Business is constantly changing due to mergers, acquisitions, regulation, market competition and customer demands. Information and application integration is never 100 percent complete.
  • It is inherently difficult for people to reach agreement because of differences in profession, experience and knowledge. The bigger the circle of participants, the harder it is to reach agreement. An EDM touches various business units and development teams, and people often find it easier to build an application and a database in a silo than to reach an agreement among broader teams.
  • Sometimes people seemingly agree on something, but interpret it in different ways. Many industry standard data models, for example, are vague and open to different semantic interpretations. People may diverge from the intended meaning of a standard and use it for different purposes, thus defeating the purpose of industry standards.
  • Pervasive XML documents and XSDs created by individual developers without coordination only add to the semantic chaos.

We may sound pessimistic at this point, but what we are really advocating is controlling semantic chaos. Semantic chaos means that everybody defines their own schemas and vocabularies, follows no information standards, and does not consider semantic interoperability with the rest of the systems. People use their own terms and have difficulty understanding each other. Systems are built in silos.

Anti-pattern two: Overly ambitious

The opposite extreme of uncontrolled semantic chaos is an overly ambitious effort. The project involves many stakeholders, among whom it is difficult to reach agreement. The scope is overstretched, and the timeline is too long. The past failures of EDM were largely caused by ambitious plans to force all applications and databases to conform to one data model.

Anti-pattern three: Lack of information integration logic reuse

There are three major information integration patterns to achieve semantic interoperability: data federation, data consolidation and Enterprise Application Integration (EAI).

Three major information integration patterns

  • The data federation pattern creates an integrated view of distributed information without creating data redundancy, and it can federate both structured and unstructured information.
  • The data consolidation pattern frequently uses ETL tools to extract, transform, and load data from one or more data sources into one central location.
  • The Enterprise Application Integration (EAI) pattern uses application APIs to obtain data from various systems, transform (correlate, enhance, standardize and aggregate) the data, and finally present one consolidated data stream to data consumers.

These patterns often coexist to fulfill different sets of NFRs, for example data federation for the freshest data and data consolidation for the fastest response time.
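The freshness trade-off between federation and consolidation can be sketched in a few lines. In this sketch, two in-memory dictionaries stand in for hypothetical source systems and another dictionary stands in for the central warehouse; no real federation server or ETL tool is modeled.

```python
from typing import Dict

Row = Dict[str, object]

# Two hypothetical source systems, keyed by customer ID.
BILLING: Dict[str, Row] = {"C1": {"BALANCE": 120.0}}
CRM: Dict[str, Row] = {"C1": {"NAME": "Ada"}}


def federated_view(customer_id: str) -> Row:
    """Federation: read every source at query time; no copy is kept."""
    merged: Row = {}
    for source in (BILLING, CRM):
        merged.update(source.get(customer_id, {}))
    return merged


def consolidate() -> Dict[str, Row]:
    """Consolidation: extract rows from every source and load merged copies
    into a central store (an in-memory dict standing in for a warehouse)."""
    warehouse: Dict[str, Row] = {}
    for source in (BILLING, CRM):
        for customer_id, row in source.items():
            warehouse.setdefault(customer_id, {}).update(row)
    return warehouse
```

If `BILLING["C1"]["BALANCE"]` changes after `consolidate()` has run, `federated_view("C1")` reflects the new value immediately, while the warehouse copy keeps the old value until the next load. In exchange, queries against the warehouse never touch the source systems, which is where consolidation's response-time advantage comes from.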

In reality, each group within an enterprise tends to be good at one pattern. For instance, data warehousing and business intelligence (BI) groups are generally proficient with the data consolidation pattern, whereas business application groups are good at EAI. Without cross-team coordination and an enterprise-level, long-term strategy, it is very easy to create a new set of integration silos: business application groups reinvent the "wheel" of achieving semantic interoperability across heterogeneous data sources when the BI group has already solved this problem for the data warehouse. As a result of this siloed approach, each group may end up with slightly different semantic integration and data processing logic.

Anti-pattern four: Business terms and definitions chaos

Frequently, business users ask information technology professionals for data to support their business analysis, using business terms and definitions that may differ considerably from one department to another. For example, one department might define "customer" quite differently than another. IT professionals must in turn translate these business terms into data model elements, such as entities in an ER model or classes in a UML model. In an accounting system, "customer" might be defined as a party to whom the organization has sold products, while in a marketing system it might be defined as a party to whom the organization wants to sell products. Determining which definition of customer to use is at the heart of this chaos.
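One way to see why the two "customer" definitions cause trouble is to write each department's definition as a rule and apply both to the same parties. The `Party` fields and the data below are hypothetical; the two rules paraphrase the accounting and marketing definitions above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Party:
    party_id: str
    purchases: int         # number of completed sales to this party
    is_sales_target: bool  # flagged by marketing as a prospect


# Each department's definition of "customer", written as a predicate.
CUSTOMER_DEFINITIONS: Dict[str, Callable[[Party], bool]] = {
    "accounting": lambda p: p.purchases > 0,   # we have sold to them
    "marketing": lambda p: p.is_sales_target,  # we want to sell to them
}


def customers(parties: List[Party], department: str) -> List[str]:
    """Return the IDs of parties that count as customers for a department."""
    rule = CUSTOMER_DEFINITIONS[department]
    return [p.party_id for p in parties if rule(p)]


parties = [Party("P1", 3, False), Party("P2", 0, True), Party("P3", 1, True)]
```

Here `customers(parties, "accounting")` yields `["P1", "P3"]` while `customers(parties, "marketing")` yields `["P2", "P3"]`, so a report of "number of customers" would give the same answer for different reasons. Making each definition explicit and named, rather than implicit in each application's queries, is a first step toward resolving the conflict.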


Patterns of semantic interoperability

There are many patterns for achieving semantic interoperability in a SOA. They can be roughly classified as follows:

Pattern one: Point-to-point semantic integration

In this pattern, each data source has its own proprietary semantics, and semantic translation is performed point to point. For example, when two data sources, A and B, need to be integrated, group A and group B print out their own ER diagrams, walk through the meaning of the data elements, and then map data source A directly to B. Using the previous example, the "BALANCE" column in the billing application is mapped directly to the "REMAINDER" column in the accounting application. When the number of integrated data sources grows to four, six sets of mappings (one per pair of sources) need to be performed. If a data definition in one data source changes, the impact on other systems is multiplied and often unpredictable. No matter how advanced the chosen technology is, this semantic integration pattern becomes messy and a maintenance nightmare as the number of data sources grows; hence its popular nickname, "hairball." Moreover, it does not easily lead to IT asset reuse. Believe it or not, many ESB and EII projects still perform point-to-point semantic integration in a SOA. However, point-to-point integration is not necessarily a bad thing: it can be used selectively to ensure high performance by creating a "fast path."
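The growth that earns point-to-point integration its nickname is easy to quantify. This sketch counts one mapping per pair of sources (as in the four-source, six-mapping example) and, for comparison, one mapping per source against a shared model as in the hub-and-spoke pattern; treating each mapping as a single unit of work is a simplifying assumption.

```python
from itertools import combinations


def point_to_point_mappings(n: int) -> int:
    """One direct mapping per pair of sources: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def hub_and_spoke_mappings(n: int) -> int:
    """One mapping per source, to and from a shared (canonical) model."""
    return n


# Hypothetical source names; every pair needs its own mapping.
sources = ["billing", "accounting", "crm", "orders"]
pairs = list(combinations(sources, 2))

if __name__ == "__main__":
    for n in (2, 4, 10, 20):
        print(n, point_to_point_mappings(n), hub_and_spoke_mappings(n))
```

For four sources the counts are 6 versus 4, which looks harmless, but for twenty sources they are 190 versus 20. A change to one source's definitions touches up to n - 1 mappings in the point-to-point case, but only one in the hub-and-spoke case.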

Pattern two: Hub-and-spoke semantic integration

Each system has its own proprietary semantic meaning, but is mapped to a logical data model which can be instantiated as a physical federated model or a canonical message model. Semantic interoperability is achieved within an enterprise via a hub-and-spoke topology, which reduces the redundancy and maintenance cost of point-to-point integration. Well-architected ESBs frequently use this pattern to map messages to a canonical message model and achieve semantic interoperability.
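A hub-and-spoke translation can be sketched as two mapping steps through a canonical model: each system only knows how to map its own field names to and from the canonical names. "BALANCE" and "REMAINDER" follow the article's example; `CUST_ID`, `CUSTNO` and the canonical names `customerId` and `balance` are hypothetical.

```python
from typing import Dict

Message = Dict[str, object]

# Spoke mappings: each system's local field names -> canonical field names.
TO_CANONICAL: Dict[str, Dict[str, str]] = {
    "billing": {"CUST_ID": "customerId", "BALANCE": "balance"},
    "accounting": {"CUSTNO": "customerId", "REMAINDER": "balance"},
}


def to_canonical(system: str, message: Message) -> Message:
    """Rename a system's local fields to the canonical model."""
    mapping = TO_CANONICAL[system]
    return {mapping[name]: value for name, value in message.items()}


def from_canonical(system: str, message: Message) -> Message:
    """Rename canonical fields back to a system's local names."""
    reverse = {canon: local for local, canon in TO_CANONICAL[system].items()}
    return {reverse[name]: value for name, value in message.items()}


def translate(source: str, target: str, message: Message) -> Message:
    """Route through the hub: source -> canonical -> target."""
    return from_canonical(target, to_canonical(source, message))
```

For example, `translate("accounting", "billing", {"CUSTNO": "42", "REMAINDER": 99.5})` returns `{"CUST_ID": "42", "BALANCE": 99.5}`. Adding a new system means adding one entry to `TO_CANONICAL` rather than one mapping per existing system. A real canonical model would also handle type, unit and code-value conversions, not just field renaming.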

Pattern three: Master data management (MDM) pattern

MDM has emerged as a pattern of semantic interoperability in response to the data silos produced by departmental solutions. Today, many versions of the truth exist in a typical enterprise information management system. An MDM system connects heterogeneous information sources and produces a single version of the truth for key information, such as customers or products, for Online Transaction Processing (OLTP) and Operational Data Store (ODS) systems. The key information can be either a data instance, such as a particular customer, or metadata, such as product specifications. An MDM system liberates data from individual business applications and packaged-application vendors, and is based on open standards. As a result, data is truly treated and reused as a corporate asset. An MDM system is often built separately from existing systems to reduce the impact on the business, but legacy systems might eventually migrate to it over time. MDM stands out as a distinct pattern from the previous two because it holds the single version of the truth and effectively integrates various information systems in both logical and physical terms. With MDM systems, companies gain many proven benefits, such as improved customer relationships, reduced time to market for new products, integration of data with legacy systems, and greater asset reuse.
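At the heart of producing a single version of the truth is a match-and-merge step. The sketch below assumes a deliberately simple matching rule (records with the same normalized e-mail address describe the same customer) and a simple survivorship rule (the most recently updated non-empty value wins). Both rules and all data are hypothetical; real MDM products use far richer matching and survivorship logic.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class CustomerRecord:
    source: str           # system the record came from
    email: str
    name: Optional[str]
    phone: Optional[str]
    updated: date


def match_key(record: CustomerRecord) -> str:
    """Matching rule: same normalized e-mail means same customer."""
    return record.email.strip().lower()


def golden_records(records: List[CustomerRecord]) -> Dict[str, dict]:
    """Group matching records and merge each group into one golden record."""
    groups: Dict[str, List[CustomerRecord]] = {}
    for r in records:
        groups.setdefault(match_key(r), []).append(r)

    result: Dict[str, dict] = {}
    for key, group in groups.items():
        # Survivorship rule: the newest non-empty value wins per attribute.
        newest_first = sorted(group, key=lambda r: r.updated, reverse=True)
        result[key] = {
            "email": key,
            "name": next((r.name for r in newest_first if r.name), None),
            "phone": next((r.phone for r in newest_first if r.phone), None),
            "sources": sorted({r.source for r in group}),
        }
    return result


records = [
    CustomerRecord("billing", "Ada@Example.com", "Ada L.", None, date(2006, 1, 5)),
    CustomerRecord("crm", "ada@example.com ", "Ada Lovelace", "555-0100", date(2006, 3, 1)),
    CustomerRecord("crm", "bob@example.com", "Bob", None, date(2006, 2, 1)),
]
```

Running `golden_records(records)` yields two golden records: one for Ada, combining the billing and CRM entries and taking the newer name and the only available phone number, and one for Bob. Keeping the list of contributing `sources` preserves lineage back to the original systems.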

Pattern four: Industry information model

To encourage interoperability within an industry, vertical industry standardization groups develop industry-specific information models, also known as Domain Information Models (DIMs). These typically consist of XML messages and message schemas, although some groups produce relational data models as well. XML-based DIMs are used to exchange messages in a business-to-business (B2B) environment. The members of industry standard groups agree to follow the specifications and are often required to certify compliance. For instance, the Association for Retail Technology Standards (ARTS) serves the retail industry, and the Association for Cooperative Operations Research and Development (ACORD) serves the insurance industry. DIMs promote a greater level of semantic interoperability, encourage asset reuse and level the playing field, so members can spend less time, cost and energy solving semantic interoperability issues. Some organizations even adopt industry standard models as their internal enterprise logical models and canonical message models.
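When an internal model is not the industry model itself, the integration work is a pair of mappings between internal records and DIM messages. The sketch below uses Python's standard `xml.etree.ElementTree` module. The element names (`PolicyInquiry`, `PolicyHolder`, `PolicyNumber`) and record keys are hypothetical placeholders, not taken from ACORD, ARTS or any other real DIM.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# NOTE: element and attribute names are hypothetical placeholders,
# not drawn from any real industry standard schema.


def to_industry_message(record: Dict[str, str]) -> str:
    """Map an internal record to an (assumed) industry XML message."""
    root = ET.Element("PolicyInquiry", {"version": "1.0"})
    holder = ET.SubElement(root, "PolicyHolder")
    ET.SubElement(holder, "Name").text = record["holder_name"]
    ET.SubElement(root, "PolicyNumber").text = record["policy_no"]
    return ET.tostring(root, encoding="unicode")


def from_industry_message(xml_text: str) -> Dict[str, Optional[str]]:
    """Map the (assumed) industry XML message back to the internal record."""
    root = ET.fromstring(xml_text)
    return {
        "holder_name": root.findtext("PolicyHolder/Name"),
        "policy_no": root.findtext("PolicyNumber"),
    }
```

A round trip such as `from_industry_message(to_industry_message(rec))` should return the original record. In practice, the outgoing message would also be validated against the standard's published XML Schema before it is sent to a trading partner.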

Other standards organizations look at information from horizontal, cross-industry perspectives. For example, the Open Applications Group (OAGi) is an open standards group that builds process-based XML standards for both B2B and Application-to-Application (A2A) integration, with a focus on improving the state of application integration. Another example is RosettaNet, which helps companies from multiple industries meet the demands and challenges of today's global supply chain. Its standards include the RosettaNet Business Dictionaries and the RosettaNet Technical Dictionaries.

Pattern five: The Semantic Web

The Semantic Web cuts across the boundaries of applications, enterprises and industries. It links and relates elements of data models to a common ontology, and it uses the Resource Description Framework (RDF) and the Web Ontology Language (OWL) to allow data to be shared and reused on the Web.
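The idea of linking local data model elements to a common ontology can be illustrated with subject-predicate-object triples. This plain-Python sketch is not an RDF store; it only borrows the real OWL property URI `owl:equivalentProperty`. The `example.org` namespaces are hypothetical, and "BALANCE" and "REMAINDER" follow the article's earlier example.

```python
from typing import Optional, Set, Tuple

OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"
ONT = "http://example.org/ontology#"  # hypothetical shared ontology namespace

# Each application's data element is linked to a term in the shared ontology.
TRIPLES: Set[Tuple[str, str, str]] = {
    ("http://example.org/billing#BALANCE", OWL_EQUIVALENT_PROPERTY, ONT + "balance"),
    ("http://example.org/accounting#REMAINDER", OWL_EQUIVALENT_PROPERTY, ONT + "balance"),
}


def shared_concept(local_uri: str) -> Optional[str]:
    """Return the ontology term a local data element is linked to, if any."""
    for subject, predicate, obj in TRIPLES:
        if subject == local_uri and predicate == OWL_EQUIVALENT_PROPERTY:
            return obj
    return None


def same_meaning(a: str, b: str) -> bool:
    """Two local elements mean the same thing if they share an ontology term."""
    concept_a, concept_b = shared_concept(a), shared_concept(b)
    return concept_a is not None and concept_a == concept_b
```

With these triples, `same_meaning("http://example.org/billing#BALANCE", "http://example.org/accounting#REMAINDER")` returns `True` even though neither application knows the other's field name. A real deployment would typically load such statements into an RDF library and let an OWL reasoner derive the equivalence.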

Summary of patterns

We have reviewed different patterns for achieving semantic interoperability. The scope can range from business units to entire enterprises, and from multiple enterprises within the same industry to multiple industries. The bigger the circle of semantic interoperability, the more reusable the result will be in the long run, but the harder it is to coordinate and reach consensus. Moreover, an individual company will have less control over tailoring the reusable assets to its unique business needs and determining the desirable features. For these very reasons, some companies use internal proprietary data models for internal integration efforts and industry standard models for B2B.

Enterprises should take a balanced view and consider the trade-offs of adopting different patterns of semantic interoperability. Some of the important questions to ask are: What will our business strategy be in five years? How can IT support our business vision? What is the business and regulatory environment of the IT group? How can IT cope with, or even take advantage of, change? Can IT leverage multiple initiatives to promote greater asset reuse, minimize maintenance costs and share development costs? Which of the information integration patterns (data federation, data consolidation or EAI) is the best approach for a particular situation?


Best practices

Best practice one: Establish a Center of Competency on semantic interoperability

As stated previously in the "Uncontrolled semantic chaos" anti-pattern, people will inevitably have different takes on semantics. It is often more effective to foster collaboration among the various stakeholders and document disagreements than to force a consensus on an EDM or enterprise taxonomy. Collaboration tools such as wikis, blogs and groupware are excellent ways to let people openly express opinions, resolve differences, and document them when a consensus cannot be reached. As a real-world example, the Semantic Interoperability Community of Practice (SICoP) was established by the Federal CIO Council to achieve semantic interoperability in the United States government. A large agency has tens of thousands of databases, millions of data elements, and millions of documents, so reaching a consensus is often neither practical nor possible. SICoP helps bring people together to collaborate more effectively.

Best practice two: Think big; act incrementally towards strategic vision

The key to successful semantic integration is to think big, yet act incrementally towards a strategic vision. "Think big" means creating a strategic vision and leveraging related initiatives as much as possible, such as SOA, data warehousing, MDM and regulatory compliance. These initiatives frequently require the collaboration of multiple business units and an enterprise-level cultural change. A shared strategic vision that clearly articulates the benefits to the various stakeholders can win the buy-in that is absolutely indispensable. "Act incrementally towards strategic vision" means creating incremental plans to implement the vision, creating data governance processes, delivering tangible results iteratively, measuring progress and revising the execution plan on an ongoing basis. In short, both the strategic vision and good execution are critical.

Whether an enterprise performs top-down or bottom-up service analysis when migrating to a SOA, the migration is an appropriate time to consider introducing an EDM and MDM if the enterprise has not already started them. Both patterns improve semantic interoperability and data quality, help ensure the service-level agreements (SLAs) of data services are met, and reduce total cost of ownership (TCO). They can have a profound impact on fulfilling the promise of a SOA: using IT to improve business agility, increase revenue and reduce TCO in the long run.

Best practice three: Reuse semantic integration assets

One of the major functions of an ESB is to transform messages from one data format to another so that service consumers and providers can communicate with each other. Transformation logic captured in an XSLT document offers many reuse benefits: it can be reused not only by other business processes, but also by ETL jobs and applications. Likewise, semantic transformation assets used by ETL tools can be exposed as Web services and become reusable and invokable by other applications.
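The reuse idea can be sketched by registering a transformation once and calling it from both an ESB-style message path and an ETL-style batch path. Python's standard library has no XSLT processor, so a plain Python function stands in for the XSLT document here. The registry, the transform name, and the `CUSTNO`/`CUST_ID` field names are hypothetical; "BALANCE" and "REMAINDER" follow the article's example.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

# Hypothetical shared registry of semantic transformations, so the message
# path and the batch path use one implementation instead of two.
TRANSFORMS: Dict[str, Callable[[Record], Record]] = {}


def transform(name: str) -> Callable[[Callable[[Record], Record]], Callable[[Record], Record]]:
    """Decorator that registers a transformation under a shared name."""
    def register(fn: Callable[[Record], Record]) -> Callable[[Record], Record]:
        TRANSFORMS[name] = fn
        return fn
    return register


@transform("accounting_to_billing")
def accounting_to_billing(rec: Record) -> Record:
    """Map accounting field names to billing field names."""
    return {"CUST_ID": rec["CUSTNO"], "BALANCE": rec["REMAINDER"]}


def handle_message(msg: Record) -> Record:
    """ESB-style path: transform a single message in flight."""
    return TRANSFORMS["accounting_to_billing"](msg)


def run_etl(rows: Iterable[Record]) -> List[Record]:
    """ETL-style path: apply the same transformation to a batch of rows."""
    fn = TRANSFORMS["accounting_to_billing"]
    return [fn(row) for row in rows]
```

Because both paths look up the same registered function, a correction to the mapping is made once and takes effect for real-time messages and batch loads alike, which avoids the slightly divergent logic described in the "Lack of information integration logic reuse" anti-pattern.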

Best practice four: Embrace and participate in industry standards

Many industry standards deal with semantic interoperability, including data and data model standards for vertical and horizontal industries. Embracing and participating in industry standards allows individual companies to take advantage of industry-wide best practices and reduce the long-term cost of semantic interoperability.

An important criterion when selecting software vendors is whether they support the DIMs for the respective industries, how seamlessly they support integration between internal data models and DIMs, and how well they manage subsequent changes. To promote broader semantic interoperability, IBM has been a strong supporter of various industry standards. For instance, in September 2005 IBM donated more than 100 business-process models, model definitions and other industry content to ACORD.


Conclusion

The IT world is constantly changing, and as a result, semantic interoperability continues to be a moving target. Our first assumption is that change is inevitable. There is no nirvana for businesses; they always need to adapt to customer demands, trends, the economy, regulation and competition. Our second assumption is that we are in an information age, and the challenges of semantic interoperability will only increase. Unless we identify the anti-patterns, patterns and best practices of semantic interoperability, our decisions will not solve problems effectively. Our third assumption is that the world is chaotic. The best we can do is to create or use tools and methodologies that control the chaos within certain realms. Semantic interoperability really is just a form of order that we put around our world so that we can control the chaos and make sense of pervasive and ever-growing information. Controlling the chaos also means that we need to clearly understand our options and the costs and benefits of each, and minimize the undesirable consequences of change.

Hopefully, anti-patterns, patterns and best practices as discussed in this article will help you choose the appropriate way of achieving semantic interoperability for your SOA endeavors. To emphasize an important point, it is an illusion to believe one single pattern, group or standard can solve all the problems in semantic interoperability.

