
Information management in Service-Oriented Architecture, Part 1: Discover the role of information management in SOA

Mei Selvage (meis@us.ibm.com), Data Architect, IBM Corporation
Mei Selvage is a SOA data architect with extensive hands-on experience in various information management areas and Service-Oriented Architecture (SOA). Her mission is to bridge the gap between SOA and information management. Her research interests include: information management and integration patterns (both structured and unstructured data), data modeling, metadata, faceted search, human collaboration and SOA.
Dan Wolfson (dwolfson@us.ibm.com), Distinguished Engineer and Chief Technology Officer in business integration, IBM
Dan Wolfson is a Distinguished Engineer and Chief Technology Officer in business integration in the IBM software group. With more than 18 years of experience in distributed computing, Dan's interests have ranged broadly across information integration, middleware integration, metadata, databases, messaging, and transaction systems. You can contact him at dwolfson@us.ibm.com.
John Handy-Bosma (bosmaj@us.ibm.com), Senior IT Architect, IBM
John "Boz" Handy-Bosma received his Ph.D. in Communication from the University of Texas at Austin. He is currently a Senior IT Architect in IBM Global Services, Application Management Services. Boz is the portfolio lead for a variety of projects in Application Management Services, focusing on best practices in IT Architecture, mentoring of technical professionals, and emerging technologies in search and collaboration. You can contact Boz at bosmaj@us.ibm.com.

Summary:  Learn about information management, its importance to Service-Oriented Architecture (SOA), and the relationship between information management and SOA. Then explore the challenges and benefits of reengineering information management into SOA. In this first part of a two-part paper, the authors break down information management into various services and provide a high-level overview of these services. The intended audience for this paper is architects, data modelers, database administrators, and developers who want to leverage the power of information management for SOA-based modeling, architecture, design, and implementation.

Date:  22 Mar 2005
Level:  Introductory

Introduction

Information management, which includes both data and content management, is an essential building block for Service-Oriented Architecture (SOA). Information management provides a means to represent, access, maintain, manage, analyze, and integrate data and content across heterogeneous information sources (see The On Demand Operating Environment Architectural Overview in the Resources section). There are many diverse functions in information management, including the following:

  • Extract, Transform, Load (ETL)
  • Federation
  • Data placement (such as replication and caching)
  • Data modeling
  • Search
  • Analytics

These functions can be grouped as composable components and provided as reusable, callable Web services. It is important to examine how these functions help in building a SOA and how they interact within the SOA context. Without a clear understanding of their intent and value propositions, you could easily lose sight of the big picture and make the wrong choices in information management and architecture. A proactive plan to incorporate information management into the larger, holistic SOA picture helps you fill common gaps such as data silos (isolated data sources), data inconsistencies, and untapped information assets.


SOA is more than Web services

Information management services deal with the information component of SOA. However, when people think about the SOA concept, they most likely think first of Web services. Few people look beyond the Web services programming model to consider the underlying aspects of information management. Yet information management supports SOA in that it deals with one of the most important corporate assets -- information in both structured and unstructured formats. Information architecture, a key part of information management, makes SOA more intelligent and manageable (see The Role of EII in SOA in the Resources section). Ultimately, without a solid and robust information management environment, SOA is limited and presents fewer opportunities for end-to-end business integration and transformation.

Information management in SOA places a strong emphasis on Enterprise Information Integration (EII) -- technology that integrates structured and unstructured information sources so that they can be dealt with as if they were a single source. Structured information typically includes relational, XML, or tabular data, such as spreadsheets. Traditionally, management of structured information falls under the label of data management. Unstructured information, in contrast, involves free text reports, documents, Web pages, life science data, audio, video, and so on. The management of unstructured information is often categorized as content management.

EII can provide a unified view of data and content, simplifying representation of and access to underlying information services. Information management in SOA also extends beyond EII to include ETL, data placement, data retrieval, and similar functions. Part of the process of creating sound information architecture and design is to understand the trade-offs among different approaches, so that you can apply the right technologies to customer problems.


How information management enables SOA

Information management in SOA, particularly EII, emphasizes separation of concerns between the services layer and the physical implementation of the data. This separation is often provided by middleware such as IBM® WebSphere® Information Integrator (formerly known as DB2® Information Integrator). Such middleware can greatly reduce the total cost of ownership and the complexity of information integration. Appropriate use of EII can provide integrated views of underlying, potentially heterogeneous data that are easy for services to work with, insulating the services layer from changes in the physical data. This insulation layer is extremely important to SOA because it hides differences in database vendor products, operating system platforms, information location, data format, and physical data model.
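To make this separation of concerns concrete, here is a minimal Python sketch. It assumes a hypothetical `CustomerInfoService` contract and two invented table layouts, with in-memory `sqlite3` databases standing in for the physical stores and for integration middleware. The consumer function is written once against the contract, and only the adapter changes when the physical schema does.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class CustomerInfoService(ABC):
    """Logical contract seen by service consumers (hypothetical interface)."""

    @abstractmethod
    def get_customer_name(self, customer_id: int) -> Optional[str]:
        ...


class LegacySchemaService(CustomerInfoService):
    """Maps the contract onto a physical table customers(id, name)."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def get_customer_name(self, customer_id: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


class SplitNameSchemaService(CustomerInfoService):
    """Maps the same contract onto a redesigned table with split name columns."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def get_customer_name(self, customer_id: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT first_name || ' ' || last_name FROM cust WHERE cust_id = ?",
            (customer_id,),
        ).fetchone()
        return row[0] if row else None


def greeting(service: CustomerInfoService, customer_id: int) -> str:
    """A service consumer: it never sees table or column names."""
    name = service.get_customer_name(customer_id)
    return f"Hello, {name}" if name else "Unknown customer"


# Two physical layouts holding the same logical customer.
old_db = sqlite3.connect(":memory:")
old_db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
old_db.execute("INSERT INTO customers VALUES (1, 'Ada Lovelace')")

new_db = sqlite3.connect(":memory:")
new_db.execute(
    "CREATE TABLE cust (cust_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
new_db.execute("INSERT INTO cust VALUES (1, 'Ada', 'Lovelace')")

print(greeting(LegacySchemaService(old_db), 1))
print(greeting(SplitNameSchemaService(new_db), 1))
```

Schema knowledge lives only in the adapters, so a schema redesign touches one class rather than every consumer.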

To achieve a loosely coupled view of information sources from applications, information management in SOA accesses and aggregates heterogeneous data and content sources (a capability known as federation) so that they appear to the user as a single database or content source. Because information management in SOA acts as a middleware layer between the applications and the data sources, the programming logic for data connectivity, data transformation rules, and data mapping is centralized, and many applications (service consumers) can reuse it. Moreover, information management in SOA offers great extensibility, allowing applications and users to access information not only within the enterprise but also across enterprise and industry boundaries. This end-to-end horizontal business and information integration provides business agility and flexibility, paving the road toward becoming an on demand business. Lastly, information management in SOA is based on information standards, such as Unicode, XML, and the Meta Object Facility (MOF), that cover data, content, metadata, metamodels, and meta-metamodels (which we'll discuss at length later in this paper).
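The federation idea can be sketched in a few lines of Python. This is an illustration only. It assumes an in-memory `sqlite3` table as the structured source and a plain dictionary as the content store, and the names `federated_customer_view` and `COLUMN_MAP` are hypothetical. Real federation middleware typically pushes query work down to the sources. This sketch, by contrast, assembles the results in application code.

```python
import sqlite3
from typing import Dict, List

# Source 1: structured data in a relational database (stand-in for any RDBMS).
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (order_no INTEGER, cust TEXT, amount REAL)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(100, "C1", 250.0), (101, "C2", 75.5), (102, "C1", 30.0)],
)

# Source 2: unstructured content, e.g. support notes held in a content store.
support_notes = {
    "C1": ["Asked about delivery dates", "Requested invoice copy"],
    "C2": ["Reported a damaged item"],
}

# Centralized mapping from source-specific names to one logical schema.
# Every consumer reuses this instead of embedding its own point-to-point logic.
COLUMN_MAP = {"order_no": "order_id", "cust": "customer_id", "amount": "total"}


def federated_customer_view(customer_id: str) -> Dict[str, object]:
    """Return one integrated record, as if both sources were a single source."""
    cursor = orders_db.execute(
        "SELECT order_no, cust, amount FROM orders WHERE cust = ? ORDER BY order_no",
        (customer_id,),
    )
    columns = [COLUMN_MAP[d[0]] for d in cursor.description]
    orders: List[Dict[str, object]] = [dict(zip(columns, row)) for row in cursor]
    return {
        "customer_id": customer_id,
        "orders": orders,
        "notes": support_notes.get(customer_id, []),
    }


view = federated_customer_view("C1")
print(view["orders"])
print(view["notes"])
```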

As a Web services provider, IBM DB2 UDB creates an environment that is friendly to SOA development and deployment. For example, the Web Services Object Runtime Framework (WORF), which ships with DB2 UDB for Linux, UNIX, Windows®, and z/OS, provides an environment in which to easily create simple Web services that access DB2 databases. In simple terms, access to DB2 data can be defined using an XML file that contains a series of operations. The operations can be either SQL operations (select, insert, update, and delete operations, or calls to stored procedures) or XML collection operations (which generate or store XML documents).
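The following sketch shows the general idea of declaring SQL operations in an XML file and exposing them as named, callable operations. It is a simplified, hypothetical stand-in. The `<operations>` element names, the `OperationRuntime` class, and the use of SQLite instead of DB2 are all assumptions made for illustration, and the XML is not the file format that WORF actually uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical operation definitions (invented element names, NOT the WORF format).
OPERATIONS_XML = """
<operations>
  <operation name="listParts" kind="select">
    SELECT part_no, descr FROM parts WHERE qty &gt;= :min_qty ORDER BY part_no
  </operation>
  <operation name="addPart" kind="update">
    INSERT INTO parts (part_no, descr, qty) VALUES (:part_no, :descr, :qty)
  </operation>
</operations>
"""


class OperationRuntime:
    """Turns declared SQL operations into named, callable service operations."""

    def __init__(self, conn: sqlite3.Connection, xml_text: str):
        self.conn = conn
        self.ops = {}
        for op in ET.fromstring(xml_text).findall("operation"):
            self.ops[op.get("name")] = (op.get("kind"), op.text.strip())

    def invoke(self, name: str, **params):
        kind, sql = self.ops[name]
        cursor = self.conn.execute(sql, params)  # binds :named placeholders
        if kind == "select":
            cols = [d[0] for d in cursor.description]
            return [dict(zip(cols, row)) for row in cursor.fetchall()]
        self.conn.commit()
        return cursor.rowcount  # number of rows changed by the update operation


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE parts (part_no TEXT, descr TEXT, qty INTEGER)")
runtime = OperationRuntime(db, OPERATIONS_XML)
runtime.invoke("addPart", part_no="P-1", descr="bolt", qty=500)
runtime.invoke("addPart", part_no="P-2", descr="gear", qty=3)
print(runtime.invoke("listParts", min_qty=10))
```

Callers see only operation names and parameters. The SQL, and the database behind it, can change without touching them.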

As a Web service consumer, DB2 provides Web service consumer user-defined functions (UDFs) that enable database applications to invoke Web services directly from SQL statements. You can use WebSphere Studio Application Developer to easily convert existing WSDL interfaces into DB2 table or scalar UDFs (see XML for DB2 Information Integration in the Resources section).
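DB2's consumer UDFs are generated from WSDL and are not reproduced here. As a rough analogy only, Python's `sqlite3` module can register a Python function as a scalar SQL function through `Connection.create_function`. A SQL statement can then call code that, in a real deployment, would invoke a remote service. In this sketch the service `convert_currency_service` and its exchange rates are invented and stubbed locally, and no network call is made.

```python
import sqlite3

# Made-up rates for the stub; a real consumer would call a remote endpoint.
EXCHANGE_RATES = {("USD", "EUR"): 0.5, ("USD", "JPY"): 100.0}


def convert_currency_service(amount, from_ccy, to_ccy):
    """Hypothetical service operation, stubbed locally instead of sent as SOAP."""
    if from_ccy == to_ccy:
        return amount
    rate = EXCHANGE_RATES.get((from_ccy, to_ccy))
    return None if rate is None else round(amount * rate, 2)


db = sqlite3.connect(":memory:")
# Register the service wrapper as a scalar SQL function taking 3 arguments.
db.create_function("CONVERT_CCY", 3, convert_currency_service)
db.execute("CREATE TABLE sales (id INTEGER, amount REAL, ccy TEXT)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)", [(1, 10.0, "USD"), (2, 4.0, "USD")]
)

# The "service" is now callable from plain SQL, once per row.
rows = db.execute(
    "SELECT id, CONVERT_CCY(amount, ccy, 'EUR') FROM sales ORDER BY id"
).fetchall()
print(rows)  # [(1, 5.0), (2, 2.0)]
```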


Reengineer information management into SOA

We have described how information management enables SOA. Now we take a look at how information management can also benefit from SOA principles.

Challenges

Despite the rise of information standards such as XML, Unicode, and UML, many data sources use proprietary data formats, metadata, and metamodels, whether for historical reasons or out of habit. Integrating different data sources has historically required enormous effort and is typically accomplished by building point-to-point data and application integration. To illustrate the severity of the problem, there are over 250 vendors in the industry today that provide ETL tools for different kinds of data sources on the input end, and analytical tools on the output end (see "Java Metadata Interface (JMI) Specification" in the Resources section). ETL is often used to extract data from the source system, transform it into a format compatible with the target system, and then load it into the target system, such as a data warehouse or data mart. Given that ETL is just a small part of EII, you can imagine the scope and severity of the point-to-point integration problem.
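The three ETL steps just described can be sketched in Python as follows. The sketch assumes a hypothetical CSV export as the source system and an in-memory SQLite table `fact_sales` as the target warehouse. The column names and cleansing rules are invented for illustration.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical source extract: a CSV export from an operational system.
SOURCE_CSV = """order_id,order_date,amount_cents,region
1,03/15/2005,12550,east
2,03/16/2005,990,WEST
3,03/16/2005,,east
"""


def extract(text):
    """Extract: read raw rows from the source system's export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert to the target's formats and drop unusable rows."""
    clean = []
    for r in rows:
        if not r["amount_cents"]:
            continue  # reject rows missing a required measure
        clean.append((
            int(r["order_id"]),
            datetime.strptime(r["order_date"], "%m/%d/%Y").date().isoformat(),
            int(r["amount_cents"]) / 100.0,  # cents -> currency units
            r["region"].strip().upper(),     # normalize inconsistent codes
        ))
    return clean


def load(conn, rows):
    """Load: insert the conformed rows into a warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(order_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL, region TEXT)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))
print(warehouse.execute("SELECT * FROM fact_sales ORDER BY order_id").fetchall())
```

Every transformation rule here is specific to one source and one target. Repeating such pipelines for each pair of systems is what produces the point-to-point sprawl described above.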

On the content side, the challenges are equally daunting. Content management solutions come from different historical lineages, and most of them are vertical and department-based: for example, document management for the legal department, knowledge management for the IT department, or Web content management for the marketing department. In today's content management market, these solutions are often provided by different products from different vendors. Even within a single vendor's portfolio, product functionality frequently overlaps.

The lines that separate various solution types have become increasingly blurred over time. For instance, business intelligence now demands real-time data to derive a complete understanding of the marketplace, which pushes ETL vendors to expand their real-time data capabilities. Meanwhile, data federation increasingly requires data transformation capabilities to improve data quality and flexibility. We are seeing convergence on several fronts: data and content integration (particularly in the context of XML), ETL and federation, and knowledge management and Web content management.

When an enterprise wants to make these kinds of changes, it faces significant challenges. Following are some of the items the enterprise needs to consider:

  • Leaving the vertical and departmental view
  • Transforming existing information management functions to reusable services
  • Integrating large numbers of heterogeneous information sources
  • Reducing development costs
  • Expanding capabilities

The challenges cannot be easily solved because vendors want to protect their current IT assets and customer base. Advocates within the enterprise also must sell the SOA vision internally and move toward a standards-based direction. From a user perspective, instilling IT asset reuse as part of an organizational culture is a very difficult task.

Benefits

The benefits for adopters of a SOA for information management are significant. Among other things, SOA-based information management does the following:

  • Allows for systematic IT asset reuse. Data modeling, mapping, and transformation are the most complex and labor-intensive processes. Current point-to-point information integration does not easily lead to IT asset reuse.
  • Speeds development time and reduces development and maintenance costs.
  • Increases data and content connectivity and interoperability with greater cost efficiency.
  • Creates additional business insights based on fully integrated information. For instance, Gartner predicts that the ability to analyze unstructured content will lead to new business opportunities.
  • Protects customer investment in a highly volatile information management market in which mergers and acquisitions frequently take place.
  • Simplifies the overall complexity of the enterprise computing model.
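Part of the reuse and complexity benefit follows from simple arithmetic. If n systems are integrated point to point, the worst case is one custom interface per pair of systems, or n(n-1)/2 in total. A shared information services layer needs roughly one adapter per system. The sketch below compares the two counts under that simplifying assumption, and it ignores the cost of building the shared layer itself.

```python
def point_to_point_links(n: int) -> int:
    """Worst case: every pair of systems gets its own custom interface."""
    return n * (n - 1) // 2


def hub_adapters(n: int) -> int:
    """Each system connects once to a shared information services layer."""
    return n


for n in (5, 10, 50):
    print(n, point_to_point_links(n), hub_adapters(n))
# 5 10 5
# 10 45 10
# 50 1225 50
```

The gap grows quadratically. This is why point-to-point integration scales poorly and why centralized, reusable mappings pay off as the number of sources grows.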

Services that information management offers

Having described how information management enables SOA, as well as the challenges and benefits of using SOA to reengineer information management, let's examine the services offered by information management, such as Extract, Transform, Load (ETL), federation, modeling, search, and analytics.

The following list describes an information management stack, a logical view or framework that categorizes the services offered by information management according to their value propositions: security, collaboration, qualities of service (such as availability), manageability, and information consumption:

  • Security: This is the entry point for applications to access heterogeneous data sources based on who-can-see-what policies.
  • Collaboration: This is indispensable in a team environment and depends on workflow and version control.
  • Qualities of Service (QoS): This can include goals for information availability, performance, data throughput, and data consistency or accuracy. Federation, ETL, caching, replication, and event publishing all aim to meet QoS goals.
  • Manageability: Because information stores the intelligence and complexity of an organization, (structural and semantic) modeling, (data) profiling, and (data and content) quality disciplines are used to make information more manageable.
  • Consumption: The purpose of all the services below it is to let actors (including machines) use the information; thus information consumption sits at the top of the stack.
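As a rough illustration of how the lower layers of this stack compose, the Python sketch below chains three pieces. A security entry point enforces a who-can-see-what policy, a cache (one QoS technique) sits behind it, and a data source sits at the bottom. The roles, regions, fields, 60-second time-to-live, and class names are all hypothetical. A production stack would delegate these concerns to middleware rather than to hand-written classes.

```python
import time
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, object]

# Hypothetical who-can-see-what policy: role -> (allowed regions, visible fields).
POLICY: Dict[str, Tuple[Set[str], Set[str]]] = {
    "analyst": ({"EAST", "WEST"}, {"id", "region", "total"}),
    "east_rep": ({"EAST"}, {"id", "region", "total", "contact"}),
}


def raw_source() -> List[Record]:
    """Stand-in for a federated or replicated data source."""
    return [
        {"id": 1, "region": "EAST", "total": 120.0, "contact": "a@example.com"},
        {"id": 2, "region": "WEST", "total": 80.0, "contact": "b@example.com"},
    ]


class CachedSource:
    """QoS layer: serves repeated reads from a time-limited cache."""

    def __init__(self, fetch: Callable[[], List[Record]], ttl_seconds: float):
        self.fetch = fetch
        self.ttl = ttl_seconds
        self._rows: List[Record] = []
        self._loaded_at = float("-inf")  # forces a fetch on first use
        self.fetch_count = 0

    def rows(self) -> List[Record]:
        now = time.monotonic()
        if now - self._loaded_at > self.ttl:
            self._rows = self.fetch()
            self._loaded_at = now
            self.fetch_count += 1
        return self._rows


class SecuredAccess:
    """Security layer: the single entry point that applies the policy."""

    def __init__(self, source: CachedSource, policy: Dict[str, Tuple[Set[str], Set[str]]]):
        self.source = source
        self.policy = policy

    def query(self, role: str) -> List[Record]:
        if role not in self.policy:
            raise PermissionError(f"no policy for role {role!r}")
        regions, fields = self.policy[role]
        # Row-level filter by region, then column-level filter by field.
        return [
            {k: v for k, v in row.items() if k in fields}
            for row in self.source.rows()
            if row["region"] in regions
        ]


access = SecuredAccess(CachedSource(raw_source, ttl_seconds=60), POLICY)
print(access.query("east_rep"))
print(access.query("analyst"))
print(access.source.fetch_count)  # both queries were served by one fetch
```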

No single product offers all of these services. Sample tools are listed at the end of this paper. As a whole, these services create a complete information management framework under SOA. Ideally, each service deserves a paper in its own right, but we only provide an overview here.


Figure 1: Information management in SOA

Information management in SOA takes a holistic view of information assets within and across organizations. Although diverse techniques can be used for different purposes, information management does not arbitrarily divide information into structured or unstructured worlds, nor does it compartmentalize solutions into departmental views. The key differentiator of information management in SOA from earlier, more rigid data and content management approaches is that it provides services to whoever requests them, at the right time, in the right place, and for the right reasons.

As we stated earlier, services listed in the information management stack are typically provided through middleware. Enterprises can opt to build these services into their applications from scratch, though the cost and time-to-deploy are often prohibitive. The best practice is to understand the business requirements, choose a vendor that offers seamless information integration and the most complete information management solution, and then build a handful of selected services to fill the gaps, or even outsource certain complex services to a third-party information service provider, as we will illustrate in the case study in the next part of this paper.


Conclusion

You have seen how information management works in the SOA context and how the two relate. You also examined the challenges and benefits of reengineering information management into SOA.

This first half of a two-part paper on information management in SOA provided a high-level overview of these services. In the second half of this paper, the authors will describe the details of each service and present a real-life user scenario.

Acknowledgement

The authors wish to give special thanks to Susan Malaika, Norbert Bieberstein, and June Gu for their excellent feedback. Appreciation is also given to Robert D. Johnson and Elizabeth Wallace for their support.


Resources

