The trend toward increasing simplification is driving enterprises to extend SOA into the data layer by providing information services. Information services enable the necessary linkage and binding between processes and information in an enterprise when deploying composite SOA applications (see The Emerging Vision for Data Services: Becoming Information-Centric in an SOA World). But how do you build information services to maximize the flexibility and reuse of these services? This article describes an enterprise information strategy and architectural framework to maximize the value and accessibility of information in an enterprise, and to help achieve the enterprise vision of becoming information-centric in an SOA world.
Data architecture and an understanding of the sources, relationships, and meanings of information across various systems is the focus of information services architecture. The goal is to build reusable services that application developers can find and use, without needing to go through any analysis processes in order to understand the sources and their complex relationships. The key elements of the architecture proposed in this article are an information model and metadata.
The information model provides a common basis for understanding data, and reflects the agreed-upon business view, business vocabulary and business rules. Enterprises typically maintain, and have difficulty managing, a large number of physically different data formats. Transforming the enterprise to a common data format is difficult and impractical. The proposed information model does not reflect any specific data model in the enterprise. Instead, it helps efficiently manage enterprise data by establishing a common understanding of the data using semantics, which map the various (often inconsistent) data models to the information model. In the process, semantics help describe the underlying business meaning of the enterprise data in a unified manner. Semantics use metadata to capture the formal meaning of data. Data semantics are established by rationalizing the underlying enterprise data -- a process that relates physical data schemas to the concepts defined in the agreed-upon enterprise business information model. Semantic services are represented in the semantic/logical services layer of the information access and distribution architecture shown in Figure 1.
The metadata-driven nature of the proposed architecture also allows for a great deal of flexibility. The architecture introduces a semantic/logical information services layer that uses metadata to enable the loose coupling required to deploy on-demand applications. Semantic reconciliation and metadata enable enterprises to eliminate the hard-wiring between applications and physical data repositories. This hard-wiring is the major cause of rigidity and inflexibility in enterprise architectures.
In Part 2 of this series, we'll describe additional business scenarios and examples that map to the architecture and illustrate the concepts described in this article. We'll also include an implementation roadmap using IBM products and tools.
Information challenges in an enterprise
The majority of enterprises are still delivering data and not information, and as a result, their information systems in production suffer many drawbacks. These enterprise information systems are expensive to develop, extend, and maintain. They contain enormous amounts of redundant data, and often present out-of-date information. In most cases, the systems provide only partial answers to users’ questions, make it difficult for users to identify relevant information, and typically require IT assistance for user access.
A typical enterprise has an abundance of monolithic, mission-specific, silo systems. The majority of enterprise applications in production are monolithic, and their business logic is not externally accessible in a modular form that allows easy reuse in other applications. Because their systems are not integrated, most enterprises lack the ability to effectively aggregate data, and therefore, must implement lots of costly custom code to satisfy business requirements. As a result, enterprises often end up missing major business opportunities. In addition to integrating enterprise systems, enterprises need to be able to provide a single view of information. For example, developing an enterprise view of the customer is one of the primary requirements of an enterprise business strategy.
The enterprise data problem has a strong and measurable impact on a company's bottom line. The problems typically manifest themselves in the following pain points:
- Information quality
- Business agility
- IT costs
Information quality: The non-integrated and fragmented data environment breeds redundant and inconsistent data that leads to information quality problems. These problems manifest themselves as mishandled customer relationships and internal operations. Information quality issues cost businesses millions of dollars annually.
Business agility: Flexibility is critical to a modern enterprise that must respond to constantly changing business requirements. The data problem creates a rigid environment that lacks reuse and prevents agility.
IT costs: The inflexibility in the non-integrated enterprise causes an exponential increase in time and cost to implement new business requirements. In such an environment, it's costly to administer and maintain the redundant databases across the enterprise, which involves multiple point-to-point mappings and translations.
Most enterprises undergo extensive data replication as part of their daily operations. This creates widespread redundancies that increase the complexity and inflexibility of the enterprise architecture, and makes it harder to implement enterprise-wide SOA. Replication creates redundant and inconsistent data sources across the enterprise, and causes the creation of redundant and inconsistent business logic throughout the enterprise.
The information on-demand vision
Information is the key to business innovation. The information on-demand vision aims to leverage information for business optimization by delivering trusted information in real time and in context. It also aims to reduce risk and improve visibility into business operations.
Business drivers for an on-demand enterprise are driving the shift to loosely coupled, dynamically configurable applications. On-demand enterprises then adopt a strategy to build these loosely coupled, dynamically configured applications, instead of tightly coupled, hard-wired applications. On-demand enterprises aim to reconfigure and assemble services in an on-demand or just-in-time manner from disparate application components.
In order to achieve the information on-demand vision, you must create loosely coupled information services that can be reused for different business contexts. The following section describes a proposed architecture to achieve this loose coupling. It's important to note that an information entity can differ in composition and usage from one business context to the other. To illustrate this, consider the following example from the IBM enterprise.
At IBM, the client organization information is used in a variety of business contexts and business processes including sales and financing. The business context for a direct sale is different from the business context of credit and financing, where IBM’s credit facility (IBM Global Finance) is trying to finance the customer’s purchases. Different client organization information is needed for the “financing” business context than for the “direct-sell” business context. For sales, the sales region and sales representative are the relevant pieces of information, but for financing, information about revenue and credit rating is important. This information is typically different from the core customer data (organization name and address, geography, region, and so on). The core organization data is also composed of legal information, independent of IBM, like name and address, and IBM-specific classifications like IBM Geography and IBM Region. As you can see, in the financing business context the client organization entity has a different meaning and composition than in the sales business context, and all this data is likely stored in different enterprise databases.
An enterprise information on-demand architecture
Figure 1 shows an information access and distribution architecture for an on-demand enterprise. This architecture employs information services with abstraction layers that maximize the loose coupling between applications, or enterprise processes, and enterprise data. By maximizing the loose coupling between processes and data, we maximize the reusability of information services. We'll discuss this in detail in this article, and show how you can enable a flexible and responsive on-demand enterprise by implementing information services that follow the proposed architectural abstractions.
Figure 1. SOA information access and distribution architecture for an on-demand enterprise
In the proposed architecture, the information sources (the data repositories at the bottom of Figure 1) in the enterprise are no longer tightly bound or limited to any specific enterprise application. The data manipulation services (in the lower layer) extract data from the physical repositories, and semantic services turn this data into information. That is, semantic services translate the physical data into a logical representation of the data that conforms to the agreed-upon business view of the data (the information model). In this manner, information services enable flexibility during the composition process by configuring (or exploiting) the underlying data repositories for the business context at hand.
Remember that the same information is composed and used differently in different business contexts. For example, customer information used within a direct sales business context most likely includes different customer information entities than the ones used in the customer financing business context. So, if customer information services are created (at the various levels of abstraction shown in the architecture) for the direct sales business context, then when composing a new solution for the customer financing business context, some of the customer information services can be reused in the new solution composition (the new business context), while other new customer information may need to be created. If the customer information service is built in a monolithic and hard-wired fashion to satisfy only one business context, without considering the specification of the underlying reusable services at the different layers of abstraction, it will be impossible to provide a flexible architecture that can reuse the underlying services to compose a solution for the new business context.
The rest of this article describes the proposed architecture and how to build it step-by-step.
Leveraging enterprise information with SOA and information services
SOA fosters the ability to easily connect and reuse information and software assets. SOA has enabled IT management to develop a new approach to turn data into a much more powerful corporate asset. Information is a vital component of the SOA strategy. Information availability is the most anticipated benefit of business investment in IT. According to Gartner Research, "You will waste your investment in SOA unless you have enterprise information that SOA can exploit." ("Service-Oriented Business Applications Require EIM Strategy," Gartner Research, March 2005)
Information services are integral to the SOA strategy, which centralizes and standardizes the approach to data integration for processes and applications. Information services integrate and manage diverse data and content from various information sources in a unified manner. Information services package and make available information as a service to all business processes in a standard, consistent, and manageable way. Information services provide a virtual enterprise information environment in which the users of information are shielded from the complexities of the underlying information systems and repositories. This environment provides a single point of control for access to information systems through which people, processes, and applications can access reliable information without knowing how or where it is stored. In this manner, information systems and repositories can evolve without impacting users (people, processes, and applications).
Information services should be designed to loosen the tight connections between data and applications so that data can be controlled and shared across the enterprise. To summarize, information services provide:
- Consistent definition and packaging of data from process to process
- Consistent rules applied to the data
- Improved data quality
- Centralized control and maintenance
They can also leverage metadata relationships to ensure that processes understand where information came from.
Building the information services architecture
In this section, we'll show you step-by-step how to build the architecture shown in Figure 1, transforming an enterprise from custom access to information, to providing information as a service. In the process we'll add the information services abstraction layers, and the service virtualization and connectivity.
From custom access of information to information as a service
Figure 2 shows a direct coupling approach, in which business applications create custom access to information. In this approach, each business application (or process within a business application) creates custom access to information, resulting in inconsistent views of data across business applications. For example, one process in an application gets account data from different information sources than a process in a different application. The direct coupling approach also results in inconsistent application of rules (or business logic), in which calculations are done differently from process to process. This leads to multiple points of maintenance for the same logic, which increases the complexity and cost.
Figure 2. Custom information access approach
Figure 3 shows the SOA information as a service approach. SOA centralizes and standardizes the approach to data integration for processes, packaging information as a service to business processes, so that consistent, manageable information is made available to every process in a standardized way. In this approach, data is consistently defined and packaged from process to process, consistent rules are applied to the data, data quality is improved, and control and maintenance is centralized. Information as a service loosens the tight connections between data and applications so that data can be controlled and shared across the enterprise.
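The contrast between Figures 2 and 3 can be sketched in a few lines. This is a minimal illustration, not part of the proposed architecture itself: the `Account` type, the `AccountInformationService` class, the available-balance rule, and the 500.0 threshold are all hypothetical names and values chosen for the example. The point is only that the rule is defined once, inside the service, so two different processes cannot calculate it differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    account_id: str
    ledger_balance: float
    pending_debits: float


# Hypothetical stand-in for an enterprise information source.
_ACCOUNTS = {
    "A-1": Account("A-1", ledger_balance=1000.0, pending_debits=150.0),
}


class AccountInformationService:
    """Single point of definition for account data and the rules applied to it."""

    def get_account(self, account_id: str) -> Account:
        return _ACCOUNTS[account_id]

    def available_balance(self, account_id: str) -> float:
        # The rule lives here only, so every process sees the same result.
        account = self.get_account(account_id)
        return account.ledger_balance - account.pending_debits


# Two separate business processes consume the same service instead of
# each building its own access path and its own version of the rule.
def billing_process(service: AccountInformationService, account_id: str) -> float:
    return service.available_balance(account_id)


def credit_check_process(service: AccountInformationService, account_id: str) -> bool:
    return service.available_balance(account_id) >= 500.0


service = AccountInformationService()
```

If the balance rule changes, only `available_balance` needs to be maintained; neither process has to be touched.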
Figure 3. Information as a service approach
At the bottom of Figure 3, you can see enterprise information sources containing information that needs to be available to any process or person that can benefit from it. These information sources and repositories are typical of the multitude of application silos you find in many enterprises, such as an SAP application running on DB2, or any custom enterprise application running on other repositories. The information sources can also include disparate sources from external suppliers and business partners. When enterprises deliver information as a service based on a flexible information architecture, as shown in Figure 1, enterprise architects and application developers can use these information services directly when building new tools or applications, without having to understand what sources the information is coming from. Information services integrate information regardless of its location to provide a unified view, and add business context and value to the raw data. Information as a service virtualizes access to information by separating it from the processes and applications, which makes it easier and faster to change both. The flexibility provided by information as a service in delivering information is critical to meeting the goals of any on-demand business.
See Appendix for a comparison of the direct coupling approach with the SOA approach relative to the following decision factors: flexibility, data governance, ease of development, and performance and scalability.
Adding service virtualization and connectivity
First we'll enhance the information access and distribution architecture by adding the connectivity and interoperability layer shown in Figure 4. The connectivity and interoperability layer represents the Enterprise Service Bus (ESB) architectural construct for SOA. You can think of it more simply as a layer that provides asynchronous and synchronous communications across the enterprise, and uses techniques such as messaging, method calls, service integration and FTP. This layer virtualizes the services by decoupling the point-to-point connections from the interfaces themselves. The service interfaces are put into a third-party broker, which helps you manage the interfaces better. This enables faster and more flexible coupling and decoupling of applications. Because you can find all of the applications and the interfaces, you can then reuse both.
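To make the decoupling concrete, here is a deliberately small sketch of a broker that holds service interfaces by logical name. It is an assumption-laden toy, not an ESB: a real connectivity layer also handles messaging, asynchronous delivery, and protocol mediation. The `ServiceBroker` class, the service name `organization.lookup`, and both provider functions are hypothetical.

```python
from typing import Callable, Dict, List


class ServiceBroker:
    """Toy stand-in for the connectivity and interoperability layer.

    Consumers address a service by logical name; the broker owns the
    binding to whichever provider currently implements it.
    """

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[..., object]] = {}

    def register(self, name: str, provider: Callable[..., object]) -> None:
        # Re-registering a name rebinds it without touching any consumer.
        self._providers[name] = provider

    def invoke(self, name: str, *args: object, **kwargs: object) -> object:
        try:
            provider = self._providers[name]
        except KeyError:
            raise LookupError(f"no provider registered for {name!r}") from None
        return provider(*args, **kwargs)

    def catalog(self) -> List[str]:
        # A browsable list of interfaces is what makes reuse practical.
        return sorted(self._providers)


def legacy_org_lookup(org_id: str) -> dict:
    return {"OrgName": "Example Corp", "source": "legacy"}


def replacement_org_lookup(org_id: str) -> dict:
    return {"OrgName": "Example Corp", "source": "replacement"}


broker = ServiceBroker()
broker.register("organization.lookup", legacy_org_lookup)
before = broker.invoke("organization.lookup", "ORG-1")

# Swap the provider; the consumer's call is unchanged.
broker.register("organization.lookup", replacement_org_lookup)
after = broker.invoke("organization.lookup", "ORG-1")
```

Because the consumer only ever names `organization.lookup`, replacing the provider is a change to the broker's bindings rather than to the calling application.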
Figure 4. Adding the connectivity and interoperability layer
Adding the information services architectural layers
The architecture identifies three types of information services: business information services, semantic/logical services, and data manipulation services. These services define the architectural layers of information services in Figure 5. The layers are needed to maximize the loose coupling between applications and data in order to create reusable services for a flexible on-demand enterprise.
Figure 5. Adding the information service layers
The data manipulation services layer
Data manipulation services are processing logic that directly manipulates data values and the representation of those values for storage, transport or presentation purposes. These services exist at a lower level of detail, and physically manipulate data for persistence, access, and delivery. They provide the mechanism to physically connect to data, and answer the question of "How do I get my data?" Data manipulation services consist of the following types of services:
- Data persistence services: These services perform data activities that are traditionally relegated to the database management system, such as create, read, update and delete.
- Data access services: These services are responsible for authorization, restrictions and logging access to data. Data access services provide or prevent access to data based on authorizations and restrictions (policies and rules for access).
- Data event monitoring services: These services can monitor, capture, and deliver data changes as they are made in the data sources. Data event monitoring involves familiar database technologies such as database triggers, data replication and database recovery log processing. Data event monitoring services can be built on top of data event monitoring tools, such as WebSphere® Data Event Publisher. Event monitoring services are essential elements in implementing asynchronous information distribution feeds to line-of-business (LOB) clients. Changes to source tables, or events, are captured from the log and converted to messages in an XML format. This process provides a push data integration model that is ideally suited to data-driven Enterprise Application Integration (EAI) scenarios and change-only updating for business intelligence and Master Data Management (MDM).
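The three service types above can be sketched against an in-memory table. This is only an illustration under several assumptions: the class names, the role-based policy, and the XML element names (`dataEvent`, `column`) are invented for the example, and changes are captured in the write path for simplicity, whereas the tools mentioned above capture them from the database recovery log.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Set

ChangeListener = Callable[[str, str, Optional[dict]], None]


class PersistenceService:
    """Create, read, update, and delete against an in-memory table."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}
        self._listeners: List[ChangeListener] = []

    def subscribe(self, listener: ChangeListener) -> None:
        self._listeners.append(listener)

    def _changed(self, operation: str, key: str, row: Optional[dict]) -> None:
        for listener in self._listeners:
            listener(operation, key, row)

    def create(self, key: str, row: dict) -> None:
        if key in self._rows:
            raise KeyError(f"{key!r} already exists")
        self._rows[key] = dict(row)
        self._changed("create", key, dict(row))

    def read(self, key: str) -> dict:
        return dict(self._rows[key])

    def update(self, key: str, changes: dict) -> None:
        self._rows[key].update(changes)
        self._changed("update", key, dict(self._rows[key]))

    def delete(self, key: str) -> None:
        del self._rows[key]
        self._changed("delete", key, None)


class AccessService:
    """Applies an access policy and logs every attempt, allowed or not."""

    def __init__(self, store: PersistenceService, readers: Set[str]) -> None:
        self._store = store
        self._readers = readers
        self.access_log: List[tuple] = []

    def read(self, role: str, key: str) -> dict:
        allowed = role in self._readers
        self.access_log.append((role, key, allowed))
        if not allowed:
            raise PermissionError(f"role {role!r} may not read {key!r}")
        return self._store.read(key)


def to_xml_message(operation: str, key: str, row: Optional[dict]) -> str:
    """Render a captured change as an XML message (element names are made up)."""
    event = ET.Element("dataEvent", {"operation": operation, "key": key})
    for name, value in (row or {}).items():
        ET.SubElement(event, "column", {"name": name}).text = str(value)
    return ET.tostring(event, encoding="unicode")


store = PersistenceService()
published: List[str] = []  # stands in for an asynchronous distribution feed
store.subscribe(lambda op, key, row: published.append(to_xml_message(op, key, row)))

store.create("ORG-1", {"OrgName": "Example Corp"})
store.update("ORG-1", {"CreditRating": "A"})

access = AccessService(store, readers={"finance"})
record = access.read("finance", "ORG-1")
```

Each change pushes one message to subscribers, which is the change-only, push-style integration the event monitoring bullet describes.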
Business information services layer
Information is data with context. Business information services present the information in a "business context" and represent the common business view of the data as defined in the enterprise information model. Business information services use a collection of underlying services that:
- Develop a standard way for defining the data needed and its location.
- Understand the relationship between sources and how that relates to the business terms that describe how the business sees the data.
- Determine where the relevant data is stored across different applications and databases.
The introduction of business context in this layer enables the delivery of dynamic solution behavior through the reuse (and dynamic services assembly) of the underlying information services. The context information establishes an execution environment that affects the output of a service for the equivalent input. This adds flexibility (without adding costly development efforts) by enabling multiple channels (clients) to leverage reuse at run-time.
The example we described earlier of customer information used in a "direct sales" versus a "financing" business context illustrates how the same information can be composed and used differently under different business contexts.
Semantic/logical services layer
Semantic services enable you to represent semantic models, identify model-to-model relationships, and execute the necessary translations to reconcile data with differing semantic models. This layer contains the business rules and mechanisms for ensuring that the data has consistent meaning (for example, an item labeled X in model 1 is the same as an item labeled Y in model 2). It contains the representations of the actions required to preserve consistent meaning across different data structures and repositories.
This layer represents and uses the metadata that is key to the successful decoupling of each layer from its neighboring service layer. Through semantic reconciliation and metadata, you can eliminate the hard-wiring between applications and physical data repositories. This enables the loose coupling required to deploy on-demand applications. You can achieve greater flexibility, adaptability, and independence when each layer is only loosely coupled to the layer above and below. The metadata-driven nature of the architecture also enables a great deal of flexibility.
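As a rough sketch of metadata-driven reconciliation, the mapping below relates the column names of two physical schemas to terms in the information model. The source names (`crm_db`, `billing_db`) and every column name are hypothetical; only `OrgName` and `HQCountry` come from the model used later in this article. A production mapping would also carry type conversions and business rules, which are omitted here.

```python
from typing import Dict

# Mapping metadata: physical column name -> term in the information model.
# Both source schemas and all column names are hypothetical.
SEMANTIC_MAP: Dict[str, Dict[str, str]] = {
    "crm_db": {"cust_nm": "OrgName", "ctry_cd": "HQCountry"},
    "billing_db": {"ORGANISATION": "OrgName", "COUNTRY": "HQCountry"},
}


def to_information_model(source: str, physical_row: dict) -> dict:
    """Translate one physical row into information-model terms.

    Columns with no mapping are dropped rather than guessed at.
    """
    mapping = SEMANTIC_MAP[source]
    return {
        mapping[column]: value
        for column, value in physical_row.items()
        if column in mapping
    }


crm_row = {"cust_nm": "Example Corp", "ctry_cd": "US", "row_ver": 7}
billing_row = {"ORGANISATION": "Example Corp", "COUNTRY": "US"}

from_crm = to_information_model("crm_db", crm_row)
from_billing = to_information_model("billing_db", billing_row)
```

Both rows come out in the same shape, so a consumer never sees `cust_nm` or `ORGANISATION`. Bringing in another repository means adding an entry to the mapping, not changing the applications that use the result.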
An example: different views of a client organization
This section walks through an example that illustrates the principles of the proposed architecture when applied to maintaining client organization information at IBM.
An enterprise-level IBM client information application captures and stores information about legal organizations that constitute clients of IBM. The information is stored in a number of databases, and provides as much information about these organizations as possible, including data related to the fact that the organization does business with IBM, as well as legal information describing the organization independent of its IBM connection.
Data manipulation layer
The physical data maintained in various IBM databases may contain different information, including:
Enterprise organization database:
- Organization name
- Address of organization headquarters (street address, city, country)
Enterprise administrative database:
- Geography – country mapping
- Region – country mapping
- Information about IBM sales coverage of the organization
- Information about the sales representative assigned to the organization
- Organization revenue per year
- Organization credit rating
The services that create, access and maintain these data groupings constitute the data manipulation layer of the described architecture. They define and use the data manipulation layer metadata, such as various database locations, schemas, rules, connections, user IDs and passwords, and so on.
Semantic/logical services layer
The semantic/logical services layer describes a general purpose business model that defines different kinds of business entities. The business information model represents the business view of the data and is not specific to the semantic/logical layer. The services within the semantic/logical layer translate from the different representations in the underlying data sources to the common representation of the business information model.
For the company as a whole, there are some universally accepted parts of the definition of an organization. An organization is named with an OrgName, and has an OrgLocation, which is composed of a headquarters address entity (HQAddress, defined and stored once for the whole enterprise and composed of several properties ranging from HQStreetAddr to HQCountry). There are also legal, non-IBM-specific attributes of an organization.
The location also has the properties OrgGeography and OrgRegion, which are IBM-specific attributes defined enterprise-wide by a separate administrative department, based on a country.
Figure 6 shows the components of the organization model:
Figure 6. Organization model
Using the "triple notation" of the Resource Description Framework (see Resources), this business information model metadata can also be represented as follows:
Listing 1. Organization model metadata
Enterprise:Organization rdf:type rdf:Class
Enterprise:OrgName rdf:type rdf:Property
Enterprise:OrgName rdfs:domain Enterprise:Organization
Enterprise:OrgLocation rdf:type rdf:Class
Enterprise:OrgLocation rdfs:domain Enterprise:Organization
Enterprise:HQAddress rdf:type rdf:Class
Enterprise:HQAddress rdfs:domain Enterprise:OrgLocation
Enterprise:OrgGeography rdf:type rdf:Property
Enterprise:OrgGeography rdfs:domain Enterprise:OrgLocation
Enterprise:OrgRegion rdf:type rdf:Property
Enterprise:OrgRegion rdfs:domain Enterprise:OrgLocation
Enterprise:HQStreetAddr rdf:type rdf:Property
Enterprise:HQStreetAddr rdfs:domain Enterprise:HQAddress
….
Enterprise:HQCountry rdf:type rdf:Property
Enterprise:HQCountry rdfs:domain Enterprise:HQAddress
Note: The prefix rdf defines RDF elements, whereas the prefix rdfs defines RDF Schema elements.
For its own purposes, the sales department maintains another collection of IBM-specific information about organizations, as shown in Figure 7:
Figure 7. Sales information model
Listing 2. Sales information model metadata
Sales:SalesInfo rdf:type rdf:Class
Sales:SalesRegion rdf:type rdf:Property
Sales:SalesRegion rdfs:domain Sales:SalesInfo
Sales:SalesRep rdf:type rdf:Property
Sales:SalesRep rdfs:domain Sales:SalesInfo
The financial department is interested in another set of organization properties, as shown in Figure 8:
Figure 8. Finance information model
Listing 3. Finance information model metadata
Finance:FinanceInfo rdf:type rdf:Class
Finance:Revenue rdf:type rdf:Property
Finance:Revenue rdfs:domain Finance:FinanceInfo
Finance:CreditRating rdf:type rdf:Property
Finance:CreditRating rdfs:domain Finance:FinanceInfo
Finally, in the context of the entire business, the entities of Organization_Sales and Organization_Finance are defined, as shown in Figure 9:
Figure 9. Business-level information model
Listing 4. Business-level information model metadata
Business:Organization_Sales rdfs:subClassOf Enterprise:Organization
Sales:SalesInfo rdfs:domain Business:Organization_Sales
Business:Organization_Finance rdfs:subClassOf Enterprise:Organization
Finance:FinanceInfo rdfs:domain Business:Organization_Finance
The enterprise, sales, finance and business domains (prefixes in the RDF notation) represent contexts in which the business information can be defined and interpreted. In the XML representation of RDF, they correspond to XML namespaces.
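One way to see what the four listings add up to is to load their triples into a plain list and ask which elements make up each business-level entity. The sketch below is only an illustration: it transcribes the `rdfs:domain` and `rdfs:subClassOf` triples from Listings 1 to 4 (the `rdf:type` triples and the properties elided in Listing 1 are left out), and it reads `X rdfs:domain Y` as "X is part of Y", which is how the listings use it rather than a full RDF Schema interpretation. The helper names `members_of` and `context_of` are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# rdfs:domain and rdfs:subClassOf triples transcribed from Listings 1-4.
TRIPLES: List[Triple] = [
    ("Enterprise:OrgName", "rdfs:domain", "Enterprise:Organization"),
    ("Enterprise:OrgLocation", "rdfs:domain", "Enterprise:Organization"),
    ("Enterprise:HQAddress", "rdfs:domain", "Enterprise:OrgLocation"),
    ("Enterprise:OrgGeography", "rdfs:domain", "Enterprise:OrgLocation"),
    ("Enterprise:OrgRegion", "rdfs:domain", "Enterprise:OrgLocation"),
    ("Enterprise:HQStreetAddr", "rdfs:domain", "Enterprise:HQAddress"),
    ("Enterprise:HQCountry", "rdfs:domain", "Enterprise:HQAddress"),
    ("Sales:SalesRegion", "rdfs:domain", "Sales:SalesInfo"),
    ("Sales:SalesRep", "rdfs:domain", "Sales:SalesInfo"),
    ("Finance:Revenue", "rdfs:domain", "Finance:FinanceInfo"),
    ("Finance:CreditRating", "rdfs:domain", "Finance:FinanceInfo"),
    ("Business:Organization_Sales", "rdfs:subClassOf", "Enterprise:Organization"),
    ("Sales:SalesInfo", "rdfs:domain", "Business:Organization_Sales"),
    ("Business:Organization_Finance", "rdfs:subClassOf", "Enterprise:Organization"),
    ("Finance:FinanceInfo", "rdfs:domain", "Business:Organization_Finance"),
]


def members_of(entity: str) -> Set[str]:
    """Everything that belongs to `entity`, directly or through nesting.

    Follows rdfs:domain downward to collect parts, and rdfs:subClassOf
    upward so that a subclass also receives the parts of its parent.
    """
    found: Set[str] = set()
    seen: Set[str] = set()
    pending = [entity]
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        for subject, predicate, obj in TRIPLES:
            if predicate == "rdfs:domain" and obj == current:
                found.add(subject)
                pending.append(subject)
            elif predicate == "rdfs:subClassOf" and subject == current:
                pending.append(obj)
    return found


def context_of(name: str) -> str:
    # The prefix before the colon is the context (an XML namespace in RDF/XML).
    return name.split(":", 1)[0]


sales_view = members_of("Business:Organization_Sales")
finance_view = members_of("Business:Organization_Finance")
```

The two views share every `Enterprise:` element and differ only in the `Sales:` and `Finance:` parts, which is the metadata-level reason the enterprise services can be reused across both business contexts.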
Figure 10 illustrates how business information could be delivered using the proposed information architecture with SOA. Let’s assume a request for organization data comes from the sales department. It is received by the Org information service, which detects the request’s business context (sales) and invokes the sales context service.
The sales context service knows the business-level semantic definition of Organization_Sales (the business information metadata). First, it invokes the enterprise (core) semantic service to retrieve the enterprise-level organization information.
The enterprise semantic service uses the semantic/logical metadata. It uses the enterprise data service to retrieve individual parts of the organization’s address, and creates an OrgAddress from them. It also retrieves OrgGeography and OrgRegion based on the HQCountry component of OrgAddress, again using the enterprise data service, and creates OrgLocations from OrgAddresses, OrgGeographies and OrgRegions.
The sales context service then uses the sales data service to retrieve the SalesInfo, and puts them together to return the Organization_Sales entity to the requester.
Finance, enterprise, and sales data services reside at the data manipulation layer, and are configured to use the data manipulation layer metadata, which includes information on database locations, schemas, connections, rules, and so on.
One of the most important features of the proposed architecture is the reusability of services at different levels. Notice that in order to be able to retrieve an Organization_Finance entity, enterprise semantic and data services may be reused, and only finance services need to be introduced.
Also, the context-dependent metadata supports building other business-level services out of building blocks, such as business, sales and finance contexts. For example, a marketing department may view an organization's data as composed, in addition to its location, of sales information, finance information, plus some marketing-specific information. This can be achieved by creating a marketing context that uses business, sales and finance contexts. When building business-level information services for the marketing context, you could reuse all the services in this example.
Figure 10. Example applied to architectural layers
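The request flow in this example can be traced with a small sketch. Everything here is a simplification under stated assumptions: the function names are hypothetical, the sample values are invented, the databases are plain dictionaries, and the business context is passed in explicitly rather than detected from the request. The `calls` list exists only to make visible which services run for each context.

```python
from typing import Callable, Dict, List

calls: List[str] = []  # records which services ran, to make reuse visible

# --- Data manipulation layer (all sample values are invented) ---
ORG_DB = {"ORG-1": {"name": "Example Corp", "street": "1 Main St", "country": "US"}}
ADMIN_DB = {
    "geography": {"US": "Americas"},
    "region": {"US": "North America"},
    "sales": {"ORG-1": {"SalesRegion": "East", "SalesRep": "J. Doe"}},
    "finance": {"ORG-1": {"Revenue": 1000000, "CreditRating": "A"}},
}


def enterprise_data_read_org(org_id: str) -> dict:
    calls.append("enterprise_data.org")
    return dict(ORG_DB[org_id])


def enterprise_data_read_mapping(country: str) -> dict:
    calls.append("enterprise_data.mapping")
    return {
        "OrgGeography": ADMIN_DB["geography"][country],
        "OrgRegion": ADMIN_DB["region"][country],
    }


def sales_data_service(org_id: str) -> dict:
    calls.append("sales_data")
    return dict(ADMIN_DB["sales"][org_id])


def finance_data_service(org_id: str) -> dict:
    calls.append("finance_data")
    return dict(ADMIN_DB["finance"][org_id])


# --- Semantic/logical layer ---
def enterprise_semantic_service(org_id: str) -> dict:
    calls.append("enterprise_semantic")
    row = enterprise_data_read_org(org_id)
    address = {"HQStreetAddr": row["street"], "HQCountry": row["country"]}
    # Geography and region are looked up from the headquarters country.
    location = {"HQAddress": address, **enterprise_data_read_mapping(address["HQCountry"])}
    return {"OrgName": row["name"], "OrgLocation": location}


# --- Business information layer ---
def sales_context_service(org_id: str) -> dict:
    return {**enterprise_semantic_service(org_id), "SalesInfo": sales_data_service(org_id)}


def finance_context_service(org_id: str) -> dict:
    return {**enterprise_semantic_service(org_id), "FinanceInfo": finance_data_service(org_id)}


CONTEXTS: Dict[str, Callable[[str], dict]] = {
    "sales": sales_context_service,
    "finance": finance_context_service,
}


def org_information_service(org_id: str, business_context: str) -> dict:
    return CONTEXTS[business_context](org_id)


sales_result = org_information_service("ORG-1", "sales")
sales_calls = list(calls)

calls.clear()
finance_result = org_information_service("ORG-1", "finance")
finance_calls = list(calls)
```

Comparing `sales_calls` with `finance_calls` shows the same enterprise semantic and data services running in both cases; only the last call differs. Supporting the finance context required one new data service and one new context service, with nothing below them rewritten.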
In this article, we described an architecture framework for accessing enterprise information. The architecture separates information access functionality into layers for (1) manipulating physical data, (2) defining semantics/logics of business terms and (3) defining enterprise-level business concepts. Functions at all layers are encapsulated as SOA services, making it possible to compose enterprise-level information services out of widely accessible, flexible and reusable building blocks. The architecture brings us closer to the vision of information on demand, in which business data can be accessed, interpreted and composed across the enterprise without being hidden within individual business applications. We walked through an example application of the proposed architecture to access business client organization data at different levels and in different business contexts.
In Part 2 of this series, we'll describe additional business scenarios and examples that map to the architecture and illustrate the concepts described in this article. Part 2 will also provide an implementation roadmap using IBM products and tools.
Appendix. Advantages of the SOA architecture over direct coupling
This section compares the direct coupling approach with the SOA approach relative to the following decision factors: flexibility, data governance, ease of development, and performance and scalability.
Flexibility: Here we are concerned with loose coupling to data model and data store, and reuse. With the SOA approach, a service encapsulates how it derives the data, whether by retrieval from a database or by computation. With this approach, solution architects are better able to separate data access logic from business logic, and can promote reuse with careful design, using the architectural layers of information services described in this article.
With the direct coupling approach, database views can provide a level of decoupling from the actual table format. However, application logic still needs to encode the details of SQL and the access protocol. The data access logic is interwoven with application logic, which makes it hard to reuse, track, and modify.
Data governance: With the SOA approach, the service enablement middleware provides control points and built-in capabilities to audit and control access to services. With the direct coupling approach, however, the data access logic is interwoven with the application logic, which makes it hard to reuse, track, and modify, and leaves no single control point for auditing access.
Ease of development: Here we're concerned with separation of concerns, tools support, and skill requirements. With the SOA approach, application developers do not have to be concerned with the details of data management. The SOA approach typically provides a high-level tool the developer can use to generate information services for exposing the selected data. With the direct coupling approach, no tools support exists, and application developers have to deal directly with the details of data management. The developer in this case needs implementation-specific data management skills.
Performance and scalability: Here we're concerned with data transfer volume. Service orientation encourages the use of stored procedures, which reduce data transfer, and also the use of various optimizations (such as view caching, indexing, and so on). Application or information servers implementing SOA typically support scalability. However, response time due to greater path length may be an issue in high-performance applications.
Poor application logic may invoke multiple transactions with large data transfer with the direct coupling approach, though this approach generally avoids greater path lengths. All this depends on both the scalability of the database server and the middleware managing the connection pool.
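The data transfer point can be illustrated with a small, hedged experiment. SQLite has no stored procedures and runs in-process, so this sketch substitutes an aggregate query for a stored procedure and uses the number of rows handed back to the caller as a rough proxy for transfer volume. The `orders` table and both function names are hypothetical.

```python
import sqlite3
from typing import Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (org_id TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ORG-1", amount) for amount in range(1, 1001)],
)


def revenue_client_side(org_id: str) -> Tuple[int, int]:
    """Pull every row to the caller, then add the amounts up there."""
    rows = conn.execute(
        "SELECT amount FROM orders WHERE org_id = ?", (org_id,)
    ).fetchall()
    return sum(amount for (amount,) in rows), len(rows)


def revenue_at_source(org_id: str) -> Tuple[int, int]:
    """Let the database compute the total and return a single row."""
    rows = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE org_id = ?", (org_id,)
    ).fetchall()
    return rows[0][0], len(rows)


client_total, client_rows = revenue_client_side("ORG-1")
source_total, source_rows = revenue_at_source("ORG-1")
```

Both functions produce the same total, but the first moves a thousand rows to the caller and the second moves one. A service that encapsulates the second form keeps that choice out of application logic.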
Resources
- The Emerging Vision for Data Services: Becoming Information-Centric in an SOA World, Mark A. Beyer, David Newman, Daniel Sholler, Ted Friedman, Gartner Research, 24 April 2006: Enterprise architects and planners need a framework that guides data service deployment in a successful service-oriented and/or event-driven architecture. Gartner envisions data services guided by an enterprise information management strategy to maximize the value and accessibility of information.
- The RDF/XML Syntax Specification: This document defines an XML syntax for RDF called RDF/XML in terms of Namespaces in XML, the XML Information Set and XML Base.
- RDF Primer: This Primer provides the basic knowledge required to effectively use RDF. It introduces the basic concepts of RDF and describes its XML syntax. It describes how to define RDF vocabularies using the RDF Vocabulary Description Language, and gives an overview of some deployed RDF applications. It also describes the content and purpose of other RDF specification documents.
- developerWorks WebSphere SOA zone: Get technical resources, including articles, tutorials, downloads, and events for IBM WebSphere SOA solutions.
- developerWorks SOA and Web services zone: Get technical resources, including articles, tutorials, downloads, and events for SOA and Web services.