SOA adventures, Part 3: How robust data layers accelerate SOA implementations

Mark Davydov (markdavydov@netscape.net), Consultant, Service-Oriented Architecture
Mark M. Davydov, Ph.D., is an internationally-known expert in software engineering and systems architecture, including SOA. Dr. Davydov is the author of numerous highly acclaimed articles in computer-related publications. His 2001 book Corporate Portals and e-Business Integration -- A Manager's Guide, McGraw-Hill Professional Publishing, introduced many ideas that influenced the progression of Service-Oriented Architecture and the Web services model.

Summary:  Learn how to drive complexity out of Service-Oriented Architecture (SOA). In Part 3 of this series on SOA adventures, Mark Davydov, an internationally-known expert in software engineering and systems architecture, takes a deep look at the Data Services Layer (DSL) and its role in a SOA. He also addresses important design issues when planning for a DSL solution.

Date:  21 Oct 2005
Level:  Intermediate

Overview

The concept of Service-Oriented Architecture (SOA) was introduced to make life simpler -- loosely coupled interfaces, tighter management and operational controls, encapsulated heterogeneity, and so on. But many SOA implementations are becoming more complex than the architectures they're replacing.

There are many factors that contribute to the complexity of SOA, and the first one that comes to mind is, "What do you do about the data?" Relational databases are data-oriented, not service-oriented. In contrast, Lightweight Directory Access Protocol (LDAP) repositories are service-oriented; only specific lookup APIs are provided. What about files of various kinds: XML, delimited text, binary, and so forth? SOA-enablement of all of those dispersed and heterogeneous data sources is not a trivial exercise. There are plenty of challenges here well beyond the task of creating a service that is encapsulating an existing enterprise information system (EIS) or a local database.

To achieve maximum benefits from your SOA, it's crucial to create a common data management approach that handles all aspects of integrating information exposed through different autonomous services in your enterprise. With such an approach, it becomes possible to actively drive simplicity and SOA project acceleration.

To address these issues, this part of the series focuses on the Data Services Layer (DSL). Specifically, you will look at how to drive complexity out of SOA by creating a DSL, and at the most important design issues to address when planning a DSL solution.


Introduction

Adopting a SOA is a large undertaking, and it demands significant change for organizations in terms of how they have designed and implemented different data integration solutions for many years. I've had the opportunity to work closely with many organizations as they transition into Web services and SOA, and their experiences highlight the importance of planning ahead for an efficient and robust data access strategy.

Typically, IT departments create a framework for data integration requirements based on the following three options, as depicted in Figure 1 below:

  1. Applications directly access a back-end data source through stored procedures.
  2. Applications directly access back-end applications (often, based on CICS®/COBOL technologies) that, in turn, manage accesses to different EISs or data sources.
  3. Applications connect to an integration component that handles data requests to a specific EIS or data source under its control.

Figure 1. Typical data integration options

Because reengineering of data access is not commonly seen as a basic requirement when moving to a SOA, the tendency is to continue using existing frameworks for data integration -- especially options two and three, as shown above in Figure 1 -- by frontending them with a Web service interface, as depicted in Figure 2 below.


Figure 2. Wrapping existing frontends with Web services

Such an approach shields an application from details such as where the data resides, which data model and query mechanisms the data source supports, how the data access processes (in particular, data integrity) are implemented, and what platforms the data source resides on. It also introduces challenges of another kind, including: constructing Simple Object Access Protocol (SOAP) payloads, mapping from XML to particular data structures (for example, relational) and vice versa, deciding whether to code against a more familiar interface like Java™ Database Connectivity (JDBC) or Open Database Connectivity (ODBC) and process result sets rather than parse and process XML results, and much more.

Also, it is common to leave the responsibility for data access performance with database administrators. However, moving to SOA often means substantial growth in the demands placed on the database environment, such as increased access volume and new threats to data integrity when data is updated through applications that have not previously been involved with a particular database. Such demands cannot be met in the database environment alone.

In a nutshell, treating data access as nothing more than one business service passing a request on to another service, while the business service continues to handle all the responsibilities for data interpretation, does not eliminate the fact that data access logic still consumes a large percentage of development resources. That, in turn, can play a significant role in the success or failure of a SOA initiative. Moreover, providing multiple Web service interfaces, even ones that front-end existing data integration solutions, can undermine the ability of the resulting solutions to meet performance and scalability requirements. In other words, as I stated at the beginning of this discussion, it is not uncommon to see SOA implementations become more complex than the architectures they're replacing.

As a result, the task of reengineering data access frameworks, when moving to SOA, must be properly managed. This is where DSL plays a large role.


What is DSL?

In the context of SOA, DSL is a software layer that delivers a production-quality runtime for managing the delivery and persistence of data between business services (and the applications requesting them) and the multiple data sources such services use, whether in an enterprise or in a cross-enterprise distributed computing environment. DSL provides an abstraction (and integration) of all the data sources of interest to business services. With DSL, business services see a "cloud of data", which can be represented using some common data model (preferably, a relational schema) and queried using some standard language, standard interface, or over a standard transport.

Bottom line: the main purpose of a DSL in SOA is to provide a single point of access for all read and write operations, ensuring that data is always pulled from a single secured source, is interpreted consistently, and retains its integrity across all services. DSL acts as a common bridge between the Business Services layer and the Data Persistency layer (see Figure 3). As depicted in Figure 3, DSL is designed not only to act as a single point of access to information that exists in multiple systems, but also to provide a holistic view of the data models embedded in such systems. The latter point is the most important in terms of succeeding with SOA.


Figure 3. Centralizing data access in SOA with DSL

Obviously, a DSL is not required in every case. When the number of business services is low (the rule of thumb here is usually below 50) and, most importantly, the number of large data sources is also relatively low (let's say not greater than 10), an organization might want to continue to use individual connectors to data sources, either directly or through Web service interfaces to these connectors. However, when it is necessary to scale to large numbers of business services and data sources in the same way that enterprise applications today scale to very large numbers of transactions (for example, hundreds or even thousands of transactions per second), individually targeted, "stove-piped" techniques are no longer viable.

Some companies have tried to sort out these issues by differentiating the action on the data (for example, viewing and transacting and / or updating) and the object of the action (for example, transactional services, informational services, or other higher order concepts). Such approaches require the creation of new data sources that consolidate and normalize data for viewing versus updating, and that in turn, creates additional difficulties of ensuring data consistency (data synchronization) across many software layers.

More and more companies have come to realize that addressing the above data access issues and limitations in the pursuit of a robust and comprehensive DSL solution becomes mandatory to render cost-effective data services for large-scale commercial distributed computing environments.


How to drive complexity out of SOA by creating a DSL?

It is increasingly evident that, in order to drive complexity out of SOA, access to enterprise data sources should be virtualized, so that business services are isolated from the many technology aspects that are common when using Web services as a primary means for data integration, including the following:

  • Data source connectivity.
  • Data source security.
  • Data mapping and resolution of taxonomy and semantic differences between data sources, service providers, and underlying data source models.
  • Database platform differences.
  • Non-database structure differences in cases of using files and directories as data sources.
  • Handling all of the details of providing data to the objects in business services for both single instances and list requests, and for performing all standard data access operations such as create, read, search, and delete operations.
  • Ensuring transaction integrity when distributing business processes across the network and when failure and reliability concerns are significant, for example, in banking applications.
  • Exception handling and reporting.

By building a DSL that encapsulates all the capabilities cited above, you are basically centralizing all data services code, resulting in a highly adaptable and maintainable SOA-based solution. This considerably simplifies the implementation issues involved with SOA, because any code for providing data to business services is written once, no matter how many applications use it.

Therefore, instead of weaving data-related service invocation code throughout each business service in every SOA-targeted application, you can centralize data access into a single, uniform type of data access for all applications. This reduces the number of Web service interactions and the number of database connections open at any one time. All of these points help create an environment that fosters high scalability of concurrent Web service requests, which is very useful to SOA-based applications: by increasing the number of concurrent Web service requests a DSL can handle, you can increase the number of concurrent users and allow more applications to use the required data sources.
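
To make the idea of a single, uniform point of data access more concrete, here is a minimal sketch in Python. It is not taken from any product: the DataServicesLayer class, its register() and read() methods, and the two in-memory "sources" are hypothetical names chosen purely for illustration. The point is only that business services ask the layer for an entity and never learn which source holds it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical adapter shape: takes a key, returns a record or None.
Adapter = Callable[[str], Optional[dict]]


@dataclass
class DataServicesLayer:
    """Illustrative single entry point that hides which source holds an entity."""
    adapters: Dict[str, Adapter] = field(default_factory=dict)
    calls: int = 0  # number of read requests routed through the layer

    def register(self, entity: str, adapter: Adapter) -> None:
        # Connectivity details live in the adapter, not in business services.
        self.adapters[entity] = adapter

    def read(self, entity: str, key: str) -> Optional[dict]:
        self.calls += 1
        if entity not in self.adapters:
            raise LookupError(f"no data source registered for {entity!r}")
        return self.adapters[entity](key)


# Stand-ins for a relational table and a directory (assumed data, not real systems).
customers_table = {"c1": {"id": "c1", "name": "Ada"}}
directory = {"ada": {"uid": "ada", "mail": "ada@example.com"}}

dsl = DataServicesLayer()
dsl.register("customer", customers_table.get)
dsl.register("directory_entry", directory.get)

customer = dsl.read("customer", "c1")
missing = dsl.read("directory_entry", "nobody")
```

Because every request passes through read(), cross-cutting concerns such as security checks, exception reporting, or connection pooling could be added in one place rather than in each business service.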


Architectural options

Figure 4 depicts a conceptual architecture of DSL. With this architecture, DSL gets invoked for every data access operation or transaction initiated by a business service. Moreover, instead of invoking a specialized (or dedicated) service interface explicitly, business services delegate the data access handling to another system process that is a composite of multiple services. It is the responsibility of this system process to facilitate integration to any data source under the DSL control for all business services.


Figure 4. Conceptual architecture of DSL

So, conceptually, a DSL solution includes two logical tiers: a data services orchestration tier that supports the system process described above, and a data integration tier. Decomposing a DSL solution into smaller building blocks (separate tiers) presents some interesting architectural options.

Let's briefly discuss each tier, starting with the data integration tier. Generally, this tier can be implemented in two ways, as follows:

Option 1

The first option is consolidating data into a single physical store, often referred to as an Operational Data Store (ODS). Consolidating data makes it possible to achieve fast, highly available, and integrated access to related information, which simplifies meeting performance, data consistency, and availability requirements. It can also enable sophisticated data transformation for semantic normalization. However, as I have mentioned before, consolidated data stores are expensive because of additional administration, server, and storage resource requirements, and, most importantly, data synchronization between the copy and the source of data can be a real challenge when true real-time access is involved. In addition, much unstructured data, such as documents, images, or audio, cannot be easily addressed within this scenario. Also, from the SOA standpoint, any data integration solution that is based on some form of data replication should be avoided as much as possible, because it creates tight coupling between multiple data models -- DSL, ODS, and the legacy data sources. Such a dependency might not be flexible enough to accommodate future needs, and it might lead to reengineering of the DSL whenever the legacy data sources are upgraded, rewritten, or replaced.

Option 2

The second option is all about enabling a federated data access capability as though it were a single resource, where distributed access to data is totally transparent regardless of a variety of data sources and platforms. A federated approach to distributed data access can provide the required levels of data synchronization without an ODS. However, this approach is not without shortcomings, especially in terms of performance -- performance of many requests will be slower than with Option 1. Therefore, additional functions need to be considered here that compensate for performance degradation, in particular:

  • Enable data caching to reduce the number of accesses in order to handle large volumes of data requests.
  • Enable data compression when transferring data over the network, including XML payloads.
  • Use binary representations of XML, in particular XML-binary Optimized Packaging (XOP), a W3C format for packaging an XML Infoset so that binary content travels efficiently rather than as encoded text; it is aimed at XML documents consumed by Web services and is implemented in Axis2 (see Resources).
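
As a rough illustration of the compression point above, the following Python sketch compresses an XML payload before it would be sent over the network and restores it on the other side. It uses only the standard zlib module; the compress_payload() and decompress_payload() helpers and the sample payload are assumptions made for this example, not part of any DSL product, and real deployments would more likely rely on transport-level compression.

```python
import zlib


def compress_payload(xml_text: str) -> bytes:
    """Encode the XML as UTF-8 and deflate it (compression level 6)."""
    return zlib.compress(xml_text.encode("utf-8"), 6)


def decompress_payload(blob: bytes) -> str:
    """Inverse of compress_payload()."""
    return zlib.decompress(blob).decode("utf-8")


# A deliberately repetitive sample payload; XML result sets often look like this.
payload = "<stores>" + "".join(
    f"<store><strno>{n}</strno><type>retail</type></store>" for n in range(200)
) + "</stores>"

blob = compress_payload(payload)
restored = decompress_payload(blob)
ratio = len(blob) / len(payload.encode("utf-8"))
```

Repetitive markup tends to compress well, which is why compression is often worthwhile for large XML result sets; the actual ratio depends on the data.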

Now, about the data services orchestration tier. Although "smarts" exist at both tiers, with the heavy lifting happening at the data integration tier, the data services orchestration tier provides the functions that specifically facilitate SOA implementations, especially a common object model and its mapping to the relational schema.


What are the most important design issues to be addressed by DSL?

Several design considerations need addressing when planning a DSL. Clearly, it is not possible to cover all the important considerations in one article, but let's go through the most important ones:

First off, as with any SOA implementation, it is important to define a proper data services model: in particular, an inventory of data services with built-in reusability characteristics and an associated object model.

Now, about a common object model. This is one of the most critical design elements of a DSL implementation. A common object model normalizes multiple domain objects (most importantly, in terms of taxonomy and semantic differences). It is very important to remember that, as far as DSL is concerned, objects in the model are a logical representation of data functions (creating, updating, retrieving, and so forth), not of the actual data physically stored in a database. As a result, the common object model includes the following types of object classes:

  • Business objects -- objects that present a single view of enterprise entities and a unified view of all common entities spread across multiple data sources, for example, a unified view of a Customer entity, Item entity, Warehouse entity, and so forth.
  • Function objects -- objects that represent business process-based relationships between business objects, such as entities that represent business relationships of objects within the context of particular business processes, for example, Order entity, Inventory entity, Account entity, Payments entity, and so forth; those objects describe the static and dynamic relationships that exist between business objects.
  • Data access objects -- objects that encapsulate common data access methods against business and function objects, for example, common sets of read and write methods such as GetCustomerById(), GetCustomerByName(), GetCustomers(), CreateCustomerById(), and so forth; they also include complex navigational operations like aggregation, joins, and nesting.
  • Feature objects -- objects that encapsulate complex processing logic like object-relational or O/R mapping, transformations, security, and caching; typically, feature objects that are dedicated to transformations have a private "get all instances" method that encapsulates related logic.
  • Cross-reference or correlation objects -- objects that represent the references that individual services "hold on" a business entity; in many cases, apart from defining a "holistic" view, DSL should be able to correlate an instance of an enterprise entity used by one service to an instance in another service. A cross-reference object encapsulates such correlation logic, for example, by matching entity instances from various services using an expression that maps an instance in one system to an instance in another system (this technique is often referred to as a Matching Predicate); explicitly handling references in DSL becomes very important in order to be able to perform updates on a unified entity.
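
The object classes above can be sketched in a few lines of Python. This is only one possible reading of the model: the Customer, CustomerDAO, and CrossReference classes, the snake_case method names, and the sample CRM and billing records are hypothetical and exist only to show how a business object, a data access object, and a cross-reference object with a matching predicate might relate to one another.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Customer:
    """Business object: a unified view of a customer entity."""
    customer_id: str
    name: str


class CustomerDAO:
    """Data access object: common read methods over the business object."""

    def __init__(self, rows: List[dict]) -> None:
        self._by_id = {r["id"]: Customer(r["id"], r["name"]) for r in rows}

    def get_customer_by_id(self, customer_id: str) -> Optional[Customer]:
        return self._by_id.get(customer_id)

    def get_customers(self) -> List[Customer]:
        return list(self._by_id.values())


class CrossReference:
    """Correlation object: links instances across systems via a matching predicate."""

    def __init__(self, predicate: Callable[[dict, dict], bool]) -> None:
        self._predicate = predicate

    def correlate(self, left: List[dict], right: List[dict],
                  left_key: str, right_key: str) -> Dict[str, str]:
        # Map each left-side id to the first right-side id the predicate accepts.
        links: Dict[str, str] = {}
        for l_row in left:
            for r_row in right:
                if self._predicate(l_row, r_row):
                    links[l_row[left_key]] = r_row[right_key]
                    break
        return links


# Assumed sample data from two systems that describe the same customers.
crm = [{"id": "c1", "name": "Ada", "email": "Ada@Example.com"},
       {"id": "c2", "name": "Bob", "email": "bob@example.com"}]
billing = [{"acct": "b9", "email": "ada@example.com"}]

dao = CustomerDAO(crm)
same_email = CrossReference(lambda a, b: a["email"].lower() == b["email"].lower())
links = same_email.correlate(crm, billing, left_key="id", right_key="acct")
```

Here the matching predicate is a case-insensitive comparison of e-mail addresses; in practice the predicate would be whatever expression reliably identifies the same entity in both systems.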

Secondly, as mentioned before, in order to maximize DSL performance, especially when using the federated approach for data integration, it is necessary to institute caching at multiple tiers of the solution.

At the data services orchestration tier, caching is supported by a specialized data caching service. It is designed to cache basic information about business or function objects that will be consumed in many method calls of Web services, and that will change infrequently. Cached information is mostly placed into feature objects that are being persisted in memory.
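
A data caching service of this kind can be approximated with a small time-to-live cache, sketched below in Python. The TtlCache class, its get() and invalidate() methods, and the load_stores() loader are illustrative assumptions rather than part of any specific caching product; the clock is injectable so that the expiry behavior can be shown deterministically.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlCache:
    """In-memory cache for slowly changing data, with per-entry expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}
        self.misses = 0  # how many times the loader had to be called

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # fresh hit: no data source access
        self.misses += 1
        value = loader()             # stale or absent: go to the data source
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        self._entries.pop(key, None)


# A fake clock makes the example deterministic (hypothetical setup).
fake_now = [0.0]
cache = TtlCache(ttl_seconds=15, clock=lambda: fake_now[0])


def load_stores() -> list:
    return [{"storeno": 1, "region": "A"}]


first = cache.get("stores", load_stores)
second = cache.get("stores", load_stores)   # served from memory
fake_now[0] = 20.0                          # advance past the 15-second TTL
third = cache.get("stores", load_stores)    # reloaded from the source
```

The explicit invalidate() method matters as much as the timeout: when an update passes through the DSL, the corresponding cached entry can be dropped immediately instead of waiting for it to expire.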

An XML query language (XQuery) is very useful for implementing caching with feature objects. XQuery is an emerging W3C standard for querying native XML data. Why is it useful for the task of caching? It is so, primarily, because it allows modeling a data source as a process and / or function (see example in Listing 1).


Listing 1. XQuery example for modeling data sources as a process and / or function

    (: The strcache prefix is assumed to be bound to the caching service's namespace. :)
    for $p in strcache:getStoresInfo()
    where $p/stores/region eq "A"
    return
        <asiaRegion>
            {$p/stores}
            {
                (: Convert each store's quarterly sales using its conversion rate. :)
                for $s in $p/sales/sale
                let $saletot := $s/Quaterlysales * $s/DollarConversion
                where $s/YearsOperation gt 1
                return
                    <store>
                        <strno>{data($s/@storeno)}</strno>
                        <type>{data($s/@storetype)}</type>
                        <sales>{$saletot}</sales>
                    </store>
            }
        </asiaRegion>

As you can see from the example in Listing 1, XQuery performs multiple tasks: it models the getStoresInfo() method as a function, provides for variable binding and filtering using $p/stores, and performs data transformation using $s/@storeno and $s/@storetype. In addition, XQuery allows the same functionality for both relational and non-relational data.

But XQuery is good primarily for read-type services. What about updates? To overcome the shortcomings of XQuery for updates, two options are available: performing updates programmatically using Java, or using a very interesting new technology introduced by IBM and BEA -- Service Data Objects (SDO) (see Resources). An SDO not only provides a setattribute() operation for performing direct updates, but can also track the changes if this is specified in its XML configuration using a <ChangeSummary> -- </ChangeSummary> block.
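
The change-tracking idea can be illustrated with a small Python sketch. To be clear, this is not the SDO API: the TrackedObject class and its get(), set(), and change_summary() methods are hypothetical names that merely mimic the concept of a data object that records old and new values so that only actual changes need to be written back to the data source.

```python
from typing import Any, Dict, Tuple


class TrackedObject:
    """Illustrative data object that remembers which fields were changed."""

    def __init__(self, **values: Any) -> None:
        self._values: Dict[str, Any] = dict(values)
        self._original: Dict[str, Any] = {}  # original value of each changed field

    def get(self, name: str) -> Any:
        return self._values[name]

    def set(self, name: str, value: Any) -> None:
        old = self._values.get(name)
        if old == value:
            return  # nothing changed, nothing to track
        self._original.setdefault(name, old)
        self._values[name] = value
        if self._original[name] == value:
            del self._original[name]  # reverted to the original value

    def change_summary(self) -> Dict[str, Tuple[Any, Any]]:
        # Field name -> (old value, new value) for every net change.
        return {n: (old, self._values[n]) for n, old in self._original.items()}


customer = TrackedObject(id="c1", city="Rome", tier="gold")
customer.set("city", "Paris")
customer.set("tier", "gold")        # same value: not recorded as a change
summary = customer.change_summary()
```

With a summary like this, a DSL can generate a narrow update for the changed fields only and can compare the recorded old values against the data source to detect conflicting updates.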

Selecting proper Web services-oriented tooling and run-time platforms for data services is extremely important in terms of achieving needed performance characteristics. For example, since release 5.1, IBM WebSphere® Application Server provides a highly optimized cache capability that you can use on top of application-level caching I discussed before. This capability is called the WebSphere Dynamic Cache Service. For data services, performance of many types of requests can be greatly enhanced by using the dynamic cache.

The cache mechanism is activated by adding a cachespec.xml file to the WEB-INF directory of the Web project where the Web services are deployed. It is important to note here that the dynamic cache attempts to match each of the different cache-entry elements by analyzing configuration information for that object. Different cacheable objects have different <class> elements. In order to cache objects of the DSL, you need to define a cache policy using the <name> element to uniquely identify the objects you want to cache. The cachespec.xml file also allows adding logic to invalidate the cache. Listing 2 depicts a common implementation for DSL-type objects.


Listing 2. Sample cachespec.xml for DSL object caching

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE cache SYSTEM "cachespec.dtd" >
    <cache>
        <cache-entry>
            <class>webservice</class>
            <name>/dataservices/MyServiceHTTPServicePort</name>
            <cache-id>
                <component id="xxxx" type="serviceOperation">
                    <value>http://nnnnn.nnnnnn.com/CommonQuery:getyyyyyy</value>
                    <required>true</required>
                </component>
                <component id="storeSales" type="serviceOperationParameter" />
                <timeout>15</timeout>
            </cache-id>
        </cache-entry>
    </cache>

To finish the discussion about DSL performance: it is absolutely critical to focus on designing highly coarse-grained services that accept all the parameters and information necessary for performing multiple related operations, thereby allowing DSL to accomplish as much as possible on behalf of the DSL-supported application. Therefore, implementing a comprehensive document-centric strategy for message exchange is required. For example, in the case of a customer-oriented banking application, it is a good idea to allow the application to provide customer IDs, names, credit or debit card information, account numbers, and primary or e-mail address information -- all within a single request. The request itself might initiate multiple atomic transactions (customer authentication, credit card authorization, submission of changes to customer profiles, and so on).
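
The following Python sketch shows one way a coarse-grained, document-centric request might be handled: a single request document carries everything needed, and the layer runs several related operations on the caller's behalf. The handle_request() function, the three stand-in operations, and the field names in the request are all assumptions made for illustration; real operations would each be an atomic transaction against the appropriate data source.

```python
from typing import Callable, Dict, List, Tuple

# A step is a name plus an operation that inspects the request document.
Step = Tuple[str, Callable[[dict], bool]]


def handle_request(document: dict, steps: List[Step]) -> Dict[str, bool]:
    """Run each operation in order against one request document.

    Stops at the first failing step so later operations are not attempted.
    """
    results: Dict[str, bool] = {}
    for name, operation in steps:
        ok = operation(document)
        results[name] = ok
        if not ok:
            break
    return results


# Stand-in operations (hypothetical business rules).
def authenticate(doc: dict) -> bool:
    return doc.get("customer_id") in {"c1"}


def authorize_card(doc: dict) -> bool:
    return doc.get("amount", 0) <= 500


def update_profile(doc: dict) -> bool:
    return "email" in doc


STEPS: List[Step] = [
    ("authenticate", authenticate),
    ("authorize_card", authorize_card),
    ("update_profile", update_profile),
]

request = {"customer_id": "c1", "card": "****1111",
           "amount": 120, "email": "ada@example.com"}
outcome = handle_request(request, STEPS)
rejected = handle_request({"customer_id": "zz"}, STEPS)
```

One network round trip thus replaces three, at the cost of a larger message; the per-step results let the calling application see exactly which operation stopped the request.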


Conclusion

In this article, I have touched upon several points that demonstrate compelling reasons for a shared DSL as a way to simplify and improve the overall SOA architecture. I have also described a few techniques for ensuring that such an implementation can sustain high request volumes in a federated data integration scenario. These techniques can be easily expanded using additional enhancements and technologies tailored for your particular situation. For instance, you might want to consider implementing an ESB as a frontend to your DSL. Each of your applications would invoke the ESB, which would in turn call the appropriate data services of the DSL. This would reduce the amount of service interaction functionality in your production DSL, thereby improving administration and operational characteristics.


Resources

Learn

Get products and technologies

  • Get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere. You can download evaluation versions of the products at no charge, or select the Linux® or Windows® version of developerWorks' Software Evaluation Kit.
