Services-based enterprise integration patterns made easy, Part 1: The evolution of basic concepts

Dr. Waseem Roshen (waroshen@us.ibm.com), IT Architect, IBM, Software Group
Dr. Waseem Roshen is an IT architect in the Enterprise Architecture and Technology Center of Excellence of IBM Global Business Services in Columbus, Ohio. He works on enterprise architecture and integration. He's also a Sun Certified J2EE Architect, has published 60 articles, and has worked on 24 patents.

Summary:  This series of articles explains services-based enterprise integration patterns in an easy-to-understand, step-by-step way. In this installment, Part 1 of the series, you learn about the two earliest integration patterns—data sharing only and remote procedure call (RPC)—which help introduce the concepts of service provider and service consumer, platform independence, and connectivity. Exploring RPC helps you get familiar with the basic steps necessary for two applications to share functionality. This article also includes a general description of the concepts of loose coupling, code reuse, and layering and componentization. Part 2 of the series continues the discussion of the early patterns, while Part 3 and Part 4 cover the Service-Oriented Architecture (SOA)-based integration patterns, including examples.

Date:  28 Feb 2008
Level:  Intermediate

Introduction

Making all the applications in an enterprise work in an integrated manner to provide unified and consistent data and functionality is a difficult task. It involves integrating various kinds of applications, such as custom-built applications (C++, Java™, or Java 2 Platform, Enterprise Edition [J2EE]), packaged applications (such as SAP CRM applications), and legacy applications (mainframe IBM® CICS® or IBM Information Management System [IMS™]). Furthermore, these applications might be dispersed geographically and run on varied platforms. There might also be a need to integrate applications that are outside the enterprise.

As enterprises have grown and the complexity of enterprise integration has increased, a substantial number of integration patterns have evolved. These patterns range from simple file-based data transfer between applications to complete SOA-based integration patterns.

Parts 1 and 2 of this series trace the evolution of these patterns with a view towards helping you understand all the basic concepts and features involved in SOA-based integration patterns. Some of the concepts and features covered include:

  • Consumer and provider of services
  • Loose coupling
  • Code reuse and layering
  • Language independence
  • Platform independence
  • Definition, publication, and discovery of services interface (that is, registry concepts)
  • Evolution of the enterprise service bus (ESB) from point-to-point integration involving the concepts of connectivity, marshaling, and mediation
  • Coarse granularity

This description of the evolution of integration patterns also emphasizes the important point that many of the preceding technologies and technical concepts have significantly contributed to the development of SOA-based integration patterns.

Another reason to discuss the older patterns is that even today, older and newer patterns coexist. For example, many ESB implementations support a file-transfer mechanism for integration. In a similar manner, many application servers, such as IBM WebSphere®, have the capabilities of both an Object Request Broker (ORB) and asynchronous messaging.

It's interesting to note that recently an IBM proposal to standardize a Service Integration Maturity Model (SIMM) was accepted by the board of The Open Group. The description of earlier integration patterns in Parts 1 and 2 of this series can greatly help in determining and clarifying maturity levels of the application domain in SIMM.

Loose coupling

Of all the concepts, the major driving principle of the march towards SOA-type integration patterns is the idea of loose coupling. This drive for loose coupling has occurred because the number and kinds of applications being integrated have progressively grown very large. Integration patterns must therefore minimize the effect that changes made to one application have on other applications. Another reason for requiring loose coupling is that businesses need agility to meet today's changing business requirements, so the integration pattern must be flexible enough to allow for this agility. As you'll see in this article, while the path towards loose coupling and agility hasn't been a straight one, in general the couplings between applications and software components have become weaker as integration has moved from older patterns toward SOA.

Maximizing code reuse

The second most important principle that's been employed in the development of a services-based architecture is the emphasis on maximizing code reuse. This code reuse results in more reliable and efficient code, because the same code is tested again and again. To implement code reuse, usually layering (also known as componentization) is required. Layering, or componentization, refers to pulling out different pieces of the code as separate software components so that at run time multiple applications can use the same code to, for example, make a network connection to remote applications. Layering also promotes loose coupling, because the internal workings of each layer can be changed without affecting other layers or applications.
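As a rough illustration of layering, the sketch below pulls the connectivity code out into one component that two applications reuse. The names FramedChannel, BillingApp, and InventoryApp, and the length-prefixed framing, are assumptions made up for this example and do not come from any product discussed in this article.

```python
import struct


class FramedChannel:
    """Hypothetical reusable connectivity layer: length-prefixed framing.

    `write` is any callable that accepts bytes (for example, socket.sendall).
    """

    def __init__(self, write):
        self._write = write

    def send(self, payload):
        # 4-byte big-endian length header followed by the payload itself.
        self._write(struct.pack(">I", len(payload)) + payload)


class BillingApp:
    """Hypothetical application that reuses the shared channel."""

    def __init__(self, channel):
        self.channel = channel

    def publish_invoice(self, invoice_id):
        self.channel.send(f"invoice:{invoice_id}".encode("ascii"))


class InventoryApp:
    """A second hypothetical application that reuses the same channel code."""

    def __init__(self, channel):
        self.channel = channel

    def report_stock(self, quantity):
        self.channel.send(f"stock:{quantity}".encode("ascii"))


def demo():
    wire = []                              # stands in for the network
    channel = FramedChannel(wire.append)
    BillingApp(channel).publish_invoice(42)
    InventoryApp(channel).report_stock(7)
    return wire


if __name__ == "__main__":
    print(demo())
```

Because both applications depend only on `send`, the framing inside FramedChannel could be changed without touching either of them, which is the loose-coupling benefit described above.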

Let's start the discussion about integration patterns with the simplest integration involving only data sharing between the applications. This will help ease you into the concept of connectivity between different applications.


Data sharing only

There are, in principle, three ways to share data between two applications:

Files

The first one, perhaps the most common, is through files. This is because storing data in files is the universal way to store data in a system. In this pattern of integration one application writes data to a file, while another application reads data from the same file. However, there are two major problems with this approach of sharing data between applications:

  • Data isn't shared in real time, as there's usually a lag (mainly dependent on business cycles) between writing and reading from the files.
  • There's tight coupling between the two applications sharing the data. Thus, changes (which modify the format or the content of the file) in the application producing the file must be accompanied by changes in the application that consumes the file.
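The file-based pattern can be sketched as follows. This is a minimal example under stated assumptions: the comma-separated layout, the field names order_id and amount, and the function names are all hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical column layout that both applications must agree on.
FIELDS = ["order_id", "amount"]


def producer_write(path, orders):
    """Application A: write a batch of orders to the shared file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)
        writer.writerows(orders)


def consumer_read(path):
    """Application B: read the batch later, relying on the same column names."""
    with open(path, newline="") as f:
        return [(row["order_id"], float(row["amount"]))
                for row in csv.DictReader(f)]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "orders.csv")
        producer_write(path, [("A-1", 19.99), ("A-2", 5.00)])
        return consumer_read(path)


if __name__ == "__main__":
    print(demo())
```

If the producer renamed the amount column, `consumer_read` would fail with a KeyError until it was changed too, which is the tight coupling described in the second bullet.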

Common database

The second pattern of sharing data is similar to the first and uses a common database. In this case, one application writes data to a database, while the other applications read from the database. Again, the data isn't shared in real time, as there's usually a lag between writing to the database by one application and reading of the data by other applications. This lag between writing and reading of the data is due to the fact that the reading application isn't aware of when the data is written by the writing application. Also, there's tight coupling between applications, because changes made to the database have a ripple effect, which makes the design of the shared database difficult.
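A minimal sketch of the shared-database pattern, assuming an in-memory SQLite database and a hypothetical orders table. For brevity both "applications" share one connection here; in practice they would be separate processes. The reader has to poll because nothing tells it when a write has happened.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical shared schema; every application depends on these columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id TEXT PRIMARY KEY, amount REAL)"
    )
    conn.commit()


def writer_app(conn, order_id, amount):
    """Writing application: insert a row and commit."""
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
    conn.commit()


def reader_app(conn, seen):
    """Reading application: poll the table and return rows not seen before."""
    rows = conn.execute(
        "SELECT order_id, amount FROM orders ORDER BY order_id"
    ).fetchall()
    new_rows = [row for row in rows if row[0] not in seen]
    seen.update(row[0] for row in new_rows)
    return new_rows


def demo():
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    seen = set()
    first = reader_app(conn, seen)     # polled too early: nothing there yet
    writer_app(conn, "A-1", 19.99)
    second = reader_app(conn, seen)    # the next poll picks up the new row
    third = reader_app(conn, seen)     # nothing new since the last poll
    conn.close()
    return first, second, third


if __name__ == "__main__":
    print(demo())
```

The lag between writing and reading is at most one polling interval, and any change to the orders table affects every application that queries it.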

Connectivity

To avoid the problem of stale data, you need a real-time connection between the applications sharing data. This is termed connectivity. The most rudimentary way to establish a connection between two applications is through sockets. Sockets let one application listen at a given port on a given machine for incoming data, while another application writes to the same socket using the IP address and port number of the first application. The listening application can read the data as soon as the second application writes it. So the data is shared in real time, and the problem of data staleness is eliminated.
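The socket pattern can be sketched roughly as below, with both applications running in one process on the loopback address so that the example is self-contained. The function names and the payload are assumptions for illustration only.

```python
import socket
import threading


def start_listener(host="127.0.0.1"):
    """Listening application: bind to a port and accept one connection."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))              # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = []

    def serve():
        conn, _addr = server.accept()
        with conn:
            chunks = []
            while True:
                data = conn.recv(1024)
                if not data:            # empty read: the writer closed the socket
                    break
                chunks.append(data)
            received.append(b"".join(chunks))
        server.close()

    thread = threading.Thread(target=serve)
    thread.start()
    return port, thread, received


def send_data(host, port, payload):
    """Writing application: connect using the listener's address and port."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)


def demo():
    port, thread, received = start_listener()
    send_data("127.0.0.1", port, b"price=19.99")
    thread.join(timeout=5)
    return received


if __name__ == "__main__":
    print(demo())
```

Only raw bytes cross the connection. Both sides still have to agree, outside the code, on what those bytes mean.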

Because the overhead associated with applications that communicate through sockets is very low, direct socket programming leads to efficient communication. It's also interesting that most modern methods of communication, such as message-oriented middleware (MOM) and messaging, rely on socket programming under the hood. However, the socket programming approach has a number of shortcomings:

  • The major problem with socket programming is that only data can be shared directly, not the functionality.
  • The API for socket programming is rather low level and, therefore, difficult to use.
  • Because the API is low level, socket programming isn't suitable for dealing with complex data types.
  • The connectivity code is buried in the applications, so it can't be reused.
  • Socket programming is also not platform independent, because applications on both ends must explicitly account for the byte ordering differences (little endian versus big endian) on different platforms, such as mainframe and UNIX®.
  • There's also tight coupling between the two applications, because socket connection is point to point.
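The byte-ordering problem in the list above can be seen with a small sketch that uses Python's struct module. The value and the function name are chosen only for illustration.

```python
import struct

value = 1

# The same 32-bit integer laid out in the two byte orders.
little_endian = struct.pack("<i", value)   # least significant byte first
big_endian = struct.pack(">i", value)      # most significant byte first


def read_as_big_endian(raw):
    """Interpret four raw bytes as a big-endian signed integer."""
    return struct.unpack(">i", raw)[0]


if __name__ == "__main__":
    print(little_endian.hex(), big_endian.hex())
    # A big-endian receiver misreads bytes written by a little-endian sender.
    print(read_as_big_endian(little_endian))
```

Unless both ends agree on one byte order, the receiver reads 16777216 where the sender wrote 1. With raw sockets, each application has to handle that agreement itself.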

Thus, we need a solution that lets applications share both data and functionality while avoiding low-level network programming. RPC, which is covered next, was the first approach that allowed applications to share functionality.


Remote Procedure Call

The introduction of RPC was an important step, as it introduced some important concepts and features and specified the basic steps that are necessary for sharing functionality. Recall that services are basically about sharing functionality between applications and between components. Thus, it's instructive to discuss this integration pattern in some detail.

RPC is also known as a client/server model and is a rung above socket programming. It eliminates the need for network programming. Briefly, RPC provides a function-oriented interface. The developer defines a function, much like those in procedural languages like C, and generates code that makes the function look like a normal function to the caller. RPC is powerful enough to be the basis of client/server applications.

The first valuable concept that was introduced by RPC was that of a service provider, called a server, and a service consumer, called a client. In RPC, a server application provides a function that can be called in a normal manner by a consumer (client) application. The basic sequence is that a server (application) is started, which waits for the client to make a request. When the server receives the request from the client, the server executes the local function and returns the return value for the function to the client. The complete process is shown in Figure 1.


Figure 1. Remote Procedure Call complete process

One of the important components that's introduced in Figure 1 is the client stub. To the client, the client stub appears to be the actual procedure that it calls. The purpose of the stub is to package the arguments to the remote procedure, possibly put them into a standard format, and then build one or more network messages. This packaging of arguments is called marshaling. An important aspect of this marshaling is that the byte ordering differences between different platforms are handled automatically using a standard called external data representation (XDR), thus making RPC platform independent.
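A rough sketch of marshaling in the style of XDR, which encodes integers as 4-byte big-endian values and pads variable-length data to a multiple of four bytes. The function names are hypothetical, and this hand-written code only imitates what a generated stub would do.

```python
import struct


def marshal_int(value):
    """Pack a signed integer as 4 bytes, most significant byte first."""
    return struct.pack(">i", value)


def unmarshal_int(message):
    """Reverse of marshal_int: recover the integer on any platform."""
    (value,) = struct.unpack(">i", message)
    return value


def marshal_string(text):
    """Pack a string as a 4-byte length, the bytes, then zero padding."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


if __name__ == "__main__":
    print(marshal_int(5).hex())
    print(marshal_string("abc").hex())
```

Because the wire format is fixed regardless of the sender's or receiver's native byte order, neither application needs to know what platform the other runs on.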

The second important component that RPC introduced is the RPC runtime library. The client stub uses the functions provided in the RPC runtime library to make a system call into the local kernel, which sends the packaged message over the network to the server machine using a protocol such as Transmission Control Protocol (TCP). In other words, the RPC runtime encapsulates all the system calls necessary for connectivity, that is, for sending the packaged arguments over the network. Thus, the programmer doesn't need to know any systems programming.

On the server machine side, as the network message is received by the network routines in the kernel, it's passed to the server stub through the RPC runtime. The server stub unmarshals the input parameters and invokes the requested local procedure in the server routines. After the local procedure is completed, the server stub marshals the return value into one or more network messages and sends the packaged return value to the server kernel through the RPC runtime. The server kernel sends the message to the client machine using the network protocol, such as TCP. The client stub reads the network messages from the kernel through the RPC runtime routines. After possibly converting the return values, the client stub finally returns to the client function. This step appears to be a normal procedure return to the client.
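The round trip just described can be sketched with hand-written stubs. This is an illustration under stated assumptions, not generated RPC code: a connected socket pair stands in for the network, the helper recv_exact plays a small part of the runtime's role, and all function names are hypothetical.

```python
import socket
import struct
import threading


def square(n):
    """The local procedure that lives on the server."""
    return n * n


def recv_exact(sock, size):
    """Read exactly `size` bytes, as a runtime library would for the stubs."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def server_stub(sock):
    """Unmarshal the argument, call the local procedure, marshal the result."""
    (arg,) = struct.unpack(">i", recv_exact(sock, 4))
    sock.sendall(struct.pack(">i", square(arg)))


def client_stub_square(sock, n):
    """Looks like a normal function to the caller but runs remotely."""
    sock.sendall(struct.pack(">i", n))                    # marshal and send
    (result,) = struct.unpack(">i", recv_exact(sock, 4))  # block for the reply
    return result


def demo():
    client_sock, server_sock = socket.socketpair()
    server = threading.Thread(target=server_stub, args=(server_sock,))
    server.start()
    try:
        return client_stub_square(client_sock, 12)
    finally:
        server.join(timeout=5)
        client_sock.close()
        server_sock.close()


if __name__ == "__main__":
    print(demo())
```

The caller of `client_stub_square` never touches a socket or a byte order. The call also blocks until the reply arrives, which is the synchronous behavior that RPC is known for.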

RPC also introduced a rudimentary way of defining an interface between the client and the server through the use of a specification file. The RPC specification file may be considered the first step in the development of the services interface in today's world. An example of such a specification file (square.x) is provided in Listing 1. This file is used to generate the skeleton code for both the server and the client using a tool like rpcgen. Note that the specification is specific to a language such as C and, therefore, requires both the server and the client to be written in the same language.


Listing 1. An example of an RPC specification file, which defines a remote procedure for calculating the square of a number
struct square_in {              /* input (argument) */
    long arg1;
};
struct square_out {             /* output (result) */
    long result;
};
program SQUARE_PROG {
    version SQUARE_VERS {
        square_out SQUAREPROC(square_in) = 12;  /* procedure number */
    } = 23;                                     /* version number */
} = 0x31231234;                                 /* program number */
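To show how the three numbers in Listing 1 could be used, here is a simplified sketch in which the server looks up the requested procedure by program, version, and procedure number. The message layout is an assumption made for this example. A real RPC call message carries additional fields, and rpcgen generates C code rather than anything resembling this.

```python
import struct

# Numbers taken from Listing 1.
SQUARE_PROG = 0x31231234
SQUARE_VERS = 23
SQUAREPROC = 12


def squareproc(arg1):
    """Server-side implementation of the procedure declared in Listing 1."""
    return arg1 * arg1


# Hypothetical dispatch table: (program, version, procedure) -> function.
DISPATCH = {(SQUARE_PROG, SQUARE_VERS, SQUAREPROC): squareproc}


def build_call(prog, vers, proc, arg1):
    """Client side: simplified call message of three identifiers plus square_in."""
    return struct.pack(">IIIi", prog, vers, proc, arg1)


def handle_call(message):
    """Server side: identify the procedure, run it, and marshal square_out."""
    prog, vers, proc, arg1 = struct.unpack(">IIIi", message)
    procedure = DISPATCH.get((prog, vers, proc))
    if procedure is None:
        raise LookupError("no such program/version/procedure")
    return struct.pack(">i", procedure(arg1))


if __name__ == "__main__":
    reply = handle_call(build_call(SQUARE_PROG, SQUARE_VERS, SQUAREPROC, 7))
    print(struct.unpack(">i", reply)[0])
```

A client that asks for a version the server doesn't offer gets an error instead of a wrong answer. That is one reason the specification numbers versions separately from procedures.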

In summary, RPC allowed for the first instance of real distributed computing by letting applications share functionality. In the process, a number of new concepts were introduced, including:

  • Service provider (server) and the service consumer (client).
  • Platform independence.
  • Concept of interface definition.
  • Marshaling of input and output parameters.
  • Encapsulation, in a library, of the system calls that are necessary to communicate over the network.

However, RPC has a number of shortcomings, including:

  • The major disadvantage is that there's little room for code reuse. This is because the code for marshaling and unmarshaling and the code for network communication are buried in the client and server applications.
  • RPC isn't language independent, and the client and the server must employ the same programming language.
  • There's also very tight coupling between the applications. Because the calls are synchronous, the client application must wait for the server to complete the procedure before it can proceed further. It's possible to overcome this problem by using multithreading; however, that introduces another level of programming complexity with inherent risks of its own, such as race conditions and deadlocks.
  • The integration of the client and server is point to point and, therefore, isn't suitable when a number of applications need to be integrated.
  • RPC is also not suitable if a large number of remote calls are involved. This is because of the synchronous nature of the call, which doesn't allow the client to proceed before the server completes its work.
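The synchronous-call shortcoming and the multithreading workaround from the list above can be sketched as follows. The function remote_square is a stand-in that simulates a slow remote call with a sleep; it and the other names are assumptions for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_square(n, log, delay=0.2):
    """Stand-in for a synchronous remote call: blocks until the 'server' is done."""
    time.sleep(delay)
    log.append("call returned")
    return n * n


def synchronous_client(log):
    """The client can do nothing else until the call returns."""
    result = remote_square(4, log)
    log.append("other work")
    log.append(f"result={result}")


def threaded_client(log):
    """Workaround: run the call on a worker thread and continue meanwhile."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(remote_square, 4, log)
        log.append("other work")                  # done while the call is in flight
        log.append(f"result={future.result()}")   # wait only when the value is needed


if __name__ == "__main__":
    sync_log, threaded_log = [], []
    synchronous_client(sync_log)
    threaded_client(threaded_log)
    print(sync_log)
    print(threaded_log)
```

In the threaded version, two threads append to the same log. That's harmless in this small example, but in a real client, shared state like this is where the extra programming complexity comes from.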

Conclusion

This article described two of the earliest integration patterns. The first one, socket programming, is suitable for sharing data in real time, while the second one, RPC, is suitable for sharing functionality. You learned the concepts of connectivity, service provider and service consumer, and platform independence.

To improve on RPC, two paths have been taken. The first is the method of distributed objects (also known as Object Request Broker), and the second method is that of asynchronous messaging. The distributed objects approach focuses on code reuse and language independence, while asynchronous messaging addresses the problem of tight coupling between applications. In Part 2 of this series, you'll first learn about the distributed objects approach, as it's more closely aligned with RPC. It's perhaps important to point out that today most of the application servers are based on ORB technology.



