Level: Intermediate
Dr. Waseem Roshen (waroshen@us.ibm.com), IT Architect, IBM
28 Feb 2008
This series of articles explains services-based enterprise
integration patterns in an easy-to-understand, step-by-step way. In this
installment, Part 1 of the series, you learn about the two earliest integration
patterns—data sharing only and remote procedure call (RPC)—which
help introduce the concepts of service provider and service consumer, platform
independence, and connectivity. Exploring RPC helps you get familiar with the basic
steps necessary for two applications to share functionality. This article also
includes a general description of the concepts of loose coupling, code reuse, and
layering and componentization. Part 2 of the series continues the discussion of
the early patterns, while Part 3 and Part 4 cover the Service-Oriented Architecture
(SOA)-based integration patterns, including examples.
Introduction
Making all the applications in an enterprise work in an integrated manner to
provide unified and consistent data and functionality is a difficult task. It
involves integrating various kinds of applications, such as custom-built
applications (C++, Java™, or Java 2 Platform, Enterprise Edition [J2EE]),
packaged applications (such as SAP CRM applications), and legacy applications
(mainframe IBM® CICS® or IBM Information Management System
[IMS™]). Furthermore, these applications might be dispersed
geographically and run on varied platforms. There might be a need for
integrating applications that are outside the enterprise.
As enterprises have grown, the complexity of enterprise integration has increased
progressively, and a number of integration patterns have evolved in response.
Today a substantial number of integration patterns exist.
These integration patterns vary from the simple file-based data transfer between
applications to the complete SOA-based integration
patterns.
Parts 1 and 2 of this series trace the evolution of
these patterns with a view towards helping you understand all the basic
concepts and features involved in SOA-based integration patterns. Some of the
concepts and features covered include:
- Consumer and provider of services
- Loose coupling
- Code reuse and layering
- Language independence
- Platform independence
- Definition, publication, and discovery of services interface (that is, registry
concepts)
- Evolution of the enterprise service bus (ESB) from point-to-point integration
involving the concepts of connectivity, marshaling, and mediation
- Coarse granularity
This description of the evolution of integration patterns also emphasizes the
important point that many of the preceding technologies and technical concepts
have significantly contributed to the development of SOA-based integration
patterns.
Another reason to discuss the older patterns is that even today, older and
newer patterns coexist. For example, many ESB
implementations support a file-transfer mechanism for
integration. In a similar manner, many application servers, such as IBM
WebSphere®, provide the capabilities of both an Object Request Broker (ORB) and
asynchronous messaging.
It's interesting to note that recently an IBM proposal to standardize a Service
Integration Maturity Model (SIMM) was accepted by the board of The Open
Group. The description of earlier integration patterns in Parts 1 and 2 of
this series can greatly help in determining and clarifying maturity levels of the
application domain in SIMM.
Loose coupling
Of all the concepts, the major driving principle of the march towards SOA-type
integration patterns is the idea of loose coupling. This drive for loose coupling
has occurred because the number and kinds of applications being integrated have
grown steadily. Integration patterns must therefore minimize the effect that
changes to one application have on other applications. Another
business reason for requiring loose coupling is that businesses need agility to
meet today's changing requirements, so the integration pattern must
be flexible enough to allow for it. As you'll see in this article, while
the path towards loose coupling and agility hasn't been a straight one, in
general the couplings between applications and software components have become
weaker as the industry has moved from older patterns toward SOA.
Maximizing code reuse
The second most important principle that's been employed in the development of
a services-based architecture is the emphasis on maximizing code reuse. This code
reuse results in more reliable and efficient code, because the same code is tested
again and again. To implement code reuse, usually layering (also known as
componentization) is required. Layering, or componentization, refers to pulling
out different pieces of the code as separate software components so that at run
time multiple applications can use the same code to, for example, make a network
connection to remote applications. Layering also promotes loose coupling, because the
internal workings of each layer can be changed without affecting other layers or
applications.
Let's start the discussion about integration patterns with the simplest
integration involving only data sharing between the applications. This will help
ease you into the concept of connectivity between different applications.
Data sharing only
There are, in principle, three ways to share data between two applications:
Files
The first one, perhaps the most common, is through files. This is
because storing data in files is the universal way to store data in a
system. In this pattern of integration one application writes data to a file, while
another application reads data from the same file. However, there are two major
problems with this approach of sharing data between applications:
- Data isn't shared in real time, as there's usually a lag
(mainly dependent on business cycles) between writing to and reading from the files.
- There's tight coupling between the two applications sharing the data.
Thus, changes in the application producing the file that modify the format or
the content of the file must be accompanied by changes in the
application that consumes the file.
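As a rough illustration of the file-based pattern, the sketch below has one function write records to a shared file and another read them back. The file name, the two-column layout, and the function names are assumptions made for this example only. Note that the reader hard-codes the column order, which is the tight coupling described above: if the producer adds or reorders a column, the consumer must change too.

```python
import csv
import os
import tempfile

# Hypothetical shared file location agreed on by both applications.
SHARED_FILE = os.path.join(tempfile.gettempdir(), "orders_export.csv")

def producer_write(orders, path=SHARED_FILE):
    """Producing application: dump (order_id, amount) rows to the shared file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for order_id, amount in orders:
            writer.writerow([order_id, amount])

def consumer_read(path=SHARED_FILE):
    """Consuming application: assumes column 0 is the id and column 1 the amount."""
    with open(path, newline="") as f:
        return [(row[0], float(row[1])) for row in csv.reader(f)]

# In practice the two calls would run in different applications, often hours apart.
producer_write([("A-100", 25.0), ("A-101", 13.5)])
orders = consumer_read()
```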
Common database
The second pattern of sharing data is similar to the first and uses a common
database. In this case, one application writes data to a database, while the other
applications read from the database. Again, the data isn't shared in real
time, as there's usually a lag between writing to the database by one application
and reading of the data by other applications. This lag between writing and
reading of the data is due to the fact that the reading application isn't aware
of when the data is written by the writing application. Also, there's tight
coupling between applications, because changes made to the database have a ripple
effect, which makes the design of the shared database difficult.
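The lag in the common-database pattern can be sketched as follows. An in-memory SQLite database stands in for the shared database, and the table name, columns, and function names are assumptions for illustration. Because the reader is never told when the writer has added a row, it has to poll for rows newer than the last one it saw.

```python
import sqlite3

# An in-memory SQLite database stands in for the shared enterprise database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

def writer_insert(amount):
    """Writing application: adds a row whenever it has new data."""
    db.execute("INSERT INTO orders (amount) VALUES (?)", (amount,))
    db.commit()

def reader_poll(last_seen_id):
    """Reading application: asks for rows newer than the last one it saw."""
    return db.execute(
        "SELECT id, amount FROM orders WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()

writer_insert(25.0)
writer_insert(13.5)
first_batch = reader_poll(0)                    # picks up both rows
second_batch = reader_poll(first_batch[-1][0])  # nothing new since the last poll
```

How stale the reader's view is depends entirely on how often it polls, and both applications depend on the same table definition.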
Connectivity
To avoid the problem of stale data, you need a real-time connection
between applications sharing data. This is termed connectivity. The
most rudimentary way to establish a connection between two applications is through
sockets. Sockets let one application listen at a given port on a given
machine for incoming data, while another application can write to the same
socket using the IP address and port number of the first application. The
listening application can read the data as soon as the second application writes
the data. So the data is shared in real time, and the problem of data staleness
is eliminated.
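A minimal sketch of this kind of socket connection is shown below, with both "applications" running in one process for convenience. The loopback address, the payload, and the helper name listen_once are assumptions for the example; a real deployment would run the listener and the writer on separate machines.

```python
import socket
import threading

def listen_once(server_sock, received):
    """Listening application: accept one connection and read until the peer closes."""
    conn, _addr = server_sock.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:        # an empty read means the writer closed the socket
                break
            chunks.append(data)
    received.append(b"".join(chunks))

# Bind to the loopback address; port 0 asks the OS for any free port.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))
server_sock.listen(1)
host, port = server_sock.getsockname()

received = []
listener = threading.Thread(target=listen_once, args=(server_sock, received))
listener.start()

# Writing application: connect using the listener's IP address and port number.
with socket.create_connection((host, port)) as client_sock:
    client_sock.sendall(b"price=42")

listener.join()
server_sock.close()
```

Only raw bytes cross the connection. Any structure in the data, and any notion of calling a function on the other side, has to be built by hand on top of this.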
Because the overhead associated with applications that communicate through
sockets is very low, direct socket programming leads to efficient
communication. It's also interesting that most modern methods
of communication, such as message-oriented middleware (MOM), rely on socket programming under the hood.
However, there are a number of shortcomings of the socket programming approach:
- The major problem with socket programming is that only data can be shared
directly, not the functionality.
- The API for socket programming is rather low level and, therefore, difficult
to use.
- Because the API is low level, socket programming isn't suitable for dealing
with complex data types.
- The connectivity code is buried in the applications, so it can't be reused.
- Socket programming is also not platform independent, because applications on
both ends must explicitly account for the byte ordering differences (little
endian versus big endian) on different platforms, such as mainframe and
UNIX®.
- There's also tight coupling between the two applications, because a socket
connection is point to point.
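The byte-ordering item in the list above can be made concrete with a short sketch. It uses Python's struct module to lay out the same 32-bit integer in both byte orders and then shows what a receiver sees if it assumes the wrong one. The variable names are illustrative only.

```python
import struct

value = 1  # the same integer on both machines

big_endian = struct.pack(">i", value)     # most significant byte first
little_endian = struct.pack("<i", value)  # least significant byte first

# A receiver that assumes the wrong byte order silently misreads the value.
misread = struct.unpack("<i", big_endian)[0]
```

With raw sockets, both applications have to agree on a byte order and convert explicitly. Nothing in the socket API does it for them.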
Thus, we need a solution that lets applications share both data and
functionality while avoiding low-level network programming. RPC, covered next, was the first approach
that allowed applications to share functionality.
Remote Procedure Call
The introduction of RPC was an important step, as it
introduced some important concepts and features and specified the basic steps that
are necessary for sharing functionality. Recall that services are basically about
sharing functionality between applications and between components. Thus, it's instructive to
discuss this integration pattern in some detail.
RPC is also known as a client/server model and is a rung above socket programming. It
eliminates the need for network programming. Briefly, RPC provides a
function-oriented interface. The developer defines a function, much like those in
procedural languages like C, and generates code that makes the function look like
a normal function to the caller. RPC is powerful enough to be the basis of
client/server applications.
The first valuable concept that was introduced by RPC was that of a service
provider, called a server, and a service consumer, called a client. In RPC, a
server application provides a function that can be called in a normal manner by a
consumer (client) application. The basic sequence is that a server (application)
is started, which waits for the client to make a request. When the server receives
the request from the client, the server executes the local function and returns
the return value for the function to the client. The complete process is shown in
Figure 1.
Figure 1. Remote Procedure Call complete process
One of the important components introduced in Figure 1 is the client stub.
To the client, the client stub appears to be the actual procedure that it calls.
The purpose of the stub is to package the arguments to the remote procedure,
possibly put them into a standard format, and then build one or more network
messages. This packaging of arguments is called marshaling. An important aspect
of this marshaling is that the byte ordering differences between different
platforms are handled automatically using a standard called external data
representation (XDR), thus making RPC platform independent.
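To give a feel for what marshaling involves, here is a hand-rolled sketch of two XDR-style encodings: a 32-bit signed integer as 4 big-endian bytes, and a string as a 4-byte length followed by the bytes, zero-padded to a multiple of 4. The function names are hypothetical, and the sketch covers only these two types. A real stub is generated by a tool and relies on an XDR library rather than code like this.

```python
import struct

def marshal_int(value):
    """Encode a 32-bit signed integer XDR-style: 4 bytes, big-endian."""
    return struct.pack(">i", value)

def marshal_string(text):
    """Encode a string XDR-style: 4-byte length, the bytes, zero padding to a multiple of 4."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

def unmarshal_int(buf, offset=0):
    """Decode a 32-bit signed integer and return (value, next_offset)."""
    (value,) = struct.unpack_from(">i", buf, offset)
    return value, offset + 4

# A made-up message: a procedure name followed by one integer argument.
message = marshal_string("square") + marshal_int(7)
```

Because the wire format fixes the byte order, the sender and receiver produce and read the same bytes regardless of their native byte order.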
The second important component that RPC introduced is the RPC runtime library.
The client stub uses the functions provided in the RPC runtime library to make a
system call into the local kernel to send the packaged message over the
network to the server machine using a protocol such as Transmission Control
Protocol (TCP). In other words, the RPC
run time encapsulates all the system calls necessary for
connectivity, that is, for sending the packaged arguments over the
network. Thus, the
programmer doesn't need to know any systems programming.
On the server machine side, as the network message is received by the network
routines in the kernel, it's sent to the server stub by the use of RPC run time.
The server stub unmarshals the input parameters and invokes the requested local
procedure in the server routines. After the local procedure is completed, the
server stub marshals the return value into one or more network messages and sends
the packaged return value to the server kernel through the use of RPC run time. The
server kernel sends the message to the client machine using a network
protocol, such as TCP. The client stub reads the network messages from the kernel
through the use of RPC runtime routines. After possibly converting the return
values, the client stub finally returns to the client function. This step appears
to be a normal procedure return to the client.
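The round trip described above can be sketched in a few dozen lines. This is not ONC RPC: the stubs and the run time are collapsed into a handful of hypothetical functions, the wire format is a single 4-byte big-endian integer in each direction, and the server handles exactly one call. The point is only to show where marshaling, sending, waiting, and unmarshaling happen.

```python
import socket
import struct
import threading

def square(n):
    """The local procedure that lives in the server application."""
    return n * n

def recv_exact(sock, size):
    """Read exactly `size` bytes, since TCP may deliver a message in pieces."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf

def server_stub(server_sock):
    """Server stub: unmarshal the argument, call the procedure, marshal the result."""
    conn, _addr = server_sock.accept()
    with conn:
        (arg,) = struct.unpack(">i", recv_exact(conn, 4))
        conn.sendall(struct.pack(">i", square(arg)))

def make_client_stub(host, port):
    """Build a client stub that looks like an ordinary function to the caller."""
    def square_stub(n):
        with socket.create_connection((host, port)) as sock:
            sock.sendall(struct.pack(">i", n))        # marshal and send
            reply = recv_exact(sock, 4)               # block until the server answers
        (result,) = struct.unpack(">i", reply)        # unmarshal the return value
        return result
    return square_stub

# Start the server first; it waits for a client request.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))
server_sock.listen(1)
host, port = server_sock.getsockname()
server = threading.Thread(target=server_stub, args=(server_sock,))
server.start()

# To the client, the remote call is just a function call.
remote_square = make_client_stub(host, port)
answer = remote_square(12)

server.join()
server_sock.close()
```

The client code never touches a socket directly. It calls remote_square, and everything else is hidden inside the stub.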
RPC also introduced a rudimentary way of defining an interface between the client
and the server through the use of a specification file. The RPC specification file
may be considered the first step in the development of the services interface in
today's world. An example of such a specification file (square.x) is provided in
Listing 1. This file is used to generate the skeleton code for both the server and
the client using a tool like rpcgen. Note that the specification is specific to
a language such as C and, therefore, requires both the server and the client to be
written in the same language.
Listing 1. An example of an RPC specification file, which defines a remote procedure for calculating the square of a number
struct square_in {      /* input (argument) */
    long arg1;
};
struct square_out {     /* output (result) */
    long result;
};
program SQUARE_PROG {
    version SQUARE_VERS {
        square_out SQUAREPROC(square_in) = 12;  /* procedure number */
    } = 23;                                     /* version number */
} = 0x31231234;                                 /* program number */
To summarize, RPC allowed for the first
instance of real distributed computing by letting applications share
functionality. In the process, a number of new concepts were introduced,
including:
- Service provider (server) and service consumer (client).
- Platform independence.
- Concept of interface definition.
- Marshaling of input and output parameters.
- Encapsulation in a library of the system calls that are necessary
to communicate over the network.
However, RPC has a number of shortcomings, including:
- The major disadvantage is that there's little room for code reuse. This is
because the code for marshaling and unmarshaling, and the code for network
communication is buried in the client and server applications.
- RPC isn't language independent, and the client and the server must employ the
same programming language.
- There's also very tight coupling between the applications. The reason is that
because the calls are synchronous, the client application must wait for the
server to complete the procedure before it can proceed further. It's
possible to overcome this problem by using multithreading; however,
that introduces another level of programming complexity that has inherent risks
in terms of garbage collection.
- The integration of the client and server is point to point and, therefore,
isn't suitable when a number of applications need to be integrated.
- RPC is also not suitable if a large number of remote calls are involved.
This is because of the synchronous nature of the call, which doesn't allow the
client to proceed before the server completes its work.
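The blocking behavior in the list above, and the multithreading workaround, can be sketched as follows. The function slow_remote_square is a stand-in that sleeps to simulate network and server time; it isn't a real remote call, and the 0.2-second delay is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_remote_square(n):
    """Stand-in for a synchronous remote call; the sleep simulates network and server time."""
    time.sleep(0.2)
    return n * n

# Synchronous style: the client is blocked for the whole call.
start = time.monotonic()
blocking_result = slow_remote_square(5)
blocking_elapsed = time.monotonic() - start

# Threaded style: the call runs on a worker thread while the client keeps working.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_remote_square, 6)
    other_work = sum(range(1000))      # the client is free to do something else
    threaded_result = future.result()  # block only when the answer is needed
```

The threaded version keeps the client responsive, but the client now has to manage worker threads and decide when to collect results, which is the extra complexity mentioned above.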
Conclusion
This article described two of the earliest integration patterns. The first one, socket
programming, is suitable for sharing data in real time, while the second one, RPC,
is suitable for sharing functionality. You learned the concepts
of connectivity, service provider and service consumer, and platform independence.
To improve on RPC, two paths have been taken. The first is the method of
distributed objects (also known as Object Request Broker), and the second method is
that of asynchronous messaging. The distributed objects approach focuses on code
reuse and language independence, while asynchronous messaging addresses the problem
of tight coupling between applications. In Part 2 of this series, you'll first
learn about the distributed objects approach, as it's more closely aligned with RPC.
It's worth pointing out that most of today's application servers
are based on ORB technology.
About the author
Dr. Waseem Roshen is an IT architect in the Enterprise Architecture and Technology Center of Excellence of IBM Global Business Services in Columbus, Ohio. He works on enterprise architecture and integration. He's also a Sun Certified J2EE Architect, has published 60 articles, and has worked on 24 patents.