Level: Intermediate
Martin Brown, Developer and writer, Consultant
30 Oct 2007
Using open source tools for developing grid applications opens up a wealth
of possibilities. The first is a very rapid development process, especially if you
take advantage of script languages like Perl or Python and deployment environments
like Apache. There is also a wealth of examples available that can
help you. Examine the advantages and disadvantages of developing a grid solution
using open source technology.
Open source components
The open source community includes and supports a huge range of tools and products that
can very easily be leveraged for use in your grid environments. Open source tools cover
the whole gamut of different software and technology, from operating systems, such as
Linux® and BSD, to full C/C++ development environments like GNU CC. You can
also gain access to easier-to-use scripting languages, such as Perl, as well as
components, tool kits and development environments like the Web-services tool kits from
Apache or the Eclipse Integrated Development Environment (IDE).
These components can be used individually or collectively to build a grid
solution, using native solutions or by taking advantage of the standardization of grid
solutions, including the WS-* group of Web services often used in grid deployment
environments, such as Globus.
The general advantage of using open source components is the ease of access,
development, and deployment. Because the software is available freely, you can use,
deploy, and try out different solutions until you discover the one that works for you.
Using open source tools can also sometimes speed up a task. This is because the open
source tools are collectively developed by a group of people, and those people can edit
and modify the original source code. This allows individuals to find a problematic or
time-consuming component and fix or modify the code to make it easier to use or more useful.
But perhaps most importantly, for the vast majority of open source solutions, you can
freely use, edit, and expand the original code to suit your needs and
requirements. If the tool you have chosen doesn't provide the information you want, you
can usually extend it in a way that most commercial tools simply don't support,
or you can modify the original source code to achieve what you want.
Most of the time, you can take advantage of the flexibility of the original
solution to achieve what you need. Let's start by looking at the functionality and
flexibility offered by the most common solution: open source scripting languages.
Using scripting languages
Scripting languages are a key area in which you can take advantage of the open source
model. Languages such as Perl, PHP, Python, and Ruby all provide a
wide range of functionality and enable you to quickly and easily develop an application
that can be used in a distributed environment.
With a scripting language, because you don't need to follow the normal
write/compile/link/execute sequence, the development of the application can often be
much quicker than with a traditional C/C++ application. Furthermore, all the scripting
languages provide libraries and extensions for connecting to different Web-services
technologies, such as SOAP and XML-RPC, and that means that you can very quickly develop
a solution that interfaces to your existing infrastructure or you can
create completely new Web services to support your grid service.
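As a rough illustration of how little code a new Web service can take in a scripting language, here is a minimal sketch in Python 3 using only the standard library's XML-RPC modules. The method name node_status, its return fields, and the localhost address are assumptions made for this example; they are not part of the grid solutions described in this article.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def node_status():
    """Hypothetical service method: report this node's state."""
    return {"name": "demo-node", "free_mb": 512}


def start_service():
    """Start an XML-RPC service on an ephemeral localhost port."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(node_status)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query_service(port):
    """Call the service the way any other grid component would."""
    with ServerProxy("http://127.0.0.1:%d/" % port) as client:
        return client.node_status()


server = start_service()
status = query_service(server.server_address[1])
print(status)
server.shutdown()
server.server_close()
```

The service and the client run in one process here only to keep the sketch self-contained. In a real deployment, each would sit on a different grid node.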
Web services are often a key part of most modern Web solutions. Web services provide a
number of advantages, such as ease of deployment, flexibility, and compatibility. For
example, you can develop a Web-services client in Perl using the script shown below.
Listing 1. Developing a Web-services client using Perl
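#!/usr/bin/perl
use warnings;
use strict;
use SOAP::Lite;

# $image must hold the image data to submit; this excerpt does not show
# how the full application loads it.
my $image;

print SOAP::Lite
    -> uri('http://snode1:32080/')
    -> proxy('http://snode1:32080/?session=store')
    -> submit_image($image)
    -> result;
print "\n";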
In this example, taken from a grid solution written in Perl that implements a photo
storage system (see Resources for the "Building a Grid
with Perl" tutorial series), we use the SOAP::Lite module to
build a SOAP client. In this case, we are accessing the Web service on port 32080 of
the grid node snode1 and submitting an image into the system.
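To make it clearer what a SOAP client like this sends over the wire, the following Python 3 sketch builds a comparable SOAP 1.1 request by hand. It is an approximation, not the exact message SOAP::Lite produces: the parameter name image and the placeholder payload are assumptions, and a real client would also set HTTP headers and encode the data.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(method_ns, method, params):
    """Return a SOAP 1.1 envelope (bytes) calling `method` with named params."""
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    # The method element lives in the service's own namespace.
    call = ET.SubElement(body, "{%s}%s" % (method_ns, method))
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


request = build_soap_request(
    "http://snode1:32080/",      # namespace taken from the uri() call in Listing 1
    "submit_image",
    {"image": "aGVsbG8="},       # hypothetical parameter name and placeholder data
)
print(request.decode("utf-8"))
```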
We can simplify the configuration of individual servers within the grid by using Perl's
internal hash type, as shown below.
Listing 2. Using Perl's internal hash type
$node = {
    'name' => 'linux-grid-1',
    'type' => 'snode',
    'remote' => 'snode1',
    'gridid' => '43729810-09399528-817022102',
    'grid' => 'image-grid-1',
    'nodeid' => '57175600-10287415-53438102',
    'distributorid' => '',
    'distributor' => '',
    'verification-id' => '',
};
When a photo is submitted into the grid, we use the information generated by the
individual nodes to track their current storage availability, choose a node, and
serialize the information (using Perl's Data::Dumper module) into a structure that can
be stored in a database.
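The selection and serialization steps can be sketched as follows. This is an illustration in Python 3 rather than the Perl used by the original system; the free_mb field, the extra node names, and the "most free space" policy are all assumptions, and JSON stands in for the serializer.

```python
import json

# Hypothetical node records; "free_mb" is an assumed field, not from Listing 2.
nodes = [
    {"name": "linux-grid-1", "remote": "snode1", "free_mb": 1200},
    {"name": "linux-grid-2", "remote": "snode2", "free_mb": 5400},
    {"name": "linux-grid-3", "remote": "snode3", "free_mb": 300},
]


def choose_node(nodes, size_mb):
    """Pick the node with the most free space that can hold the item."""
    candidates = [n for n in nodes if n["free_mb"] >= size_mb]
    if not candidates:
        raise RuntimeError("no node has room for %d MB" % size_mb)
    return max(candidates, key=lambda n: n["free_mb"])


def serialize_placement(photo_id, node):
    """Flatten the placement into a string that fits in one database column."""
    return json.dumps({"photo": photo_id, "node": node["name"]}, sort_keys=True)


target = choose_node(nodes, size_mb=800)
record = serialize_placement("photo-0001", target)
print(record)
```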
The distribution of information around the grid is handled by obtaining a list of
currently outstanding requests and processing the list of providers (grid nodes), then
using the SOAP interface to distribute the final work unit to the target node.
Searching the nodes for an existing item works almost in reverse: We submit the search
to each of the known nodes and collate the responses.
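The search pattern (send the same query to every node, then merge the answers) might look like the Python 3 sketch below. The nodes here are plain in-process functions standing in for remote Web-service calls, and the node and file names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def make_node(name, items):
    """Build a stand-in node: a search function over a local item list."""
    def search(term):
        return [{"node": name, "item": i} for i in items if term in i]
    return search


def search_grid(node_searchers, term):
    """Send the same search to every node and merge the answers."""
    with ThreadPoolExecutor(max_workers=len(node_searchers)) as pool:
        # pool.map keeps results in the same order as the node list.
        per_node = list(pool.map(lambda s: s(term), node_searchers))
    return [hit for hits in per_node for hit in hits]


grid = [
    make_node("snode1", ["beach.jpg", "city.jpg"]),
    make_node("snode2", ["beach-sunset.jpg"]),
    make_node("snode3", ["forest.jpg"]),
]
hits = search_grid(grid, "beach")
print(hits)
```

Querying the nodes in parallel is a design choice for this sketch; with real network calls it keeps one slow node from delaying the others.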
Listing 3 shows a different example, this time of a computational grid written entirely
within Python. For this solution, we make use not only of Web services but also of the
flexibility of the Python language as a scripting language.
For most computational grids, what you need to do is develop an application that can
execute your computation and return the results. Using Python, we can distribute a
Python script that will perform the calculation. Instead of being limited to the
functionality offered by a stand-alone application, we have the full flexibility to
process any Python script in a distributed fashion.
The core of the grid functionality is the Web service, this time using XML-RPC, which
is a lighter service to implement. You can see the main provider (grid node) Python
module below.
Listing 3. Computational grid written within Python
# Written for Python 2 (print and exec statements, xmlrpclib)
import time,sys
sys.path.append('..')
import DWGrid
from xmlrpclib import Server

gridspec = DWGrid.DWGrid()
provider = DWGrid.DWGridProvider(gridspec,'sulaco','192.168.0.101')
provider.register_component()
distributor = Server(gridspec.distributor)

def get_workunits():
    # ask the distributor for up to five work units and queue them locally
    global provider,gridspec
    items = distributor.getworkunits(5)
    counter = 0
    for item in items:
        gridspec.workunitqueue.add(item,'queue',item['_workunitid'])
        counter = counter+1
    return counter

def run_calculation(item):
    workunit = provider.grid.workunitqueue.get(item,'queue')
    distributor.log_status(provider.name,1)
    gridspec.workunitqueue.move(workunit['_workunitid'],'queue','active')
    modulename = workunit['calctype']
    try:
        exec 'import calculate_' + modulename + ' as calculate'
    except:
        # module is not available locally, so fetch it from the distributor
        (result,moduledata) = distributor.get_service_byname(modulename,'module')
        if (result == 0):
            modulefile = open('calculate_%s.py' % modulename,'w')
            modulefile.write(moduledata)
            modulefile.close()
            try:
                exec 'import calculate_' + modulename + ' as calculate'
            except:
                gridspec.workunitqueue.move(workunit['_workunitid'],'active','queue')
                print "Error, module recovered from server does not load"
                sys.exit(1)
        else:
            print "Error, module %s from server doesn't exist" % (modulename)
            gridspec.workunitqueue.move(workunit['_workunitid'],'active','queue')
            sys.exit(1)
    distributor.log_event('log',provider.name,
                          'Starting Processing Workunit ID %s' % (item))
    calcfunc = calculate.DWGridComputational()
    result = calcfunc.execute(workunit['args'])
    distributor.log_event('log',provider.name,
                          'Finished Processing Workunit ID %s' % (item))
    workunit['result'] = result
    gridspec.workunitqueue.move(workunit['_workunitid'],'active','complete')
    gridspec.resultqueue.add(workunit,'queue',workunit['_workunitid'])

def put_results(items):
    for resultid in items:
        workunit = gridspec.resultqueue.get(resultid,'queue')
        gridspec.resultqueue.move(resultid,'queue','active')
        try:
            distributor.putresult(workunit)
        except:
            # submission failed, so requeue the result and try again later
            gridspec.resultqueue.move(resultid,'active','queue')
            return
        gridspec.resultqueue.move(resultid,'active','complete')

while 1:
    # first thing we need to do is check if there is anything
    # in our queue that needs calculating
    items = provider.grid.workunitqueue.list('queue')
    if len(items) > 0:
        run_calculation(items[0])
    else:
        count = get_workunits()
        if count == 0:
            distributor.log_status(provider.name,0)
            time.sleep(60)
            continue
    items = provider.grid.resultqueue.list('queue')
    if len(items) > 0:
        put_results(items)
Although this looks daunting, you can see the core elements that execute the grid components:
- get_workunits() contacts the distribution node and gets a list of outstanding work
units.
- run_calculation() extracts the Python module required by the calculation, saves the
module into an external file, loads the file ready to execute the embedded calculation
method, then executes it, adding the calculation results to the provider's internal
queue.
- put_results() takes the completed results and submits them to the distribution node.
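The module-loading step in run_calculation() is the least familiar part, so here is a minimal Python 3 sketch of the same idea in isolation: try to import a calculation module by name and, if it is missing, write the source to a file and import it again. The class and method names follow Listing 3; the fetch_source callback, the temporary directory, and the sample calculation are assumptions for this example, and importlib replaces the Python 2 exec statement.

```python
import importlib
import os
import sys
import tempfile

# Source text as it might arrive from the distributor. The class and method
# names follow Listing 3; the calculation itself is made up.
MODULE_SOURCE = '''
class DWGridComputational:
    def execute(self, args):
        return sum(args)
'''


def load_calculation(modulename, fetch_source, module_dir):
    """Import calculate_<modulename>, fetching its source on a first miss."""
    if module_dir not in sys.path:
        sys.path.insert(0, module_dir)
    fullname = "calculate_" + modulename
    try:
        return importlib.import_module(fullname)
    except ImportError:
        path = os.path.join(module_dir, fullname + ".py")
        with open(path, "w") as handle:
            handle.write(fetch_source(modulename))
        importlib.invalidate_caches()  # make the new file visible to import
        return importlib.import_module(fullname)


workdir = tempfile.mkdtemp()
module = load_calculation("sum", lambda name: MODULE_SOURCE, workdir)
result = module.DWGridComputational().execute([1, 2, 3])
print(result)
```

Running code received over the network is only reasonable when every node trusts the distributor, which this sketch (like the listing) simply assumes.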
To demonstrate how speedy the development can be using Python or other open source
solutions, the entire Python-based grid solution is slightly more than 300 lines of
code and is flexible enough to handle a variety of execution modules and results,
including registration, distribution, and data collection elements.
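Part of what keeps the code short is the simple queue model the listing relies on: each work unit moves from 'queue' to 'active' to 'complete', and goes back to 'queue' if something fails. The DWGrid module itself is not shown in this article, so the Python 3 sketch below is a guess at a minimal class with the same add/get/move/list calls, not the actual implementation.

```python
class StateQueue:
    """Track items by id as they move between named states."""

    def __init__(self):
        self._items = {}   # item id -> payload
        self._state = {}   # item id -> current state name

    def add(self, item, state, item_id):
        self._items[item_id] = item
        self._state[item_id] = state

    def get(self, item_id, state):
        if self._state.get(item_id) != state:
            raise KeyError("%s is not in state %r" % (item_id, state))
        return self._items[item_id]

    def move(self, item_id, source, target):
        self.get(item_id, source)      # raises if the item is not in `source`
        self._state[item_id] = target

    def list(self, state):
        return [i for i, s in self._state.items() if s == state]


def process(queue, item_id, calculate):
    """Run one work unit, returning it to 'queue' if the calculation fails."""
    queue.move(item_id, "queue", "active")
    try:
        result = calculate(queue.get(item_id, "active"))
    except Exception:
        queue.move(item_id, "active", "queue")
        return None
    queue.move(item_id, "active", "complete")
    return result


work = StateQueue()
work.add({"args": [2, 3]}, "queue", "wu-1")
work.add({"args": "bad"}, "queue", "wu-2")   # will fail: strings cannot be multiplied
ok = process(work, "wu-1", lambda unit: unit["args"][0] * unit["args"][1])
failed = process(work, "wu-2", lambda unit: unit["args"][0] * unit["args"][1])
print(ok, failed, work.list("queue"), work.list("complete"))
```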
Open source libraries and environments
I've already described how open source tools let you edit and modify the original code
to suit your needs and requirements, but often this isn't necessary. The tools and
functionality you need are often already available, not because they have been built
into the original solution but because the solutions are often more flexible and
extensible to begin with.
Before we look at some of the major environments on offer, the basic libraries and
extensions offered by many of the scripting languages should not be ignored. Python and
PHP come with a large range of standard modules and functionality that can be used out
of the box. There is also a huge range of third-party extensions. Perl has the
well-known Comprehensive Perl Archive Network (CPAN) library of modules.
There are several environments built on open source technology that combine the
functionality of many different components into a system that can be used to develop
and deploy different types of applications. Although not all of the open source environments described here are specifically
targeted at developing grid solutions, most of them can be adapted, modified, or used
as the base for developing grid-based solutions.
LAMP and derivatives
The original LAMP environment referred to the bundling of the Linux operating system,
Apache HTTP server, MySQL database, and PHP tool sets used to produce Web sites. Over
time, the generic LAMP has spawned a number of derivatives and diversifications. For
example, the "P" is often also used to refer to Perl- and Python-based
environments, or it can be replaced with "R" to produce LAMR for Ruby or Ruby on Rails
solutions, and "J" for Java™/JSP environments.
The "L" has been replaced with Windows® (WAMP), which in turn led to WIMP
(Windows and IIS), as well as SAMP (Solaris), MAMP (Mac OS X), and even BAMP (BSD).
MySQL has also been replaced with PostgreSQL (LAPP). Commercial environments and
applications have also been added, producing acronyms such as WASP (Windows, Apache,
SQL Server, and PHP) and OPAL (Oracle).
The LAMP bundle was designed for the development and deployment of Web sites and
combined some of the best individual tools into a bundle suited to a task many of them
were already being used for. Although not designed with the development of grids in
mind, the combination of these individual tools into a single bundle can make it very
easy to develop and deploy a grid-based solution.
Within a grid deployment, we can leverage each tool for handling different parts of a
typical grid solution:
- Apache is a powerful HTTP serving platform that provides a scalable environment for
serving up different applications and scripts, and can also be used to provide security
and tracking functionality.
- MySQL, PostgreSQL, and Apache Derby are all highly efficient relational database
solutions. Within a typical grid solution, they can be used to provide centralized
storage and resource provision.
- Perl, Python, PHP, Ruby, and others all provide a suitable scripting environment for
rapidly developing applications, and can call on extensive and powerful libraries
and extensions that provide specific functionality, such as Web services or security.
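As one illustration of the database's role, the following Python 3 sketch keeps a central table of work units that nodes can claim. The sqlite3 module from the standard library stands in for MySQL or PostgreSQL so the example is self-contained; the table layout, status values, and function names are assumptions for the example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the work-unit table; sqlite3 stands in for MySQL/PostgreSQL."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS workunits ("
        " id TEXT PRIMARY KEY, node TEXT, status TEXT NOT NULL)"
    )
    return db


def submit(db, unit_id):
    """Record a new work unit as waiting for a node."""
    db.execute("INSERT INTO workunits (id, status) VALUES (?, 'queued')", (unit_id,))
    db.commit()


def claim_next(db, node):
    """Assign the oldest queued unit to `node`; return its id or None."""
    row = db.execute(
        "SELECT id FROM workunits WHERE status = 'queued' ORDER BY rowid LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute(
        "UPDATE workunits SET node = ?, status = 'active' WHERE id = ?",
        (node, row[0]),
    )
    db.commit()
    return row[0]


store = open_store()
submit(store, "wu-1")
submit(store, "wu-2")
first = claim_next(store, "snode1")
second = claim_next(store, "snode2")
third = claim_next(store, "snode1")
print(first, second, third)
```

This sketch has a single caller, so it ignores locking. With several nodes claiming work at once against a shared database server, the select and update would need to run inside one transaction.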
In addition to allowing for a very rapid development process, using a LAMP or
derivative solution allows for an easy and quick deployment process. Most Linux
distributions come with the LAMP tool kit standard. Building a
grid that uses the LAMP environment can, therefore, often be a straightforward case of
installing and setting up the individual Linux-based grid nodes and deploying your grid
application scripts to the nodes.
Apache's Java tool kits
Although Java itself is not strictly open source, a huge range of open source tool kits
and libraries have been built on top of Java technology to provide core
functionality, such as Web serving and Web services, and also to produce some grid
specific tool kits. Key among the different tool kits available are those from the
Apache Software Foundation.
Apache has produced a suite of tools we can use when building grid environments:
- Apache Tomcat is a Java-based Web-serving environment that can be used to share basic
files and Web-based solutions for Java applications, such as JSP and Java servlets.
Tomcat is often the basic serving environment used for many of the other tools in the
Apache Java tool kits, which are themselves based on the same JSP/servlet technology.
- Axis is an implementation of the SOAP Web-services solution, and Axis2 is a
redevelopment that not only supports SOAP but also other solutions, such as
Representational State Transfer (REST). The Axis2 tool kit includes a number of key
Web-services implementations, such as WS-ReliableMessaging, WS-Coordination,
WS-Security, and WS-Addressing, all of which are seen as key in the modern
Web services-based grid environment.
- Apache Muse is an implementation of the WS-ResourceFramework, WS-BaseNotification, and
the WS-DistributedManagement standards, also key elements of many Web services-based
grid environments. Both Muse and Axis/Axis2 rely on the Web-serving environment of
Tomcat to provide the core HTTP protocol.
By combining many of these services and by using the functionality offered by the
different Web services-based solutions, you can easily build a Web services-based grid
solution. See Resources for a list of tutorials, articles,
and solutions that have used the various Apache tool kits to build grid solutions.
All of the above solutions are open source, although they are strongly controlled and
guided by the Apache Software Foundation, a group of developers and architects that
ensure the individual projects keep their target and focus.
ActiveGrid
Despite what the name suggests, the ActiveGrid system is not a grid-specific development
and deployment environment. Nor is it truly open source. Instead, ActiveGrid is an
environment that makes use of the LAMP tool kits to develop applications that can make
use of grid-style technology to allow for the development of highly distributed business applications.
The system relies on two components: the ActiveGrid Application Builder for designing
and developing the application and the ActiveGrid Application Server that can be used
to deploy the application.
The focus is on developing full-blown applications that take advantage of the grid
environment, rather than providing smaller, scalable components you can use to develop
smaller, discrete applications.
Summary
We've looked at the advantages and disadvantages of developing a grid solution using
open source technology. The key items to take away from the examples and information in
this article should be the ease of use and speed of development. Through a combination
of the rapid development environments offered by scripting languages and the many open
source libraries and other solutions, it can be very quick and easy to produce a grid-based solution.
Both the Perl and the Python solutions detailed in this article are extracts from
larger series on developing grid solutions (see Resources).
Resources
Learn
- The five-part tutorial series "Building a grid with Perl" creates an entire grid
solution using Perl as the core scripting and development environment.
- Grid and related technologies is a good overview paper on grid computing from the
Norwegian Computing Center.
- To listen to interesting interviews and discussions for software developers, check out developerWorks podcasts.
- Stay current with developerWorks' Technical events and webcasts.
- Check out upcoming conferences, trade shows, webcasts, and other Events around the world that are of interest to IBM open source developers.
Get products and technologies
- Download IBM product evaluation versions, and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
- Innovate your next development project with IBM trial software, available for download or on DVD.
About the author
Martin Brown has been a professional writer for over seven years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms -- Perl, Python, Java™, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows, Solaris, Linux®, BeOS, Mac OS/X and more -- as well as Web programming, systems management, and integration. Martin is a Subject Matter Expert (SME) for Microsoft® and regular contributor to ServerWatch.com, LinuxToday.com, and IBM developerWorks. He is also a regular blogger at Computerworld, The Apple Blog, and other sites. You can contact him through his Web site at: http://www.mcslp.com.