Developing a grid application with open source tools

Level: Intermediate

Martin Brown, Developer and writer, Consultant

30 Oct 2007

Using open source tools to develop grid applications opens up a wealth of possibilities. The first is a very rapid development process, especially if you take advantage of scripting languages like Perl or Python and deployment environments like Apache. There is also a wealth of examples available that can help you. Examine the advantages and disadvantages of developing a grid solution using open source technology.

Open source components

The open source community includes and supports a huge range of tools and products that can very easily be leveraged for use in your grid environments. Open source tools cover the whole gamut of different software and technology, from operating systems, such as Linux® and BSD, to full C/C++ development environments like GNU CC. You can also gain access to easier-to-use scripting languages, such as Perl, as well as components, tool kits and development environments like the Web-services tool kits from Apache or the Eclipse Integrated Development Environment (IDE).

These components can be used individually or collectively to build a grid solution, using native solutions or by taking advantage of the standardization of grid solutions, including the WS-* group of Web services often used in grid deployment environments, such as Globus.

The general advantage of using open source components is the ease of access, development, and deployment. Because the software is available freely, you can use, deploy, and try out different solutions until you discover the one that works for you.

Using open source tools can also sometimes speed up a task. This is because the open source tools are collectively developed by a group of people, and those people can edit and modify the original source code. This allows individuals to find a problematic or time-consuming component and fix or modify the code to make it easier to use or more useful.

But perhaps most importantly, for the vast majority of open source solutions, you can freely use, edit, and expand the original code to suit your needs and requirements. If the tool you have chosen doesn't provide the information you want, you can usually extend it in a way that most commercial tools simply don't support, or you can modify the original source code to achieve what you want.

Most of the time, you can take advantage of the flexibility of the original solution to achieve what you need. Let's start by looking at the functionality and flexibility offered by the most common solution: open source scripting languages.





Using scripting languages

Scripting languages are a key area in which you can take advantage of the open source model. Languages such as Perl, PHP, Python, and Ruby all provide a wide range of functionality and enable you to quickly and easily develop an application that can be used in a distributed environment.

With a scripting language, because you don't need to follow the normal write/compile/link/execute sequence, the development of the application can often be much quicker than with a traditional C/C++ application. Furthermore, all the scripting languages provide libraries and extensions for connecting to different Web-services technologies, such as SOAP and XML-RPC, and that means that you can very quickly develop a solution that interfaces to your existing infrastructure or you can create completely new Web services to support your grid service.

Web services are a key part of many modern Web solutions. They provide a number of advantages, such as ease of deployment, flexibility, and compatibility. For example, you can develop a Web-services client in Perl using the script shown below.


Listing 1. Developing a Web-services client using Perl
                
#!/usr/bin/perl

use warnings;
use strict;

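use SOAP::Lite;

# The image to submit. The excerpt does not show where $image comes from,
# so taking it from the command line is an assumption.
my $image = shift @ARGV
    or die "Usage: $0 image\n";

print SOAP::Lite
  -> uri('http://snode1:32080/')
  -> proxy('http://snode1:32080/?session=store')
  -> submit_image($image)
  -> result;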

print "\n";

In this example, taken from a grid solution written in Perl that implements a photo storage system (see Resources for the "Building a Grid with Perl" tutorial series), we use the SOAP::Lite module to build a SOAP client. In this case, we are accessing the Web service on port 32080 of the grid node snode1 and submitting an image into the system.
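To make the exchange less opaque, the following Python sketch builds roughly the kind of SOAP 1.1 message that such a client sends. It is an illustration under stated assumptions, not the tutorial's code: the parameter element name image and the base64 encoding of the payload are hypothetical, and the exact envelope that SOAP::Lite produces will differ in detail.

```python
import base64
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://snode1:32080/"  # the namespace given to uri() in Listing 1


def build_submit_image_request(image_bytes):
    """Return a SOAP 1.1 envelope (as bytes) that calls submit_image."""
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}submit_image" % SERVICE_NS)
    # Assumption: the service takes one parameter, named "image" here,
    # carrying the picture as base64 text.
    param = ET.SubElement(call, "image")
    param.text = base64.b64encode(image_bytes).decode("ascii")
    return ET.tostring(envelope, encoding="utf-8")


request = build_submit_image_request(b"\x89PNG example bytes")
print(request.decode("utf-8"))
```

A SOAP library hides this envelope construction, the HTTP POST to the proxy address, and the parsing of the response, which is why the Perl client needs only a handful of lines.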

We can simplify the configuration of individual servers within the grid by using Perl's built-in hash type, as shown below.


Listing 2. Using Perl's built-in hash type

$node = {
    'name' => 'linux-grid-1',
    'type' => 'snode',
    'remote' => 'snode1',
    'gridid' => '43729810-09399528-817022102',
    'grid' => 'image-grid-1',
    'nodeid' => '57175600-10287415-53438102',
    'distributorid' => '',
    'distributor' => '',
    'verification-id' => '',
};

When a photo is submitted into the grid, we use the information generated by the individual nodes to track their current storage availability, choose a node, and serialize the information (using Perl's Data::Dumper module) into a structure that can be stored in a database.
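The selection and serialization steps can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's implementation: the free_bytes field, the "most free space wins" policy, the second node, and the use of JSON in place of Data::Dumper are all assumptions.

```python
import json


def choose_node(nodes, image_size):
    """Pick the node with the most free space that can still hold the image."""
    candidates = [n for n in nodes if n["free_bytes"] >= image_size]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n["free_bytes"])


def serialize_placement(node, image_name):
    """Turn the placement decision into a string suitable for a database column."""
    record = {"image": image_name, "nodeid": node["nodeid"], "remote": node["remote"]}
    return json.dumps(record, sort_keys=True)


# Hypothetical availability reports; only the first node id comes from Listing 2.
nodes = [
    {"nodeid": "57175600-10287415-53438102", "remote": "snode1", "free_bytes": 5000000},
    {"nodeid": "hypothetical-node-2", "remote": "snode2", "free_bytes": 20000000},
]

chosen = choose_node(nodes, image_size=2000000)
print(serialize_placement(chosen, "holiday.jpg"))
```

Storing the serialized record centrally means a later lookup can go straight to the right node instead of querying all of them.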

The distribution of information around the grid is handled by obtaining a list of currently outstanding requests and processing the list of providers (grid nodes), then using the SOAP interface to distribute the final work unit to the target node. Searching the nodes for an existing item works almost in reverse: We submit the search to each of the known nodes and collate the responses.
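The search half of this description, sending the query to every node and collating the answers, might look like the following Python sketch. The search_node callable stands in for the real SOAP call, and the node names and catalog contents are hypothetical.

```python
def search_grid(nodes, query, search_node):
    """Send the query to every known node and merge the hits into one list."""
    hits = []
    unreachable = []
    for node in nodes:
        try:
            for item in search_node(node, query):
                hits.append({"node": node, "item": item})
        except OSError:
            # A node that cannot be contacted should not abort the whole search.
            unreachable.append(node)
    return hits, unreachable


def fake_search(node, query):
    """Stand-in for the remote call; a real grid would use its SOAP interface."""
    catalog = {"snode1": ["beach.jpg", "city.jpg"], "snode2": ["beach-2.jpg"]}
    if node not in catalog:
        raise ConnectionError("cannot reach " + node)
    return [name for name in catalog[node] if query in name]


hits, unreachable = search_grid(["snode1", "snode2", "snode3"], "beach", fake_search)
print(hits)
print(unreachable)
```

Recording which node each hit came from lets the caller retrieve the item later without repeating the search.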

Listing 3 shows a different example, this time of a computational grid written entirely within Python. For this solution, we make use not only of Web services but also of the flexibility of the Python language as a scripting language.

For most computational grids, what you need to do is develop an application that can execute your computation and return the results. Using Python, we can distribute a Python script that will perform the calculation. Instead of being limited to the functionality offered by a stand-alone application, we have the full flexibility to process any Python script in a distributed fashion.

The core of the grid functionality is the Web service, this time using XML-RPC, which is a lighter service to implement. You can see the main provider (grid node) Python module below.
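To show how little is needed on the other side of that Web service, here is a minimal distributor sketch using the XML-RPC support in the Python 3 standard library. The method names getworkunits and putresult match the calls made in Listing 3, but their bodies, the in-memory queue, and the sample work units are assumptions made for illustration.

```python
import threading
from collections import deque
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical work units, shaped like the ones Listing 3 consumes.
pending = deque(
    {"_workunitid": str(i), "calctype": "square", "args": [i]} for i in range(3)
)
results = {}


def getworkunits(count):
    """Hand out up to `count` pending work units."""
    batch = []
    while pending and len(batch) < count:
        batch.append(pending.popleft())
    return batch


def putresult(workunit):
    """Record a finished work unit; XML-RPC needs a non-None return value."""
    results[workunit["_workunitid"]] = workunit["result"]
    return True


# Port 0 asks the operating system for any free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(getworkunits)
server.register_function(putresult)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A provider would normally run on another machine; here it shares the process.
host, port = server.server_address
distributor = ServerProxy("http://%s:%d/" % (host, port))
for unit in distributor.getworkunits(5):
    unit["result"] = unit["args"][0] ** 2
    distributor.putresult(unit)

server.shutdown()
server.server_close()
print(results)
```

Because XML-RPC maps directly onto native lists and dictionaries, the work units travel between distributor and provider without any hand-written serialization code.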


Listing 3. Computational grid written within Python
                
import time,sys
sys.path.append('..')
import DWGrid
from xmlrpclib import Server

gridspec = DWGrid.DWGrid()
provider = DWGrid.DWGridProvider(gridspec,'sulaco','192.168.0.101')
provider.register_component()

distributor = Server(gridspec.distributor)

def get_workunits():
    global provider,gridspec
    items = distributor.getworkunits(5)
    counter = 0
    for item in items:
        gridspec.workunitqueue.add(item,'queue',item['_workunitid'])
        counter = counter+1
    return counter

def run_calculation(item):
    workunit = provider.grid.workunitqueue.get(item,'queue')
    distributor.log_status(provider.name,1)
    gridspec.workunitqueue.move(workunit['_workunitid'],'queue','active')
    modulename = workunit['calctype']
    try:
        exec 'import calculate_' + modulename + ' as calculate'
    except:
        (result,moduledata) = distributor.get_service_byname(modulename,'module')
        if (result == 0):
            modulefile = open('calculate_%s.py' % modulename,'w')
            modulefile.write(moduledata)
            modulefile.close()
            try: 
                exec 'import calculate_' + modulename + ' as calculate'
            except:
                gridspec.workunitqueue.move(workunit['_workunitid'],'active','queue')
                print "Error, module recovered from server does not load"
                sys.exit(1)
        else:
            print "Error, module %s from server doesnt exist" % (modulename)
            gridspec.workunitqueue.move(workunit['_workunitid'],'active','queue')
            sys.exit(1)
    distributor.log_event('log',provider.name,
                         'Starting Processing Workunit ID %s' % (item))
    calcfunc = calculate.DWGridComputational()
    result = calcfunc.execute(workunit['args'])
    distributor.log_event('log',provider.name,
                         'Finished Processing Workunit ID %s' % (item))
    workunit['result'] = result
    gridspec.workunitqueue.move(workunit['_workunitid'],'active','complete')
    gridspec.resultqueue.add(workunit,'queue',workunit['_workunitid'])

def put_results(items):
    for resultid in items:
        workunit = gridspec.resultqueue.get(resultid,'queue')
        gridspec.resultqueue.move(resultid,'queue','active')
        try:
            distributor.putresult(workunit)
        except:
            gridspec.resultqueue.move(resultid,'active','queue')
            return
        gridspec.resultqueue.move(resultid,'active','complete')

while 1:
    # first thing we need to do is check if there is anything 
    # in our queue that needs calculating
    items = provider.grid.workunitqueue.list('queue')
    if len(items) > 0:
        run_calculation(items[0])
    else:
        count = get_workunits()
        if count == 0:
            distributor.log_status(provider.name,0)
            time.sleep(60)
        continue
    items = provider.grid.resultqueue.list('queue')
    if len(items) > 0:
        put_results(items)
		

Although this looks daunting, you can see the core elements that execute the grid components:

  • get_workunits() contacts the distribution node, requests up to five outstanding work units, and adds them to the provider's local queue.
  • run_calculation() tries to import the Python module named by the work unit's calculation type. If the module is not available locally, it downloads the module source from the distribution node, saves it to a file, and imports it. It then executes the calculation and adds the result to the provider's result queue.
  • put_results() takes the completed results and submits them to the distribution node, returning a result to the queue if the submission fails.
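The module-loading step in run_calculation() is the least familiar part, so here is a small Python 3 sketch of the same idea using importlib instead of the exec statement. The class name DWGridComputational and its execute() method come from Listing 3. The sumsquares module, its source, and the fetch_source callable (which stands in for the call to the distributor) are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical calculation module, as the distributor might send it.
MODULE_SOURCE = '''
class DWGridComputational:
    def execute(self, args):
        return sum(x * x for x in args)
'''


def load_calculation(modulename, fetch_source, module_dir):
    """Import calculate_<modulename>, downloading its source on first use."""
    name = "calculate_" + modulename
    if str(module_dir) not in sys.path:
        sys.path.insert(0, str(module_dir))
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        # Not available locally: save the source, then retry the import.
        (Path(module_dir) / (name + ".py")).write_text(fetch_source(modulename))
        importlib.invalidate_caches()
        return importlib.import_module(name)


module_dir = Path(tempfile.mkdtemp())
calculate = load_calculation("sumsquares", lambda name: MODULE_SOURCE, module_dir)
result = calculate.DWGridComputational().execute([1, 2, 3])
print(result)  # 14
```

Because every calculation module exposes the same class and method, the provider can run a calculation type it has never seen before without being redeployed.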

To demonstrate how speedy development can be with Python or other open source solutions, the entire Python-based grid solution is slightly more than 300 lines of code. It is flexible enough to handle a variety of execution modules and results, and includes registration, distribution, and data collection elements.





Open source libraries and environments

I've already described how open source tools allow you to edit and modify the original code to suit your needs and requirements, but often this isn't necessary. The tools and functionality you need are often already available, not because they have been built into the original solution but because the solutions are often more flexible and extensible to begin with.

Before we look at some of the major environments on offer, the basic libraries and extensions offered by many of the scripting languages should not be ignored. Python and PHP come with a large range of standard modules and functionality that can be used out of the box. There is also a huge range of third-party extensions. Perl has the well-known Comprehensive Perl Archive Network (CPAN) library of modules.

There are several environments built on open source technology that combine the functionality of many different components into a system that can be used to develop and deploy different types of applications. Although not all of the open source environments described here are specifically targeted at developing grid solutions, most of them can be adapted, modified, or used as the base for developing grid-based solutions.





LAMP and derivatives

The original LAMP environment referred to the bundling of the Linux operating system, Apache HTTP server, MySQL database, and PHP tool sets used to produce Web sites. Over time, the generic LAMP has spawned a number of derivatives and diversifications. For example, the "P" is often also used to refer to Perl- and Python-based environments, or it can be replaced with "R" to produce LAMR for Ruby or Ruby on Rails solutions, and "J" for Java™/JSP environments.

The "L" has been replaced with Windows® (WAMP), which in turn led to WIMP (Windows and IIS), SAMP (Solaris), MAMP (Mac OS X), and even BAMP (BSD). MySQL has also been replaced with PostgreSQL (LAPP). Acronyms have also been coined for stacks that include commercial environments and applications, such as WASP (Windows, Apache, SQL Server, and PHP) and OPAL (Oracle).

The LAMP bundle was designed for the development and deployment of Web sites, and it brought together some of the best individual elements in a combination in which many of them were already being used. Although not designed with the development of grids in mind, the combination of these individual tools into a single bundle can make it very easy to develop and deploy a grid-based solution.

Within a grid deployment, we can leverage each tool for handling different parts of a typical grid solution:

  • Apache is a powerful HTTP serving platform that provides a scalable environment for serving up different applications and scripts, and can also be used to provide security and tracking functionality.
  • MySQL, PostgreSQL, and Apache Derby are all highly efficient relational database solutions. Within a typical grid solution, they can be used to provide centralized storage and resource provision.
  • Perl, Python, PHP, Ruby, and others all provide a suitable scripting environment for rapidly developing applications, and can call on extensive and powerful libraries and extensions that provide specific functionality, such as Web services or security.
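As a rough illustration of the database's role, the following Python sketch keeps a central work-unit queue in a relational table. SQLite from the standard library stands in for MySQL or PostgreSQL so that the example is self-contained. The table layout and the wu-1 and wu-2 identifiers are assumptions, while the queue, active, and complete states mirror the ones used in Listing 3.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE workunits ("
    " id TEXT PRIMARY KEY,"
    " calctype TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'queue')"
)
db.executemany(
    "INSERT INTO workunits (id, calctype) VALUES (?, ?)",
    [("wu-1", "square"), ("wu-2", "square")],
)
db.commit()


def claim_next(conn):
    """Move the oldest queued work unit to 'active' and return its id."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id FROM workunits WHERE status = 'queue' ORDER BY rowid LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE workunits SET status = 'active' WHERE id = ?", (row[0],))
        return row[0]


def complete(conn, workunit_id):
    """Mark a work unit as finished."""
    with conn:
        conn.execute(
            "UPDATE workunits SET status = 'complete' WHERE id = ?", (workunit_id,)
        )


first = claim_next(db)
complete(db, first)
print(db.execute("SELECT id, status FROM workunits ORDER BY rowid").fetchall())
```

With a server-based database, several distributor scripts running under Apache could share this table, which is what makes the central store useful in a grid.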

In addition to allowing for a very rapid development process, using a LAMP or derivative solution allows for an easy and quick deployment process. Most Linux distributions come with the LAMP tool kit as standard. Building a grid that uses the LAMP environment can, therefore, often be a straightforward case of installing and setting up the individual Linux-based grid nodes and deploying your grid application scripts to the nodes.





Apache's Java tool kits

Although Java itself is not strictly open source, a huge range of open source tool kits and libraries have been built on top of Java technology to provide core functionality, such as Web serving and Web services, and also to produce some grid-specific tool kits. Key among the different tool kits available are those from the Apache Software Foundation.

Apache has produced a suite of tools we can use when building grid environments:

  • Apache Tomcat is a Java-based Web-serving environment that can be used to share basic files and Web-based solutions for Java applications, such as JSP and Java servlets. Tomcat is often the basic serving environment used for many of the other tools in the Apache Java tool kits, which are themselves based on the same JSP/servlet technology.
  • Axis is an implementation of the SOAP Web-services solution, and Axis2 is a redevelopment that supports not only SOAP but also other solutions, such as Representational State Transfer (REST). The Axis2 tool kit includes a number of key Web-services implementations, such as WS-ReliableMessaging, WS-Coordination, WS-Security, and WS-Addressing, all of which are seen as key in the modern Web services-based grid environment.
  • Apache Muse is an implementation of the WS-ResourceFramework, WS-BaseNotification, and the WS-DistributedManagement standards, also key elements of many Web services-based grid environments. Both Muse and Axis/Axis2 rely on the Web-serving environment of Tomcat to provide the core HTTP protocol.

By combining many of these services and by using the functionality offered by the different Web services-based solutions, you can easily build a Web services-based grid solution. See Resources for a list of tutorials, articles, and solutions that have used the various Apache tool kits to build grid solutions.

All of the above solutions are open source, although they are strongly controlled and guided by the Apache Software Foundation, a group of developers and architects that ensures the individual projects keep their target and focus.





ActiveGrid

Despite what its name suggests, the ActiveGrid system is not a grid-specific development and deployment environment. Nor is it truly open source. Instead, ActiveGrid is an environment that uses the LAMP tool kits to develop highly distributed business applications that can make use of grid-style technology.

The system relies on two components: the ActiveGrid Application Builder for designing and developing the application and the ActiveGrid Application Server that can be used to deploy the application.

The focus is on developing full-blown applications that take advantage of the grid environment, rather than providing smaller, scalable components you can use to develop smaller, discrete applications.





Summary

We've looked at the advantages and disadvantages of developing a grid solution using open source technology. The key items to take away from the examples and information in this article should be the ease of use and speed of development. Through a combination of the rapid development environments offered by scripting languages and the many open source libraries and other solutions, it can be very quick and easy to produce a grid-based solution.

Both the Perl and the Python solutions detailed in this article are extracts from larger series on developing grid solutions (see Resources).



Resources

Learn
  • The five-part tutorial series "Building a grid with Perl" creates an entire grid solution using Perl as the core scripting and development environment.

  • Grid and related technologies is a good overview paper on grid computing from the Norwegian Computing Center.

  • To listen to interesting interviews and discussions for software developers, check out developerWorks podcasts.

  • Stay current with developerWorks' Technical events and webcasts.

  • Check out upcoming conferences, trade shows, webcasts, and other Events around the world that are of interest to IBM open source developers.


Get products and technologies
  • Download IBM product evaluation versions, and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.

  • Innovate your next development project with IBM trial software, available for download or on DVD.



About the author


Martin Brown has been a professional writer for over seven years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms -- Perl, Python, Java™, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows, Solaris, Linux®, BeOS, Mac OS/X and more -- as well as Web programming, systems management, and integration. Martin is a Subject Matter Expert (SME) for Microsoft® and regular contributor to ServerWatch.com, LinuxToday.com, and IBM developerWorks. He is also a regular blogger at Computerworld, The Apple Blog, and other sites. You can contact him through his Web site at: http://www.mcslp.com.



