 | Level: Intermediate Martin Brown (mc@mcslp.com), Freelance Writer, Consultant
11 Dec 2007 In this five-part "Building
a grid system using WS-Resource Transfer" series, we look at the use
of WS-Resource Transfer (WS-RT) in different areas of the grid environment —
from using it as a method for storing and recovering general information about
grid-to-grid monitoring to management and security. We also examine how WS-RT can
be used for the distribution and division of work. In any grid, there is a huge
amount of metadata about the grid that needs
to be stored and distributed. Using WS-RT makes sharing the information, especially
the precise information required by different systems in the grid, significantly
easier. Here in Part 5, we look at using WS-RT for work distribution.
About this series
The WS-RT standard provides a new method for accessing and exchanging information about
resources between components. It is designed to enhance the WS-Resource
Framework (WSRF) and build on the WS-Transfer standards. The WS-RT system extends previous
resource solutions for Web services and makes it easy not only to access resource
information by name but also to access individual elements of a larger data set through
the same mechanisms by exposing elements of an XML data set through the Web services interfaces.
In this "Building
a grid system using WS-Resource Transfer" series, we look at the use of WS-RT
in different areas of the grid environment, from using it as a method for storing and
recovering general information about the grid to
grid monitoring and management, and security. We also examine how WS-RT can be used
for the distribution and division of work. Series details:
-
Part 1 examines the WS-RT standard and looks at how to develop a WS-RT solution using
Java™ technology and Apache Muse.
-
Part 2 begins using Apache Muse and the WS-RT specification to develop a WS-RT system
to manage and access grid information.
-
Part 3 looks at how to collect monitor data, how to expose the monitor data through
WS-RT, and how to use WS-RT to extract trend information to help make predictions.
-
Part 4 looks at both sides of the security session, in terms of using WS-RT as an aid
to the authorization process and at combining WS-Security with WS-RT for secure resource exchange.
- Part 5 looks at using WS-RT for work distribution.
Distribution methods
There are many methods of distributing work, and the method you choose will depend
mostly on your grid type. Storage grids obviously have specific requirements that are
probably beyond the sensible capabilities of WS-RT, but distributing work units or
reference information within a computational grid is a possibility.
Grid work distribution
Normally, in a grid distribution system, the process is active — that is,
when submitting work to the grid management node, the grid management node then
distributes the information directly to the grid nodes for processing and ultimately
receives the results. You can see an example of this in Figure 1.
Figure 1. Typical work distribution
For many grid solutions, this is fine, but it relies on the grid nodes being active.
They have to be able to respond to the grid node distribution manager that is sending
out the work. And sometimes this can introduce a complexity layer into the grid
application that you simply don't want. The active solution can also present its own
distribution issues in terms of correctly allocating work to individual nodes.
For work that is nonlinear, or within grids where the available resource is variable
(for example, where you don't have control over a single variety or configuration of
grid node hardware), the direct distribution method can cause performance problems. The
grid nodes have to have a way of responding to the request and refusing the work unit,
or they refuse to answer the request and, instead, the distribution process must
time-out. In either case, you have the issue of reassigning the work unit to another grid node.
This active solution also causes problems in the event of a failure. If the
distribution node cannot locate the grid node it needs to contact in order to process
the unit, you encounter the same problem: The distribution of the work unit
fails and has to be reassigned and reallocated.
Figure 2. Work distribution in a failure situation
The solution is a passive distribution solution, in which, instead of forcing the data on
to the grid nodes, you allow the grid nodes to pick up the work when they are ready.
This requires a storage solution where you can create the work and allow the grid nodes
to come and retrieve any waiting work units.
Using WS-RT to store work
We can use the functionality of the WS-RT system to store the work into the system by
getting the grid node to generate the work, then storing the work in a WS-RT
resource store so the grid nodes can pick up the data.
Figure 3. Using WS-RT to distribute information
Using WS-RT provides a number of advantages over other solutions:
- WS-RT is open and easy to use, making it an ideal way of picking up the work and submitting it back.
- If you are already the WS-* systems, accessing an existing WS-RT store saves
you from developing a new type of store. In many cases, you may even already be using
WS-RT to store and record other information in the grid, so integrating your work
submission and result mechanisms is actually easier.
- We can use the dialect systems in WS-RT to pick out specific work units according to
the facilities and abilities of the grid node and the requirements of the distribution
manager. For example, if you provide a grid that is shared by a number of projects,
teams, or clients, you can easily control the distribution of work by embedding the
project information into the WS-RT resource and configuring different grid nodes to
work on different projects in different priorities.
- Because the system works through collection, and through the dialect system, we can
make the grid nodes more autonomous; they can make decisions about which work units to
pick and process and they can be self-managing. For example, if the grid nodes are not
dedicated, they can select a different range of work units that are more easily
processed and yet still devote reasonable quantities of time to the grid.
- We can take advantage of the WS-RT metadata. For example, your work units might be
time-sensitive (you need a response within a limited period of time, for example, or
maybe you are calculating different possibilities or combinations and need to get as
many as possible within a short period of time). Here, you can set the lifetime
metadata of the WS-RT resources that you create and have them automatically expire out
of the work queue once your time limit has been reached.
When distributing work through WS-RT in this way, you need to think about the content
of the work-unit information you will distribute and even what you distribute.
Normally,
you would just use the WS-RT system to distribute the input parameters or data required
by the system. If we make the WS-RT a generic storage mechanism for holding all sorts
of information, we can probably be flexible and start to store more explicitly information and even entire applications.
Distributing parameters
For a computational grid, the information you distribute is often smaller, more
discrete data, such as the input parameters and source information for an existing computation.
Within WS-RT, you "distribute" the information by creating a resource record on a WS-RT
host that contains the information about the work unit you want to process. For
example, you would include the input parameters you would normally have distributed
directly to the grid node and make them tags within your XML.
Because the grid nodes will be picking up the information, rather than being sent the
information, you should also provide a reasonable amount of other information about the
work unit that describes the unit and its contents so the grid node can decide which
work units to select. For example, you may want to embed information about the job, such
as supported platforms, expected execution times and application or class required to
actually execute the work. You can see an example of this in Listing 1.
Listing 1. Embedding information about the job
<?xml version="1.0" encoding="UTF-8"?>
<workunit id="999" status="unprocessed">
<priority>100</priority>
<requirements>
<processclass>Combinations</processclass>
<cputype>Intel</cputype>
</requirements>
<parameters>
<input>1</input>
<input>3</input>
<input>11</input>
<iterations>64</iterations>
</parameters>
</workunit>
|
In this case, the work unit contains information about the parameters to be applied,
the class to be used and the CPU type required for the processing.
For that last item, be aware that the classes available on a system will affect the
system's ability to actually process the work units. But we can use the WS-RT system to
help here by using it to distribute code, in addition to the work units.
Distributing code
Another alternative that may be more practical within a computational grid where you
want the system to be more flexible is to actually provide code or precompiled code
components through the resource system. In this case, you can create the work units
with the embedded information within the work-unit fragment along with the parameters,
or, more practically, you can create stand-alone WS-RT resources that contain the information.
This latter solution actually has a longer-term benefit, in that you can create a type
of library of code execution environments (programs) that can be used and applied to
any work unit. All the grid node has to do is load the work unit, determine the class or
program required to execute the unit, then access the information from the WS-RT store.
Figure 4. Sequence using WS-RT for work-unit and program distribution
If you are going to distribute code in this way, you should also determine whether you
want to distribute the source code or the compiled program. Using the source code has
benefits in that it can be more easily distributed to nodes, but those nodes then have
the requirement to correctly build the application before execution. You may
also have timing issues (generally, you want your grid node to start working right away,
not to have to wait for a few minutes before processing its work unit).
The flip side to that is when using precompiled applications, you have to be sure that
the application can run on your destination nodes. In a homogeneous network, this will
not be a problem. In a heterogeneous environment, however, this can become a headache.
WS-RT can help, at least in the selection, by allowing you to embed different
applications into the XML, then allowing the fragment-dialect selection process to
decide which fragment to use according to the grid node's environment.
Regardless of whether you are using source code or compiled code, you should wrap the
actual code up into a <![CDATA[]]> block with your XML
code, and if the code is binary (compiled), consider using a text-encoding mechanism,
such as base64, to ensure that the code remains intact when processed by different XML
systems. For example, in the sample in Listing 2, we have precompiled code ready to
execute for a number of platforms and environments.
Listing 2. Precompiled code ready to execute
<?xml version="1.0" encoding="UTF-8"?>
<WorkUnitProcessor id="PermutationProcessor">
<description>Determines the different permutations
of the input parameters</description>
<code language="JavaClass" platform="any">
<![CDATA[dXNlIE1JTUU6OkJhc2U2NDsKCm9wZW4oRklMR
SwkQVJHVlswXSk7CgpteSAkZmlsZSA9IGpvaW4o
...
JycsPEZJTEU+KTsKCmNsb3NlKEZJTEUpOwoKcHJ
pbnQgZW5jb2RlX2Jhc2U2NCgkZmlsZSk7Cg==]]></code>
<code language="Compiled" platform="i386-linux">
<![CDATA[bCBwcmltYXJ5IGtleSxjb250ZXh0IGNoYXIoM
jU1KSkgZW5naW5lPU1lbW9yeSIpOwoKICAgIHBy
...
aW50ICJDcmVhdGluZyB0YWJsZSBhbmQgaW5pdGl
hbCBkYXRhXG4iOwogICAgcHJpbnQgIk1lYXN1]]>
</code>
</WorkUnitProcessor>
|
Distribution management
Distributing work information through WS-RT requires a slight change in the way
you may normally send work through a grid system. Not only is the information different
but the way you create, process, and return the work is also different.
Creating the work
Inserting work into our WS-RT-based distribution system is a process of creating a
large number of resource entries into our WS-RT host. You are effectively using the
WS-RT host as the queue for the work. Because we need to think about how to get the
information back out again so we can actually process it, some thought needs to be
given to the structure and contents. In general, you should include the following information in your work units:
- Unique ID
- Job/project information (if relevant)
- The program or class required to process the work unit
- The input parameters
- Any timing or priority information
Information within any WS-RT system is generally treated on a resource-by-resource
basis. That is, if you access information stored within a WS-RT system, the usual
method for picking out the information is to select the resource by a unique ID. You
use the unique ID to access the resource and to remove the work unit from the notional queue.
You can either treat each work unit as its own resource or you can collect work units
together into a single larger collection and process them individually. The former
approach (one unit per resource entry) has the benefit that you can pick out each
resource by a unique ID, but if you are using one of the dialect fragments, such as
XPath, the individual ID is less important than being able to pick out a single unit.
However, you should also consider the life cycle of the work unit. When using WS-RT,
you need to consider how to record the status of the unit. For now, think about the
typical life cycle:
- Grid manager creates unit with unprocessed status.
- Grid node retrieves unit and updates status to processing.
- Grid node completes unit and updates status to processed.
- Grid node either deletes the unit or appends result information to the unit and lets
the grid manager delete the unit during the collection process.
Having a unique ID to make the unit individually identifiable in this situation will make your life easier.
Making the work units usable
You want to be able to update — and even delete — work
units, so work units should be uniquely selectable. With XPath, you can do this by
specifying matches by their location within the overall return information. However, if
the work units are processed by multiple grid nodes, it's possible that different grid nodes
may use and update different parts of the original resource list. For example, Listing
3 shows three work units.
Listing 3. Three work units
<?xml version="1.0" encoding="UTF-8"?>
<workunits>
<workunit id="313" status="unprocessed">
<priority>100</priority>
<requirements>
<processclass>Combinations</processclass>
<cputype>Intel</cputype>
</requirements>
<parameters>
<input>1</input>
<input>3</input>
<input>11</input>
<iterations>64</iterations>
</parameters>
</workunit>
<workunit id="314" status="unprocessed">
<priority>100</priority>
<requirements>
<processclass>Combinations</processclass>
<cputype>Intel</cputype>
</requirements>
<parameters>
<input>5</input>
<input>9</input>
<input>27</input>
<iterations>64</iterations>
</parameters>
</workunit>
<workunit id="315" status="unprocessed">
<priority>100</priority>
<requirements>
<processclass>Combinations</processclass>
<cputype>Intel</cputype>
</requirements>
<parameters>
<input>17</input>
<input>19</input>
<input>37</input>
<iterations>64</iterations>
</parameters>
</workunit>
</workunits>
|
We could select the work units using an XPath expression that selects the unprocessed
elements: //workunit[@status="unprocessed"]. We've seen
examples of retrieving information through WS-RT using XPath dialects in
other parts of this series.
Updating the status during processing means we can tell the processed work units from
the unprocessed work units, but doesn't solve the situation where a work unit can be
part processed and a problem prevents grid node from returning the unit.
An alternative solution is to update the status of the original work-unit resource and
create a new work-unit resource that contains the work unit's unique ID, the status,
then uses the WS-RT lifetime metadata to set the expected processing time of the unit.
If the unit is completed within the lifetime given to the new resource entry, we can
reset the grid node status to unprocessed, and the next grid node that accesses the
list can pick up the unit to be processed as normal.
Monitoring status and reporting results
Because we are storing information about the status of processing different work units
within the grid within a single storage location (the WS-RT system), it becomes much
easier to determine the current status of the grid.
We can get a count of the current work units being processed and even a list of the
unprocessed and processed units from the WS-RT repository, giving us an instant view of
the status of the processing without having to query individual grid nodes for the
information. By executing an XPath query on the WS-RT repository, we can determine all the information we need.
Finally, we have a centralized location for submitting work and information back into
the system. To submit a response, the grid node just updates the WS-RT entry for the
original work unit. To collate the results, we query the WS-RT repository with another
XPath query to retrieve the information and determine the results.
In many ways, the WS-RT system resolves many of the issues of distributing and
collating information into a single location and makes retrieving that information a
case of generating the XPath query that extracts the information.
Performance and limitations
Using WS-RT simplifies the processes and complexities of the typical grid system. A
single location for the work units, a centralized repository for storing status and
progress information (implied by the status of the work unit), and an easily collated
location for the responses all make for easier processing and distribution. But the
WS-RT solution generates some potential performance pitfalls.
Centralized resource problems
The biggest problem with using WS-RT for our distribution system is that it can
represent a significant load on the system that handles the resource information. This
is simply because the system becomes a single point of contention and may be
simultaneously communicating with a large number of hosts.
As the number of grid hosts increases, the loading on your WS-RT will increase
linearly. The problem with WS-RT is that the XML approach means that processing and
returning information to a WS-RT client is sometimes heavier than the direct distribution method used in other solutions.
This problem isn't limited to grid solutions that use WS-RT. Any grid solution will
eventually hit the resource-limitation issue. With WS-RT, the main issue is the way the
information is recovered from the resource information. To return the right
information, the XML generated by the WS-RT system has to be post-processed to extract the information through XPath.
XML processing and using XPath are comparatively resource-hungry when you compare them
against other solutions, such as a shared database. Where WS-RT wins is the simplified
way in which can obtain information and how we can more easily store, and recover
structured and, in some cases, deeply nested information through the XML structure.
To make the most of resources in this situation, a more effective solution is required
that splits up the load among multiple hosts. This almost breaks the approach of WS-RT
as a single resource, so that means we also need better organization of the WS-RT hosts to support a more distributed model.
Using a distributed resource model
The solution to the resource and loading problem is to use a distributed resource
model. This divides the information so it is spread across multiple WS-RT-capable
hosts. You can then split your grid system into a number of clusters. When you
distribute the work units, the information is placed on multiple WS-RT hosts, then
distinct groups of individual hosts access their WS-RT repositories to access the
information. You can see a sample of this in Figure 5.
Figure 5. Distributed WS-RT systems and grid clusters
This solution is not without its own problems and issues. What happens if a grid node
— and, more significantly, an entire grid cluster — runs out of work?
Rather than thinking in terms of clusters (many grid nodes to one WS-RT host and
multiple WS-RT hosts to the grid manager), think of distributed WS-RT hosts and grid
nodes that are aware of more than one WS-RT host to contact.
To use a distribute system in this situation, you use different parts of the system
appropriately: On the grid node manager, when generating work, you equally distribute information to
the WS-RT hosts. You can do this either using a round-robin solution, where each unit
(or batches of units) are created on the WS-RT hosts, or you use an XPath query that
determines the current queue size on each WS-RT host and delivers a suitable number of
work units for each.
For each grid node, you store a list of WS-RT hosts capable of supplying work units.
Each grid node has a preferred WS-RT host so nodes can still be collated in terms of
clusters. In addition, each node also has one or more secondary WS-RT hosts that can be
queried for work units to be processed. You can see a sample of this in Figure 6.
Figure 6. Grid clusters with multiple WS-RT hosts
The solution has a number of benefits. It should keep your grid busy, even in the
event that a single WS-RT host runs out of work units to distribute. It also provides
an additional level of resilience by ensuring that a grid node is able to pick up work
in a system failure or a failure to communicate with a WS-RT host due to the resource load on the query.
Article summary
The flexible nature of WS-RT means that it can be a practical solution for the
distribution of work. By writing information about the work units as resources into a
WS-RT system, you effectively create a queue of information. Using XPath or QName
dialects, we can recover the list of unprocessed work units and generate statistics
about the current status of the work-unit processing.
Using WS-RT, we can update the status of the work unit to reflect the fact that it is
being processed, or that the process has been completed, all using the same
fundamentals of an XML structure to hold the work-unit information and the dialect
interface to update and set information within the XML structure.
Using a complex XML structure, we can even use WS-RT as a mechanism not only for
distributing work units but also for distributing the programs or classes used to work
on them. All of it is possible using some basic XML structures and XPath queries through the WS-RT API.
Series summary
This concludes the five-part "Building
a grid system using WS-Resource Transfer" series. Let's revisit some key
elements of the WS-RT system and how we've used it throughout the series to work as a
flexible solution for different grid solutions.
The key to the WS-RT system is the flexible method with which we can create and recover
information within its repository. Technically, WS-RT is not seen as a general-purpose
solution for the storage and recovery of information, but, in fact, the XML structure
and the ease with which we can process information by using the QName and XPath
dialects to extract and update the information makes it a flexible and
easy-to-manipulate system for information storage and distribution.
It can be used on a number of levels, as we've seen throughout the series, from the
fundamentals of information storage to the organization and definition of security
information, and for the distribution of work throughout the grid system. Using the
flexible nature of WS-RT makes the distribution of work easy and allows us to bypass
some of the problems and limitations that exist in other grid systems.
Resources Learn
-
Read the entire "Building a
grid system using WS-Resource Transfer" series.
-
See what Ian Foster has to say about the release of the WS-RT specification in his blog
entry titled "WS-ResourceTransfer Specification Released."
-
A researcher in grid computing shares his experiences with WSRF in his Bruno's Blog entry
titled "Experiences with
WSRF."
-
Ian Robinson, the editor of the published WS-RT specification, gives an update on the
WS-RT specification in his blog entry titled "WS-ResourceTransfer update."
-
For a good introduction to the WS-Resource Transfer specification, read "Meet the specs:
Intro to WS-ResourceTransfer 1.0," "Meet the specs: WS-RT 1.0 operations, Part One" and "WS-RT 1.0
operations, Part One," Part Two, and Part Three.
-
See this grid series on building a grid more focused on WSRF: "Building a grid
using Web services standards."
-
The developerWorks SOA and Web
services zone provides guides and tutorials on Web services and Web services standards.
-
Browse all the grid computing content on developerWorks.
-
To listen to interesting interviews and discussions for software developers, check out check out developerWorks podcasts.
-
Stay current with developerWorks' Technical events and webcasts.
-
Check out upcoming conferences, trade shows, webcasts, and other Events around the world that are of interest to IBM open source developers.
Get products and technologies
-
Rampart
is a security extension to Axis2, providing a WS-Security compliant implementation.
-
Get Apache Muse, voted to graduate from the Apache incubator in 2005, from Apache.
-
Check out the Interoperability Demo for
Converged Web Service Management Specifications, highlighting interoperability
between WS-RT and WS-RP using a single XSLT stylesheet.
-
Check out the full WS-Transfer
specification at W3.org.
-
Check out the full WS-RT
specification at xmlsoap.org.
-
See links for the WSDL and the schema for WS-RT at xmlsoap.org.
-
Download IBM product evaluation versions, and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
-
Innovate your next development project with IBM trial software, available for download or on DVD.
Discuss
About the author  | |  | Martin Brown has been a professional writer for over eight years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms -- Perl, Python, Java, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows, Solaris, Linux, BeOS, Mac OS/X and more -- as well as Web programming, systems management and integration. Martin is a regular contributor to ServerWatch.com, LinuxToday.com and IBM developerWorks, and a regular blogger at Computerworld, The Apple Blog and other sites, as well as a Subject Matter Expert (SME) for Microsoft. He can be contacted through his Web site at http://www.mcslp.com. |
Rate this page
|  |