Skip to main content

skip to main content

developerWorks  >  SOA and Web services | Grid computing  >

Building a grid system using WS-Resource Transfer, Part 5: Using WS-RT for work distribution

developerWorks
Document options

Document options requiring JavaScript are not displayed


Rate this page

Help us improve this content


Level: Intermediate

Martin Brown (mc@mcslp.com), Freelance Writer, Consultant

11 Dec 2007

In this five-part "Building a grid system using WS-Resource Transfer" series, we look at the use of WS-Resource Transfer (WS-RT) in different areas of the grid environment — from using it as a method for storing and recovering general information about grid-to-grid monitoring to management and security. We also examine how WS-RT can be used for the distribution and division of work. In any grid, there is a huge amount of metadata about the grid that needs to be stored and distributed. Using WS-RT makes sharing the information, especially the precise information required by different systems in the grid, significantly easier. Here in Part 5, we look at using WS-RT for work distribution.

About this series

The WS-RT standard provides a new method for accessing and exchanging information about resources between components. It is designed to enhance the WS-Resource Framework (WSRF) and build on the WS-Transfer standards. The WS-RT system extends previous resource solutions for Web services and makes it easy not only to access resource information by name but also to access individual elements of a larger data set through the same mechanisms by exposing elements of an XML data set through the Web services interfaces.

In this "Building a grid system using WS-Resource Transfer" series, we look at the use of WS-RT in different areas of the grid environment, from using it as a method for storing and recovering general information about the grid to grid monitoring and management, and security. We also examine how WS-RT can be used for the distribution and division of work. Series details:

  • Part 1 examines the WS-RT standard and looks at how to develop a WS-RT solution using Java™ technology and Apache Muse.
  • Part 2 begins using Apache Muse and the WS-RT specification to develop a WS-RT system to manage and access grid information.
  • Part 3 looks at how to collect monitor data, how to expose the monitor data through WS-RT, and how to use WS-RT to extract trend information to help make predictions.
  • Part 4 looks at both sides of the security session, in terms of using WS-RT as an aid to the authorization process and at combining WS-Security with WS-RT for secure resource exchange.
  • Part 5 looks at using WS-RT for work distribution.


Back to top


Distribution methods

There are many methods of distributing work, and the method you choose will depend mostly on your grid type. Storage grids obviously have specific requirements that are probably beyond the sensible capabilities of WS-RT, but distributing work units or reference information within a computational grid is a possibility.

Grid work distribution

Normally, in a grid distribution system, the process is active — that is, when submitting work to the grid management node, the grid management node then distributes the information directly to the grid nodes for processing and ultimately receives the results. You can see an example of this in Figure 1.


Figure 1. Typical work distribution
Typical work distribution

For many grid solutions, this is fine, but it relies on the grid nodes being active. They have to be able to respond to the grid node distribution manager that is sending out the work. And sometimes this can introduce a complexity layer into the grid application that you simply don't want. The active solution can also present its own distribution issues in terms of correctly allocating work to individual nodes.

For work that is nonlinear, or within grids where the available resource is variable (for example, where you don't have control over a single variety or configuration of grid node hardware), the direct distribution method can cause performance problems. The grid nodes have to have a way of responding to the request and refusing the work unit, or they refuse to answer the request and, instead, the distribution process must time-out. In either case, you have the issue of reassigning the work unit to another grid node.

This active solution also causes problems in the event of a failure. If the distribution node cannot locate the grid node it needs to contact in order to process the unit, you encounter the same problem: The distribution of the work unit fails and has to be reassigned and reallocated.


Figure 2. Work distribution in a failure situation
Work distribution in a failure situation

The solution is a passive distribution solution, in which, instead of forcing the data on to the grid nodes, you allow the grid nodes to pick up the work when they are ready. This requires a storage solution where you can create the work and allow the grid nodes to come and retrieve any waiting work units.

Using WS-RT to store work

We can use the functionality of the WS-RT system to store the work into the system by getting the grid node to generate the work, then storing the work in a WS-RT resource store so the grid nodes can pick up the data.


Figure 3. Using WS-RT to distribute information
Using WS-RT to distribute information

Using WS-RT provides a number of advantages over other solutions:

  • WS-RT is open and easy to use, making it an ideal way of picking up the work and submitting it back.
  • If you are already the WS-* systems, accessing an existing WS-RT store saves you from developing a new type of store. In many cases, you may even already be using WS-RT to store and record other information in the grid, so integrating your work submission and result mechanisms is actually easier.
  • We can use the dialect systems in WS-RT to pick out specific work units according to the facilities and abilities of the grid node and the requirements of the distribution manager. For example, if you provide a grid that is shared by a number of projects, teams, or clients, you can easily control the distribution of work by embedding the project information into the WS-RT resource and configuring different grid nodes to work on different projects in different priorities.
  • Because the system works through collection, and through the dialect system, we can make the grid nodes more autonomous; they can make decisions about which work units to pick and process and they can be self-managing. For example, if the grid nodes are not dedicated, they can select a different range of work units that are more easily processed and yet still devote reasonable quantities of time to the grid.
  • We can take advantage of the WS-RT metadata. For example, your work units might be time-sensitive (you need a response within a limited period of time, for example, or maybe you are calculating different possibilities or combinations and need to get as many as possible within a short period of time). Here, you can set the lifetime metadata of the WS-RT resources that you create and have them automatically expire out of the work queue once your time limit has been reached.

When distributing work through WS-RT in this way, you need to think about the content of the work-unit information you will distribute and even what you distribute. Normally, you would just use the WS-RT system to distribute the input parameters or data required by the system. If we make the WS-RT a generic storage mechanism for holding all sorts of information, we can probably be flexible and start to store more explicitly information and even entire applications.

Distributing parameters

For a computational grid, the information you distribute is often smaller, more discrete data, such as the input parameters and source information for an existing computation.

Within WS-RT, you "distribute" the information by creating a resource record on a WS-RT host that contains the information about the work unit you want to process. For example, you would include the input parameters you would normally have distributed directly to the grid node and make them tags within your XML.

Because the grid nodes will be picking up the information, rather than being sent the information, you should also provide a reasonable amount of other information about the work unit that describes the unit and its contents so the grid node can decide which work units to select. For example, you may want to embed information about the job, such as supported platforms, expected execution times and application or class required to actually execute the work. You can see an example of this in Listing 1.


Listing 1. Embedding information about the job
                
<?xml version="1.0" encoding="UTF-8"?>
<workunit id="999" status="unprocessed">
  <priority>100</priority>
  <requirements>
    <processclass>Combinations</processclass>
    <cputype>Intel</cputype>
  </requirements>
  <parameters>
    <input>1</input>
    <input>3</input>
    <input>11</input>
    <iterations>64</iterations>
  </parameters>
</workunit>

In this case, the work unit contains information about the parameters to be applied, the class to be used and the CPU type required for the processing.

For that last item, be aware that the classes available on a system will affect the system's ability to actually process the work units. But we can use the WS-RT system to help here by using it to distribute code, in addition to the work units.

Distributing code

Another alternative that may be more practical within a computational grid where you want the system to be more flexible is to actually provide code or precompiled code components through the resource system. In this case, you can create the work units with the embedded information within the work-unit fragment along with the parameters, or, more practically, you can create stand-alone WS-RT resources that contain the information.

This latter solution actually has a longer-term benefit, in that you can create a type of library of code execution environments (programs) that can be used and applied to any work unit. All the grid node has to do is load the work unit, determine the class or program required to execute the unit, then access the information from the WS-RT store.


Figure 4. Sequence using WS-RT for work-unit and program distribution
Sequence using WS-RT for work-unit and program distribution

If you are going to distribute code in this way, you should also determine whether you want to distribute the source code or the compiled program. Using the source code has benefits in that it can be more easily distributed to nodes, but those nodes then have the requirement to correctly build the application before execution. You may also have timing issues (generally, you want your grid node to start working right away, not to have to wait for a few minutes before processing its work unit).

The flip side to that is when using precompiled applications, you have to be sure that the application can run on your destination nodes. In a homogeneous network, this will not be a problem. In a heterogeneous environment, however, this can become a headache.

WS-RT can help, at least in the selection, by allowing you to embed different applications into the XML, then allowing the fragment-dialect selection process to decide which fragment to use according to the grid node's environment.

Regardless of whether you are using source code or compiled code, you should wrap the actual code up into a <![CDATA[]]> block with your XML code, and if the code is binary (compiled), consider using a text-encoding mechanism, such as base64, to ensure that the code remains intact when processed by different XML systems. For example, in the sample in Listing 2, we have precompiled code ready to execute for a number of platforms and environments.


Listing 2. Precompiled code ready to execute
                
<?xml version="1.0" encoding="UTF-8"?>
<WorkUnitProcessor id="PermutationProcessor">
  <description>Determines the different permutations 
    of the input parameters</description>
  <code language="JavaClass" platform="any">
    <![CDATA[dXNlIE1JTUU6OkJhc2U2NDsKCm9wZW4oRklMR
    SwkQVJHVlswXSk7CgpteSAkZmlsZSA9IGpvaW4o
    ...
    JycsPEZJTEU+KTsKCmNsb3NlKEZJTEUpOwoKcHJ
    pbnQgZW5jb2RlX2Jhc2U2NCgkZmlsZSk7Cg==]]></code>
  <code language="Compiled" platform="i386-linux">
    <![CDATA[bCBwcmltYXJ5IGtleSxjb250ZXh0IGNoYXIoM
    jU1KSkgZW5naW5lPU1lbW9yeSIpOwoKICAgIHBy
    ...
    aW50ICJDcmVhdGluZyB0YWJsZSBhbmQgaW5pdGl
    hbCBkYXRhXG4iOwogICAgcHJpbnQgIk1lYXN1]]>
  </code>
</WorkUnitProcessor>



Back to top


Distribution management

Distributing work information through WS-RT requires a slight change in the way you may normally send work through a grid system. Not only is the information different but the way you create, process, and return the work is also different.

Creating the work

Inserting work into our WS-RT-based distribution system is a process of creating a large number of resource entries into our WS-RT host. You are effectively using the WS-RT host as the queue for the work. Because we need to think about how to get the information back out again so we can actually process it, some thought needs to be given to the structure and contents. In general, you should include the following information in your work units:

  • Unique ID
  • Job/project information (if relevant)
  • The program or class required to process the work unit
  • The input parameters
  • Any timing or priority information

Information within any WS-RT system is generally treated on a resource-by-resource basis. That is, if you access information stored within a WS-RT system, the usual method for picking out the information is to select the resource by a unique ID. You use the unique ID to access the resource and to remove the work unit from the notional queue.

You can either treat each work unit as its own resource or you can collect work units together into a single larger collection and process them individually. The former approach (one unit per resource entry) has the benefit that you can pick out each resource by a unique ID, but if you are using one of the dialect fragments, such as XPath, the individual ID is less important than being able to pick out a single unit.

However, you should also consider the life cycle of the work unit. When using WS-RT, you need to consider how to record the status of the unit. For now, think about the typical life cycle:

  • Grid manager creates unit with unprocessed status.
  • Grid node retrieves unit and updates status to processing.
  • Grid node completes unit and updates status to processed.
  • Grid node either deletes the unit or appends result information to the unit and lets the grid manager delete the unit during the collection process.

Having a unique ID to make the unit individually identifiable in this situation will make your life easier.

Making the work units usable

You want to be able to update — and even delete — work units, so work units should be uniquely selectable. With XPath, you can do this by specifying matches by their location within the overall return information. However, if the work units are processed by multiple grid nodes, it's possible that different grid nodes may use and update different parts of the original resource list. For example, Listing 3 shows three work units.


Listing 3. Three work units
                
<?xml version="1.0" encoding="UTF-8"?>
<workunits>
  <workunit id="313" status="unprocessed">
    <priority>100</priority>
    <requirements>
      <processclass>Combinations</processclass>
      <cputype>Intel</cputype>
    </requirements>
    <parameters>
      <input>1</input>
      <input>3</input>
      <input>11</input>
      <iterations>64</iterations>
    </parameters>
  </workunit>
  <workunit id="314" status="unprocessed">
    <priority>100</priority>
    <requirements>
      <processclass>Combinations</processclass>
      <cputype>Intel</cputype>
    </requirements>
    <parameters>
      <input>5</input>
      <input>9</input>
      <input>27</input>
      <iterations>64</iterations>
    </parameters>
  </workunit>
  <workunit id="315" status="unprocessed">
    <priority>100</priority>
    <requirements>
      <processclass>Combinations</processclass>
      <cputype>Intel</cputype>
    </requirements>
    <parameters>
      <input>17</input>
      <input>19</input>
      <input>37</input>
      <iterations>64</iterations>
    </parameters>
  </workunit>
</workunits>

We could select the work units using an XPath expression that selects the unprocessed elements: //workunit[@status="unprocessed"]. We've seen examples of retrieving information through WS-RT using XPath dialects in other parts of this series.

Updating the status during processing means we can tell the processed work units from the unprocessed work units, but doesn't solve the situation where a work unit can be part processed and a problem prevents grid node from returning the unit.

An alternative solution is to update the status of the original work-unit resource and create a new work-unit resource that contains the work unit's unique ID, the status, then uses the WS-RT lifetime metadata to set the expected processing time of the unit. If the unit is completed within the lifetime given to the new resource entry, we can reset the grid node status to unprocessed, and the next grid node that accesses the list can pick up the unit to be processed as normal.

Monitoring status and reporting results

Because we are storing information about the status of processing different work units within the grid within a single storage location (the WS-RT system), it becomes much easier to determine the current status of the grid.

We can get a count of the current work units being processed and even a list of the unprocessed and processed units from the WS-RT repository, giving us an instant view of the status of the processing without having to query individual grid nodes for the information. By executing an XPath query on the WS-RT repository, we can determine all the information we need.

Finally, we have a centralized location for submitting work and information back into the system. To submit a response, the grid node just updates the WS-RT entry for the original work unit. To collate the results, we query the WS-RT repository with another XPath query to retrieve the information and determine the results.

In many ways, the WS-RT system resolves many of the issues of distributing and collating information into a single location and makes retrieving that information a case of generating the XPath query that extracts the information.



Back to top


Performance and limitations

Using WS-RT simplifies the processes and complexities of the typical grid system. A single location for the work units, a centralized repository for storing status and progress information (implied by the status of the work unit), and an easily collated location for the responses all make for easier processing and distribution. But the WS-RT solution generates some potential performance pitfalls.

Centralized resource problems

The biggest problem with using WS-RT for our distribution system is that it can represent a significant load on the system that handles the resource information. This is simply because the system becomes a single point of contention and may be simultaneously communicating with a large number of hosts.

As the number of grid hosts increases, the loading on your WS-RT will increase linearly. The problem with WS-RT is that the XML approach means that processing and returning information to a WS-RT client is sometimes heavier than the direct distribution method used in other solutions.

This problem isn't limited to grid solutions that use WS-RT. Any grid solution will eventually hit the resource-limitation issue. With WS-RT, the main issue is the way the information is recovered from the resource information. To return the right information, the XML generated by the WS-RT system has to be post-processed to extract the information through XPath.

XML processing and using XPath are comparatively resource-hungry when you compare them against other solutions, such as a shared database. Where WS-RT wins is the simplified way in which can obtain information and how we can more easily store, and recover structured and, in some cases, deeply nested information through the XML structure.

To make the most of resources in this situation, a more effective solution is required that splits up the load among multiple hosts. This almost breaks the approach of WS-RT as a single resource, so that means we also need better organization of the WS-RT hosts to support a more distributed model.

Using a distributed resource model

The solution to the resource and loading problem is to use a distributed resource model. This divides the information so it is spread across multiple WS-RT-capable hosts. You can then split your grid system into a number of clusters. When you distribute the work units, the information is placed on multiple WS-RT hosts, then distinct groups of individual hosts access their WS-RT repositories to access the information. You can see a sample of this in Figure 5.


Figure 5. Distributed WS-RT systems and grid clusters
Distributed WS-RT systems and grid clusters

This solution is not without its own problems and issues. What happens if a grid node — and, more significantly, an entire grid cluster — runs out of work?

Rather than thinking in terms of clusters (many grid nodes to one WS-RT host and multiple WS-RT hosts to the grid manager), think of distributed WS-RT hosts and grid nodes that are aware of more than one WS-RT host to contact.

To use a distribute system in this situation, you use different parts of the system appropriately: On the grid node manager, when generating work, you equally distribute information to the WS-RT hosts. You can do this either using a round-robin solution, where each unit (or batches of units) are created on the WS-RT hosts, or you use an XPath query that determines the current queue size on each WS-RT host and delivers a suitable number of work units for each.

For each grid node, you store a list of WS-RT hosts capable of supplying work units. Each grid node has a preferred WS-RT host so nodes can still be collated in terms of clusters. In addition, each node also has one or more secondary WS-RT hosts that can be queried for work units to be processed. You can see a sample of this in Figure 6.


Figure 6. Grid clusters with multiple WS-RT hosts
Grid clusters with multiple WS-RT hosts

The solution has a number of benefits. It should keep your grid busy, even in the event that a single WS-RT host runs out of work units to distribute. It also provides an additional level of resilience by ensuring that a grid node is able to pick up work in a system failure or a failure to communicate with a WS-RT host due to the resource load on the query.



Back to top


Article summary

The flexible nature of WS-RT means that it can be a practical solution for the distribution of work. By writing information about the work units as resources into a WS-RT system, you effectively create a queue of information. Using XPath or QName dialects, we can recover the list of unprocessed work units and generate statistics about the current status of the work-unit processing.

Using WS-RT, we can update the status of the work unit to reflect the fact that it is being processed, or that the process has been completed, all using the same fundamentals of an XML structure to hold the work-unit information and the dialect interface to update and set information within the XML structure.

Using a complex XML structure, we can even use WS-RT as a mechanism not only for distributing work units but also for distributing the programs or classes used to work on them. All of it is possible using some basic XML structures and XPath queries through the WS-RT API.



Back to top


Series summary

Share this...

digg Digg this story
del.icio.us Post to del.icio.us
Slashdot Slashdot it!

This concludes the five-part "Building a grid system using WS-Resource Transfer" series. Let's revisit some key elements of the WS-RT system and how we've used it throughout the series to work as a flexible solution for different grid solutions.

The key to the WS-RT system is the flexible method with which we can create and recover information within its repository. Technically, WS-RT is not seen as a general-purpose solution for the storage and recovery of information, but, in fact, the XML structure and the ease with which we can process information by using the QName and XPath dialects to extract and update the information makes it a flexible and easy-to-manipulate system for information storage and distribution.

It can be used on a number of levels, as we've seen throughout the series, from the fundamentals of information storage to the organization and definition of security information, and for the distribution of work throughout the grid system. Using the flexible nature of WS-RT makes the distribution of work easy and allows us to bypass some of the problems and limitations that exist in other grid systems.



Resources

Learn

Get products and technologies

Discuss


About the author

Martin Brown has been a professional writer for over eight years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms -- Perl, Python, Java, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows, Solaris, Linux, BeOS, Mac OS/X and more -- as well as Web programming, systems management and integration. Martin is a regular contributor to ServerWatch.com, LinuxToday.com and IBM developerWorks, and a regular blogger at Computerworld, The Apple Blog and other sites, as well as a Subject Matter Expert (SME) for Microsoft. He can be contacted through his Web site at http://www.mcslp.com.




Rate this page


Please take a moment to complete this form to help us better serve you.



YesNoDon't know
 


 


12345
Not
useful
Extremely
useful
 


Back to top