Level: Introductory Vladimir Silva (vladimir_silva@yahoo.com), Software Engineer, Consultant
23 May 2006
You are a software developer working hard to get your company's applications rolling in a brand new distributed environment. Your organization has committed resources to the new and exciting field of grid computing, but you're not sure what to do next. Deadlines are closing fast, and a feeling of panic is beginning to fill your team environment. What you need is a quick roadmap to deploy or port your organization's applications into a computing or data grid. Globus has gone far in trying to document its Globus Toolkit V4.0 (GT4), but you need development documentation now. Globus is working hard to overcome this problem by creating development documentation Web sites. Meanwhile, in this article, we present a compilation of development tips in the field of resource management to get you started enabling grid applications using the Web services Grid Resource Allocation and Management (WS-GRAM) service.
Portability issues between OGSA-GRAM and WS-GRAM
WS-GRAM in GT4 is not compatible with the GRAM interfaces implemented in GT3 (OGSA-MMJFS), although most of these interfaces share the same names: GramJobListener, GramJob, and others. In fact, most of the code of this custom GRAM client is similar to its GT3 counterpart and was built from its code base. The difference is that GRAM is implemented as a Web Services Resource Framework (WSRF) service in GT4, whereas it was an OGSA service in GT3.
WS-GRAM from the ground up
Globus GRAM services provide secure job submission to many types of job schedulers. WS-GRAM was built from the ground up to support XML encryption and signature. It uses digital certificates to send secure XML SOAP messages between client and server. Essentially, GT4 is a secure Web services container. What sets it apart from a traditional container is its ability to maintain state across multiple Simple Object Access Protocol (SOAP) messages, a capability defined by WSRF.
The following tricks demonstrate how to create a custom Java™ technology program to submit a WS-GRAM job to a remote host.
Trick No. 1: Listening for job status changes
To listen for job status changes, implement the interface org.globus.exec.client.GramJobListener. It shares its name with the GT3 interface but is not compatible with it. This technique is shown below.
Listing 1. WS-GRAM client base implementation
/**
* A Custom GRAM Client for GT4:
* 1) Based on the GlobusRun command from the GT4 WS-GRAM implementation.
* 2) The GT4 WSRF libraries are required to compile this stuff, plus the
* following VM arguments must be used:
* -Daxis.ClientConfigFile=[GLOBUS_LOCATION]/client-config.wsdd
* -DGLOBUS_LOCATION=[GLOBUS_LOCATION]
* @author Vladimir Silva
*
*/
public class GRAMClient
// Listen for job status messages
implements GramJobListener
{
private static Log logger = LogFactory.getLog(GRAMClient.class.getName());
// Amount of time to wait for job status changes
private static final long STATE_CHANGE_BASE_TIMEOUT_MILLIS = 60000;
/**
* Job submission member variables.
*/
private GramJob job;
// completed if Done or Failed
private boolean jobCompleted = false;
// Batch runs will not wait for the job to complete
private boolean batch;
// Delegation
private boolean limitedDelegation = true;
private boolean delegationEnabled = true;
// Don't print messages by default
private boolean quiet = false;
// proxy credential
private String proxyPath = null;
/**
* Application error state.
*/
private boolean noInterruptHandling = false;
private boolean isInterrupted = true;
private boolean normalApplicationEnd = false;
The basic object in WS-GRAM is the GramJob. It's used to submit a job description file in XML format. The job description defines the executable, arguments, standard input/output/error, staging files, and other options. A simple job description file is shown below.
Listing 2. Sample job description XML file
<!--
Job to print the remote host environment
For a full set of WS-GRAM options see the WS_GRAM developers page at:
http://www.globus.org/toolkit/docs/4.0/execution/wsgram/user-index.html#s-wsgram-user-simplejob
-->
<job>
<executable>/usr/bin/env</executable>
<!-- Job stdout/err will be stored in the globus user home directory -->
<stdout>${GLOBUS_USER_HOME}/stdout</stdout>
<stderr>${GLOBUS_USER_HOME}/stderr</stderr>
<!-- Clean up std out/err from user's home -->
<fileCleanUp>
<deletion>
<file>file:///${GLOBUS_USER_HOME}/stdout</file>
</deletion>
<deletion>
<file>file:///${GLOBUS_USER_HOME}/stderr</file>
</deletion>
</fileCleanUp>
</job>
The job file from Listing 2 will print the remote host environment and store stdout/stderr in the user’s home directory (under the names stdout and stderr, respectively). After the job completes, the files will be deleted, as indicated by the fileCleanUp section. If the cleanup section is commented out, subsequent runs of this job will have the side effect of appending their output to the same files. Thus, it's recommended that you clean up the files after each job run.
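Because a forgotten cleanup entry only shows up later as ever-growing output files, it can help to check the job description before submitting it. The following is a minimal sketch of such a check, using only the XML parser that ships with the JDK. The class JobDescriptionInspector and its method are hypothetical names made up for this illustration, not part of the GT4 API, and the sketch assumes the element names and the file:/// convention used in Listing 2.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; it is not part of the GT4 API.
class JobDescriptionInspector {

    /** Returns "stdout" and/or "stderr" when that output file has no cleanup entry. */
    static List<String> outputsWithoutCleanup(String jobXml) {
        Document doc = parse(jobXml);

        // Collect every <file> listed under <fileCleanUp>.
        List<String> cleanupTargets = new ArrayList<String>();
        NodeList cleanups = doc.getElementsByTagName("fileCleanUp");
        for (int i = 0; i < cleanups.getLength(); i++) {
            NodeList files = ((Element) cleanups.item(i)).getElementsByTagName("file");
            for (int j = 0; j < files.getLength(); j++) {
                cleanupTargets.add(files.item(j).getTextContent().trim());
            }
        }

        List<String> missing = new ArrayList<String>();
        for (String tag : new String[] {"stdout", "stderr"}) {
            NodeList nodes = doc.getElementsByTagName(tag);
            if (nodes.getLength() == 0) {
                continue; // this output is not redirected to a file
            }
            String path = nodes.item(0).getTextContent().trim();
            boolean covered = false;
            for (String target : cleanupTargets) {
                // Assumption from Listing 2: targets are "file:///" plus the output path.
                if (target.endsWith(path)) {
                    covered = true;
                }
            }
            if (!covered) {
                missing.add(tag);
            }
        }
        return missing;
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + e.getMessage(), e);
        }
    }
}
```

An empty result means both output files are scheduled for deletion. The check is purely textual, so it does not expand variables such as ${GLOBUS_USER_HOME}.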
Trick No. 2: What to do when receiving status changes
The callback stateChanged must be implemented to receive job status changes (see Listing 3).
Listing 3. Receiving status changes
/**
* Callback as a GramJobListener.
* Will not be called in batch mode.
*/
public void stateChanged(GramJob job) {
StateEnumeration jobState = job.getState();
boolean holding = job.isHolding();
printMessage("========== State Notification ==========");
printJobState(jobState, holding);
printMessage("========================================");
synchronized (this) {
if ( jobState.equals(StateEnumeration.Done)
|| jobState.equals(StateEnumeration.Failed)) {
printMessage("Exit Code: "
+ Integer.toString(job.getExitCode()));
this.jobCompleted = true;
}
notifyAll();
// if we are running an interactive job,
// prevent a hold from hanging the client
if ( holding && !batch) {
logger.debug(
"Automatically releasing hold for interactive job");
try {
job.release();
} catch (Exception e) {
String errorMessage = "Unable to release job from hold";
logger.debug(errorMessage, e);
printError(errorMessage + " - " + e.getMessage());
}
}
}
}
When a stateChanged callback is received, the state of the job must be checked and any waiting threads must be notified accordingly. The subroutine above prints the state of the job, notifies any waiting threads, and releases resources appropriately.
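The handshake between the callback and the waiting thread is plain Java monitor code, so it can be tried out without a grid at hand. Below is a minimal sketch of that pattern. CompletionMonitor and its JobState enum are stand-ins invented for this example. They are not GT4 classes, and the enum only mimics the states that matter here.

```java
// Stand-in types for illustration only; these are not GT4 classes.
class CompletionMonitor {

    enum JobState { PENDING, ACTIVE, DONE, FAILED }

    private boolean completed = false;
    private JobState lastState = null;

    /** Plays the role of stateChanged(): record the state and wake up waiters. */
    synchronized void onStateChanged(JobState state) {
        lastState = state;
        if (state == JobState.DONE || state == JobState.FAILED) {
            completed = true;
        }
        notifyAll(); // wake every thread blocked in awaitCompletion()
    }

    /** Blocks until the job completes or the timeout elapses; true if it completed. */
    synchronized boolean awaitCompletion(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!completed) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false; // timed out
            }
            try {
                wait(remaining); // releases the monitor while waiting
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return completed;
            }
        }
        return true;
    }

    synchronized JobState lastState() {
        return lastState;
    }
}
```

The flag is re-checked in a loop because wait() may return for reasons other than completion, such as an intermediate state notification.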
Trick No. 3: Job setup
In this step, the job arguments are set up properly. For example, if a job file is specified, a GramJob object is created with the file contents. The job timeout can also be set. Security parameters such as authorization and message protection must be defined. Authorization can be self, host (the default), or identity. Message protection can be XML encryption or XML signature. Other options include delegation, duration, and termination times (see Listing 4). Once the parameters have been set, the job can be submitted.
Listing 4. Job parameters setup
/**
* Submit a WS-GRAM Job (GT4)
* @param factoryEndpoint Factory endpoint reference
* @param simpleJobCommandLine Executable (null to use a job file)
* @param rslFile Job XML file (null to use a command line)
* @param authorization Authorization: Host, Self, Identity
* @param xmlSecurity XML Sec: Encryption or signature
* @param batchMode Submission mode: batch will not wait for completion
* @param dryRunMode Used to parse RSL
* @param quiet Messages/NO messages
* @param duration Duration date
* @param terminationDate Termination date
* @param timeout Job timeout (ms)
*/
private void submitRSL(EndpointReferenceType factoryEndpoint,
String simpleJobCommandLine,
File rslFile,
Authorization authorization,
Integer xmlSecurity,
boolean batchMode,
boolean dryRunMode,
boolean quiet,
Date duration,
Date terminationDate,
int timeout)
throws Exception
{
this.quiet = quiet;
this.batch = batchMode || dryRunMode; // in single job only.
// In multi-job, -batch is not allowed. Dryrun is.
if (batchMode) {
printMessage("Warning: Will not wait for job completion, "
+ "and will not destroy job service.");
}
// create a job object with the XML spec or a simple command
if (rslFile != null) {
try {
this.job = new GramJob(rslFile);
} catch (Exception e) {
String errorMessage = "Unable to parse RSL from file "
+ rslFile;
logger.debug(errorMessage, e);
throw new IOException(
errorMessage + " - " + e.getMessage());
}
}
else {
this.job = new GramJob(RSLHelper
.makeSimpleJob(simpleJobCommandLine));
}
// set job options
job.setTimeOut(timeout);
job.setAuthorization(authorization);
job.setMessageProtectionType(xmlSecurity);
job.setDelegationEnabled(this.delegationEnabled);
job.setDuration(duration);
job.setTerminationTime(terminationDate);
// submit here
this.processJob(job, factoryEndpoint, batch);
}
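Listing 4 silently prefers the job file when both a file and a command line are supplied, and passes a null command line through when neither is. One possible refinement is to reject such calls up front. The sketch below shows that idea. JobSource is a hypothetical helper that does not exist in GT4, and insisting on exactly one of the two inputs is a design choice of this example, not a rule imposed by WS-GRAM.

```java
import java.io.File;

// Hypothetical helper for illustration; it is not part of the GT4 API.
class JobSource {

    private final File jobFile;
    private final String commandLine;

    private JobSource(File jobFile, String commandLine) {
        this.jobFile = jobFile;
        this.commandLine = commandLine;
    }

    /** Accepts exactly one of the two inputs and rejects everything else. */
    static JobSource of(File jobFile, String commandLine) {
        boolean hasFile = jobFile != null;
        boolean hasCommand = commandLine != null && commandLine.trim().length() > 0;
        if (hasFile == hasCommand) {
            throw new IllegalArgumentException(
                "Specify exactly one of: job file, command line");
        }
        return new JobSource(jobFile, hasCommand ? commandLine.trim() : null);
    }

    boolean usesFile() {
        return jobFile != null;
    }

    File jobFile() {
        return jobFile;
    }

    String commandLine() {
        return commandLine;
    }
}
```

A caller would build a JobSource first and then branch on usesFile() where Listing 4 tests rslFile against null.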
Trick No. 4: Job processing
With the job object created and its arguments in place, the job can finally be processed. Things to consider at this step:
- Load a valid proxy certificate -- You can use the default user proxy or create a custom proxy dynamically.
- Create a unique job identifier -- The Axis factory org.apache.axis.components.uuid.UUIDGenFactory can be used to generate a unique string identifier.
- If the job is sent in nonbatch mode, the parent class should add a listener for job status changes.
- The job must be submitted to the factory end point.
- If the submission mode is nonbatch, the client should wait for completion and destroy the job resources accordingly (see Listing 5).
Listing 5. Job processing
/**
* Submit the GRAM Job
* @param job Job object (GramJob)
* @param factoryEndpoint Factory end point reference
* @param batch If false will wait for job to complete
* @throws Exception
*/
private void processJob(GramJob job,
EndpointReferenceType factoryEndpoint,
boolean batch)
throws Exception
{
// load custom proxy (if any)
if (proxyPath != null) {
try {
ExtendedGSSManager manager = (ExtendedGSSManager)
ExtendedGSSManager.getInstance();
String handle = "X509_USER_PROXY=" + proxyPath.toString();
GSSCredential proxy = manager.createCredential(handle
.getBytes(),
ExtendedGSSCredential.IMPEXP_MECH_SPECIFIC,
GSSCredential.DEFAULT_LIFETIME, null,
GSSCredential.INITIATE_AND_ACCEPT);
job.setCredentials(proxy);
} catch (Exception e) {
logger.debug("Exception while obtaining user proxy: ", e);
printError("error obtaining user proxy: " + e.getMessage());
// don't exit, but resume using default proxy instead
}
}
// Generate a Job ID
UUIDGen uuidgen = UUIDGenFactory.getUUIDGen();
String submissionID = "uuid:" + uuidgen.nextUUID();
printMessage("Submission ID: " + submissionID);
if (!batch) {
job.addListener(this);
}
boolean submitted = false;
int tries = 0;
while (!submitted) {
tries++;
try {
job.submit(factoryEndpoint, batch, this.limitedDelegation,
submissionID);
submitted = true;
} catch (Exception e) {
logger.debug("Exception while submitting the job request: ", e);
throw new IOException("Job request error: " + e);
}
}
if (batch) {
printMessage("CREATED MANAGED JOB SERVICE WITH HANDLE:");
printMessage(job.getHandle());
}
if (logger.isDebugEnabled()) {
long millis = System.currentTimeMillis();
BigDecimal seconds = new BigDecimal(((double) millis) / 1000);
seconds = seconds.setScale(3, BigDecimal.ROUND_HALF_DOWN);
logger.debug("Submission time (secs) after: " + seconds.toString());
logger.debug("Submission time in milliseconds: " + millis);
}
if (!batch) {
printMessage("WAITING FOR JOB TO FINISH");
waitForJobCompletion(STATE_CHANGE_BASE_TIMEOUT_MILLIS);
try {
this.destroyJob(this.job); // TEST
} catch (Exception e) {
printError("could not destroy");
}
if (this.job.getState().equals(StateEnumeration.Failed)) {
printJobFault(this.job);
}
}
}
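Note that the submission loop in Listing 5 counts its tries but rethrows on the first failure, so it never actually retries. If you want a bounded number of attempts, the loop could be factored out along the following lines. This is only a sketch: BoundedRetry is a made-up helper, it retries on any exception without pausing, and it assumes that resubmitting with the same submission ID is acceptable in your environment.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; it is not part of the GT4 API.
class BoundedRetry {

    /** Runs the action until it succeeds or maxTries attempts have failed. */
    static <T> T call(Callable<T> action, int maxTries) {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be at least 1");
        }
        Exception last = null;
        for (int tries = 1; tries <= maxTries; tries++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw new IllegalStateException(
            "Giving up after " + maxTries + " tries", last);
    }
}
```

In the client, the action would wrap the job.submit(...) call from Listing 5 and return a placeholder value. A real client would probably also sleep between attempts and retry only on errors it knows to be transient.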
Trick No. 5: Waiting for the job to complete (nonbatch mode)
If the submission mode is nonbatch, the master thread must wait for the job to complete and return the status to the client. This is achieved by checking the status of the job and entering a loop to determine the cause of the job termination. Termination causes could be job faults (usually produced by a server-side error), client timeouts, or unknown errors (see Listing 6).
Listing 6. Waiting for job completion
private synchronized void waitForJobCompletion(
long maxWaitPerStateNotificationMillis)
throws Exception
{
long durationToWait = maxWaitPerStateNotificationMillis;
long startTime;
StateEnumeration oldState = job.getState();
// prints one more state initially (Unsubmitted)
// but cost extra remote call for sure. Null test below instead
while (!this.jobCompleted)
{
if (logger.isDebugEnabled()) {
logger.debug("Job not completed - waiting for state change "
+ "(timeout before pulling: " + durationToWait
+ " ms).");
}
startTime = System.currentTimeMillis(); // (re)set start time
try {
wait(durationToWait); // wait for a state change notif
} catch (InterruptedException ie) {
String errorMessage =
"interrupted thread waiting for job to finish";
logger.debug(errorMessage, ie);
printError(errorMessage); // no exiting...
}
// now let's determine what stopped the wait():
StateEnumeration currentState = job.getState();
// A) New job state change notification (good!)
if (currentState != null && !currentState.equals(oldState)) {
oldState = currentState; // wait for next state notif
durationToWait = maxWaitPerStateNotificationMillis; // reset
}
else
{
long now = System.currentTimeMillis();
long durationWaited = now - startTime;
// B) Timeout when waiting for a notification (bad)
if (durationWaited >= durationToWait) {
if (logger.isWarnEnabled()) {
logger.warn("Did not receive any new notification of "
+ "job state change after a delay of "
+ durationToWait + " ms.\nPulling job state.");
}
// pull state from remote job and print the
// state only if it is a new state
//refreshJobStatus();
job.refreshStatus();
// binary exponential backoff
durationToWait = 2 * durationToWait;
}
// C) Some other reason
else {
// wait but only for remainder of timeout duration
durationToWait = durationToWait - durationWaited;
}
}
}
}
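The timing rule buried in Listing 6 is easier to see when pulled out into a pure function. The sketch below restates the three cases (A, B, and C) from the listing. WaitPlanner and nextWait are names invented for this example. The arithmetic is taken from the listing itself.

```java
// Hypothetical helper for illustration; the arithmetic mirrors Listing 6.
class WaitPlanner {

    /**
     * Computes how long the next wait() should last.
     *
     * @param baseMillis        the initial timeout per state notification
     * @param currentWaitMillis the timeout used for the wait that just ended
     * @param waitedMillis      how long that wait actually lasted
     * @param stateChanged      whether a new job state was observed
     */
    static long nextWait(long baseMillis, long currentWaitMillis,
                         long waitedMillis, boolean stateChanged) {
        if (stateChanged) {
            return baseMillis;                    // A) new state: reset the timeout
        }
        if (waitedMillis >= currentWaitMillis) {
            return 2 * currentWaitMillis;         // B) timed out: back off exponentially
        }
        return currentWaitMillis - waitedMillis;  // C) woke early: wait out the remainder
    }
}
```

With the 60000 ms base timeout from Listing 1, consecutive timeouts without any notification produce waits of 60000, 120000, 240000 ms, and so on, until a state change resets the value.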
Ready for a test?
Finally, everything is ready for a simple test. By setting a few arguments as shown below, you can test the custom WS-GRAM client. The output of the run is shown in Listing 8.
Listing 7. WS-GRAM client test
// remote host
String contact = "rtpmeta";
// Factory type: Fork, Condor, PBS, LSF
String factoryType = ManagedJobFactoryConstants.FACTORY_TYPE.FORK;
// Job XML
File rslFile = new File("/tmp/simple.xml");
// Default Security: Host authorization + XML encryption
Authorization authz = HostAuthorization.getInstance();
Integer xmlSecurity = Constants.ENCRYPTION;
// Submission mode: batch = will not wait
boolean batchMode = false;
// a Simple command executable (if no job file)
String simpleJobCommandLine = null;
// Job timeout values: duration, termination times
Date serviceDuration = null;
Date serviceTermination = null;
int timeout = GramJob.DEFAULT_TIMEOUT;
try {
GRAMClient gram = new GRAMClient();
gram.submitRSL(getFactoryEPR(contact,factoryType)
, simpleJobCommandLine, rslFile
, authz, xmlSecurity
, batchMode, false, false
, serviceDuration, serviceTermination, timeout );
} catch (Exception e) {
e.printStackTrace();
}
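Listing 7 calls a helper named getFactoryEPR that is not shown in this article. Its first job is presumably to turn the contact string into the address of the factory service. The sketch below covers only that step, using the service path and port that appear in the globusrun-ws command in Listing 9. FactoryAddress is a hypothetical class, the default port of 8443 is an assumption taken from that listing, and wrapping the address in an EndpointReferenceType together with the factory type is GT4-specific work that is deliberately left out.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; it is not part of the GT4 API.
class FactoryAddress {

    // Assumptions: port and path as they appear in Listing 9.
    static final int DEFAULT_PORT = 8443;
    static final String SERVICE_PATH = "/wsrf/services/ManagedJobFactoryService";

    /** Accepts "host" or "host:port" and returns the factory service address. */
    static URI forContact(String contact) {
        String host = contact.trim();
        int port = DEFAULT_PORT;
        int colon = host.lastIndexOf(':');
        if (colon > 0) {
            port = Integer.parseInt(host.substring(colon + 1));
            host = host.substring(0, colon);
        }
        try {
            return new URI("https", null, host, port, SERVICE_PATH, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad contact string: " + contact, e);
        }
    }
}
```

For the contact "rtpmeta" used in Listing 7, this yields the same address that is passed to -factory in Listing 9. The parsing handles plain host names and IPv4 addresses only.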
Listing 8. Output from the run in Listing 7
Submission ID: uuid:3eb57530-cfc4-11da-bb0b-dadfac0c5c05
WAITING FOR JOB TO FINISH
========== State Notification ==========
Job State: CleanUp
========================================
========== State Notification ==========
Job State: Active
========================================
========== State Notification ==========
Job State: Done
========================================
Exit Code: 0
DESTROYING JOB RESOURCE
JOB RESOURCE DESTROYED
Tips to troubleshoot and debug
Now here are some tips for retrieving job output, troubleshooting common errors, and debugging your code.
Tip No. 1: Retrieving job output
The WS-GRAM Java API is not capable of retrieving the job output at this moment. For example, if you run the command /bin/hostname on a remote host, you will get a unique identifier for that job, but the API is not capable of displaying the output of that command back to the console. The C-API, on the other hand, is capable of such a task. Consider the command below.
Listing 9. C-API command
globusrun-ws -submit -s
-factory https://rtpmeta:8443/wsrf/services/ManagedJobFactoryService
-f /tmp/simple.xml
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:272c8c7c-cfcf-11da-9061-00065ba4e6be
Termination time: 04/20/2006 18:05 GMT
Current job state: Active
Current job state: CleanUp-Hold
X509_USER_KEY=
X509_CERT_DIR=/etc/grid-security/certificates
LOGNAME=globus
GLOBUS_LOCATION=/opt/gt-4.0.1
HOME=/opt/home/globus
X509_USER_CERT=
GLOBUS_GRAM_JOB_HANDLE=https://134.67.144.3:8443/wsrf/services/ManagedExecutableJobService?272c8c7c-cfcf-11da-9061-00065ba4e6be
JAVA_HOME=/usr/java/j2sdk1.4.2_10/jre
X509_USER_PROXY=
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
The WS-GRAM C-client submits the job spec from Listing 2 and returns or streams the output back to the client console. The WS-GRAM Java API is not capable of such a task at this time.
It is unclear why Globus did not implement output streaming back to the client in the Java API for this version of the toolkit. Perhaps it is due to a lack of time or resources.
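Until the Java API catches up, one possible workaround is to launch the C client from Java and read its console output. The sketch below builds the same command line as Listing 9 and captures whatever the process prints. GlobusRunWrapper is a hypothetical class, and the approach assumes that globusrun-ws is installed and on the PATH of the machine running the Java program and that a valid proxy is already in place.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; it is not part of the GT4 API.
class GlobusRunWrapper {

    /** Builds the command from Listing 9: submit a job file and stream its output (-s). */
    static List<String> buildCommand(String factoryUrl, String jobFile) {
        return new ArrayList<String>(Arrays.asList(
            "globusrun-ws", "-submit", "-s",
            "-factory", factoryUrl,
            "-f", jobFile));
    }

    /** Runs the command and returns everything it printed, one entry per line. */
    static List<String> run(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process process = builder.start();
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(process.getInputStream()));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close();
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("globusrun-ws exited with code " + exitCode);
        }
        return lines;
    }
}
```

The captured lines mix status messages such as "Current job state: Active" with the job's own output, so a caller that needs only the job output has to filter them.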
Tip No. 2: Troubleshooting
Let's look at the two most common errors when creating a custom WS-GRAM client, as well as the way to fix them.
Listing 10. First common error
No client transport named 'https' found!
at org.apache.axis.client.AxisClient.invoke(AxisClient.java:170)
at org.apache.axis.client.Call.invokeEngine(Call.java:2727)
at org.apache.axis.client.Call.invoke(Call.java:2710)
at org.apache.axis.client.Call.invoke(Call.java:2386)
at org.apache.axis.client.Call.invoke(Call.java:2309)
...
This error occurs when the Axis Web services client cannot find a transport definition for HTTPS, the protocol used by secure services. To fix it, add the following VM argument to your startup script:
-Daxis.ClientConfigFile=[GLOBUS_LOCATION]/client-config.wsdd
Listing 11. Second common error
org.apache.axis.ConfigurationException:
Configuration file directory './etc' does not exist or is not
a directory or is not readable.
at org.apache.axis.configuration.DirProvider.<init>(DirProvider.java:73)
at org.globus.wsrf.container.ServiceDispatcher.<init>(ServiceDispatcher.java:80)
at org.globus.wsrf.container.GSIServiceDispatcher.<init>(GSIServiceContainer.java:63)
at org.globus.wsrf.container.GSIServiceContainer.createServiceDispatcher
(GSIServiceContainer.java:49)
at org.globus.wsrf.container.ServiceContainer.start(ServiceContainer.java:226)
...
This error is caused when the WSRF code can't find the location of the GT4 configuration directories and other required components. To fix this error, simply add the VM argument:
-DGLOBUS_LOCATION=[GLOBUS_LOCATION]
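Both errors come from a missing VM argument, so a client can check for them at startup and fail with a clearer message than the stack traces above. A minimal sketch of such a check follows. ClientPreflight is a made-up class, and the only facts it relies on are the two property names given in this section.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration; it is not part of the GT4 API.
class ClientPreflight {

    /** Returns a human-readable problem for each VM argument that is missing or wrong. */
    static List<String> problems(Properties props) {
        List<String> problems = new ArrayList<String>();

        String wsdd = props.getProperty("axis.ClientConfigFile");
        if (wsdd == null) {
            problems.add("-Daxis.ClientConfigFile is not set");
        } else if (!new File(wsdd).isFile()) {
            problems.add("axis.ClientConfigFile is not a readable file: " + wsdd);
        }

        String location = props.getProperty("GLOBUS_LOCATION");
        if (location == null) {
            problems.add("-DGLOBUS_LOCATION is not set");
        } else if (!new File(location).isDirectory()) {
            problems.add("GLOBUS_LOCATION is not a directory: " + location);
        }
        return problems;
    }
}
```

Calling ClientPreflight.problems(System.getProperties()) before creating the GRAM client and printing any entries it returns is one way to use it.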
Tip No. 3: Enable debugging
If you run into errors you can’t figure out, try enabling debugging in your log4j.properties file, as shown below. The log4j.properties file must be on your classpath to take effect. You should then see plenty of debug messages that can help you figure out what's going on.
Listing 12. Enabling debugging within log4j.properties
# WS-GRAM client
log4j.category.org.globus.exec.client=DEBUG
log4j.category.org.globus.exec.generated=DEBUG
# Custom WS-GRAM client
log4j.category.gram=DEBUG
Conclusion
GRAM services provide secure job submission to many types of job schedulers for users who have the right to access a job hosting resource in a grid environment. The existence of a valid proxy is required for job submission. All GRAM job submission options are supported transparently through the embedded request document input. In fact, the job startup is done by submitting a client-side provided job description to the GRAM services.
This article has been a WS-GRAM quick start to help you develop a client program that can be used within a Web service, in a Web application, or in your own custom job submission program.
Download
| Description | Name | Size | Download method |
|---|---|---|---|
| Source code | gr-wsgramclient.java | 14KB | HTTP |
About the author
Vladimir Silva was born in Quito, Ecuador, where he earned an engineering degree from the Polytechnic Institute of the Army. He completed his master's degree in computer science at Middle Tennessee State University. After graduation, he joined the IBM WebAhead technology think tank, where he worked as a software engineer in projects such as the IBM internal grid and the IBM Grid Toolbox. He has published many grid-related technical articles for developerWorks, and some of his work on server-side security (digital certificates) has been incorporated into the latest release of the Globus Toolkit (GT4). He is the author of Grid Computing for Developers. Other interests include neural nets and artificial intelligence. He holds numerous IT certifications including OCP, MCSD, and MCP.