Deploy and consume a web service
There is a large collection of bioinformatics applications, based on well-established algorithms, that are in widespread use. Most of these applications are invoked through command-line input, and both the input and output are plain text files. BLAST, ClustalW, and GeneScan are all such applications.
The IBM Web Services for Life Sciences (WS4LS) project (written in Java) and our Grid-enabled services (written in Perl) are both wrappers around existing executables. Eventually, these applications may be reimplemented with a high-throughput web services framework in mind, but until such implementations exist, wrapper services are used.
Suppose you have another generic application that accepts a text file as input and produces a text file as output. The following steps describe how to turn this application into a high-throughput web service that can be invoked through BioPerl. To deploy such a service:
- Choose a document structure to contain the input and output documents.
- Create a WSDL.
- Write a wrapper class around the executable program.
- Deploy the wrapper class.
First, you must decide on the document that is the input to the service. One approach, which the National Center for Biotechnology Information (NCBI) uses in its BLAST document, is to separate the BLAST parameters from the program parameters, as shown in Listing 1:
Listing 1: First approach
<BlastOutput_db>nr</BlastOutput_db>
<BlastOutput_query-ID>lcl|1</BlastOutput_query-ID>
<BlastOutput_query-def>pA262 </BlastOutput_query-def>
<BlastOutput_query-len>1091</BlastOutput_query-len>
<BlastOutput_param>
  <Parameters>
    <Parameters_matrix>BLOSUM62</Parameters_matrix>
    <Parameters_expect>10</Parameters_expect>
  </Parameters>
</BlastOutput_param>
Another approach is to include all the parameters at the same level in the documents, as shown in Listing 2:
Listing 2: Second approach
<BlastOutput_db>nr</BlastOutput_db>
<Parameters_matrix>BLOSUM62</Parameters_matrix>
The easiest solution is simply to represent the input file as a literal string in the XML document. If you need a FASTA file and you put the entire FASTA file between two tags, then you will not need to be concerned with representing the gene structure within the XML message.
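As a minimal sketch of this idea, the following Perl fragment wraps a complete FASTA file in a single element. The helper names `xml_escape` and `wrap_text` are hypothetical and are not part of WS4LS or BioPerl. The sketch also escapes the XML special characters, on the assumption that the service parses the message with a standard XML parser, which hands back the original text.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML character data.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so the entities added below stay intact
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Wrap an entire plain-text document between one pair of tags.
sub wrap_text {
    my ($tag, $text) = @_;
    return "<$tag>" . xml_escape($text) . "</$tag>";
}

# Two small FASTA records, embedded as one literal string.
my $fasta = ">file-1\nACGTCGTTGGGTT\n>file-2\nACGTTTT\n";
print wrap_text('BlastOutput_query-seq', $fasta), "\n";
```

Escaping `>` is not strictly required in XML character data, but it is harmless and keeps the helper safe for arbitrary text.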
The same philosophy of wrapping up an entire plain-text document between two tags also applies to a response document. For example, the ClustalW service produces three outputs: it prints text to the console, writes an ALN file, and writes a DND file. The WS4LS solution therefore defines three tags (message, aln, dnd) and places each output, in its entirety, as a string between the corresponding tags. These three tags (and a few others) are encapsulated in one response element, so it is possible to build an output document containing several runs of the program.
For example, Listing 3 shows some small FASTA files embedded in a document:
Listing 3
<BlastOutput_query-seq>
>file-1
ACGTCGTTGGGTT
>file-2
ACGTTTT
</BlastOutput_query-seq>
The benefit of this approach is that the service consumer does not have to reconstruct the original output from a structured representation. Another benefit is that, when running the executable, the various output documents can be constructed quickly and then concatenated to form a single large SOAP message.
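From the consumer's side, recovering the original output is then a matter of pulling the text back out of the tags. The sketch below assumes a response that holds two ClustalW runs and uses the aln tag named above. The `responses` and `response` wrapper names, the sample data, and both helper functions are hypothetical. A regular expression stands in for a real XML parser to keep the example self-contained.

```perl
use strict;
use warnings;

# Reverse the entity escaping applied when the text was embedded.
sub xml_unescape {
    my ($text) = @_;
    $text =~ s/&lt;/</g;
    $text =~ s/&gt;/>/g;
    $text =~ s/&amp;/&/g;    # must run last so "&amp;lt;" becomes "&lt;", not "<"
    return $text;
}

# Return the text of every <$tag> element, one entry per program run.
sub extract_all {
    my ($xml, $tag) = @_;
    my @found = $xml =~ m{<\Q$tag\E>(.*?)</\Q$tag\E>}gs;
    return map { xml_unescape($_) } @found;
}

# Hypothetical response document containing two runs of the program.
my $response = <<'XML';
<responses>
<response><message>run 1 done</message><aln>seq1 ACGT
seq2 AC-T
</aln><dnd>(seq1:0.1,seq2:0.1);</dnd></response>
<response><message>run 2 done</message><aln>seq3 GGCC
</aln><dnd>(seq3:0.2);</dnd></response>
</responses>
XML

my @alignments = extract_all($response, 'aln');
for my $i (0 .. $#alignments) {
    print "ALN output of run ", $i + 1, ":\n", $alignments[$i];
}
```

Each extracted string can be written to a file and handed to whatever tool already reads ALN output, with no further conversion.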
Once you have defined the input and output messages, write a WSDL for the service with the XSD schema embedded under the <types> tag.
Since there is little to no tool support for generating WSDL for document-style services, you might have to write some of this WSDL by hand. One shortcut is to use existing DTDs and a DTD2XSD tool to build the types. We have taken this shortcut by using DTDs provided by the NCBI. Another shortcut is to use the <xsd:any> tag, so that any document you send will be valid.
Building a wrapper around an existing executable is rather straightforward. The easiest way to do this is simply to extract the parameters from the input XML document and build a string that is a system call.
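A minimal sketch of that direct approach follows. It takes the tag names and the blastall options from Listing 4. The `param` and `build_command` helpers, the regular-expression extraction (a real wrapper would use an XML parser), and the character whitelist are assumptions made for illustration. The command is built as an argument list rather than one string, so that request data never passes through a shell.

```perl
use strict;
use warnings;

# Pull the text of the first <$tag> element out of the request document.
sub param {
    my ($xml, $tag) = @_;
    my ($value) = $xml =~ m{<\Q$tag\E>\s*(.*?)\s*</\Q$tag\E>}s;
    die "missing <$tag> in request\n" unless defined $value;
    # Accept only simple tokens so request data cannot add options or shell syntax.
    die "unsafe value for <$tag>\n" unless $value =~ /\A[A-Za-z0-9_.]+\z/;
    return $value;
}

# Build the blastall argument list from the request parameters.
sub build_command {
    my ($xml, $input, $output) = @_;
    return ('./blastall',
            '-p', param($xml, 'BlastOutput_program'),
            '-d', param($xml, 'BlastOutput_db'),
            '-i', $input, '-m', '7', '-o', $output);
}

my $request = '<BlastOutput><BlastOutput_program>blastp</BlastOutput_program>'
            . '<BlastOutput_db>nr</BlastOutput_db></BlastOutput>';
my @cmd = build_command($request, 'input.FASTA', 'output.txt');
print "@cmd\n";

# A wrapper would then run the program, for example:
#   system(@cmd) == 0 or die "blastall failed: $?";
```

The call to `system` is left as a comment so that the sketch runs without a BLAST installation.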
However, there is a shortcut that allows you to parse the input document and perform the system call in one step. The shortcut is to write an XSL stylesheet that transforms the input document into an executable script. As an example, the stylesheet shown in Listing 4 transforms the input BLAST request document into a Perl script:
Listing 4
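<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Emit plain text (a Perl script) rather than XML. -->
<xsl:output method="text"/>
<xsl:template match="/">
#!perl -w
use Cwd;
# The FASTA text from the request; assumes it contains no quote, $ or @ characters.
my $FASTA_TEXT = "<xsl:value-of select="//BlastOutput_query-seq"/>";
my $OUTPUT = cwd()."/output.txt";
my $INPUT = cwd()."/input.FASTA";
# Write the query sequences to the file that blastall reads.
open (INFO, ">$INPUT") or die "Cannot write $INPUT: $!";
print INFO $FASTA_TEXT;
close INFO;
<xsl:apply-templates select="BlastOutput"/>
# -m 7 requests XML output from blastall.
my $COMMAND="./blastall -p $P -d $DB -i $INPUT -m 7 -o $OUTPUT";
system("$COMMAND") == 0 or die "blastall failed: $?";
</xsl:template>
<xsl:template match="BlastOutput">
my $P="<xsl:value-of select="./BlastOutput_program"/>";
my $DB="<xsl:value-of select="./BlastOutput_db"/>";
</xsl:template>
</xsl:stylesheet>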
This allows you to use a single wrapper for multiple applications. The application invoked can be changed by swapping out the stylesheet. There are several tools for building web services that may aid you, such as the lsutil.jar package from the WS4LS project, as well as the IBM Emerging Technologies Toolkit (ETTK).
Deploying an existing class as a web service has never been easier. Take your pick from tools such as IBM® WebSphere® Application Server, Apache AXIS, and SOAP::Lite.
Consume a service from BioPerl
Extending BioPerl to consume a document-style service requires planning so that the implementation goes smoothly. BioPerl is capable of creating and parsing many different types of text files, whereas the web service sends and receives XML documents. At a high level, you are simply using XSLT to change these text files into XML files and vice versa.
The specific steps are:
- Add the WSDL to the config file.
- Read in the new entry from the config file.
- Create a method that handles parameters specific to the web service, such as the username and password.
- Create a BioPerl class that produces the request XML.
A good approach is to work backwards from the web service invocation: first create a dummy request to test the service, then read the results of the call into BioPerl, and finally extend BioPerl to produce the XML request.
Edit the BioWS_config.xml file
This file contains the information specific to the web service. To make the conversion of text files to XML files easier, specify a path to the stylesheets.
Create a new element under the <BioWS> element and put the location of the WSDL in this element. If the web service is secured with basic authentication, put the username and password in the file as well. For example, see Listing 5:
Listing 5
<?xml version="1.0"?>
<BioWS>
  <user>Daniel</user>
  <password>danpass</password>
  <requestXSL>0</requestXSL>
  <responseXSL>0</responseXSL>
  <myWSDL>http://localhost/myservice.wsdl</myWSDL>
</BioWS>
Change the BWSconfig.pm file to recognize the new tag
The BWSconfig.pm file reads in the BioWS_config.xml file from the current working directory and parses it with DOM. Add a line to read the new tag, as in the example in Listing 6:
Listing 6
my $myWSDL = $doc->getElementsByTagName("myWSDL")
->item(0)->getFirstChild->getNodeValue;
The process of calling a web service can be broken up into two methods: a "write" method to get the information specific to the service and a generic "invoke" method that makes the call. The WSDL location and the username/password are both information specific to the service. The method might look like the one in Listing 7:
Listing 7
sub invokeMyService{
    my ($xml) = @_;    # the request document
    # The keys are assumed to match the tag names in Listing 5.
    my %CONFIG = BioWS::Config->getHash();
    my $wsdl = find_WSDL($CONFIG{"myWSDL"});
    my $user = $CONFIG{"user"};
    my $pass = $CONFIG{"password"};
    my $response = invoke($xml, $wsdl, $user, $pass);
    my $bioperl_object = resp2BioPl($response);
    return $bioperl_object;
}
The invoke method uses the WSDL2Perl code generation routine to create an executable stub from the WSDL. The find_WSDL method is a layer of abstraction in case you want to use a service discovery mechanism instead of simply reading the WSDL from a configuration file. The invoke method transforms the request documents (the argument), invokes a web service, and transforms the response.
The result of the invoke method should be the relevant data extracted from the service response. The next step is to implement a "response to BioPerl" method (for example, resp2BioPl) that uses a BioPerl function to read in the response.
Create a method that instantiates a BioPerl object
Next, the results of the XSLT transformation need to be read into BioPerl. In most cases, the results of the XSLT are in a text document that BioPerl knows how to parse. BioPerl has several input mechanisms available, such as the SeqIO, SearchIO, and AlignIO packages. All of these packages work by constructing an IO object for a given file format and passing the results file to that object.
The final step is to decide on the BioPerl method signature. For established applications such as BLAST and ClustalW, simply copy the existing method signature so that code written against BioPerl has to change as little as possible. For example, we replace
Bio::Tools::Run::StandAloneBlast
with
Bio::Tools::Run::WebServiceBlast
The purpose of this class is only to arrange the parameters into an XML document. There are three ways to build the XML message. First, if the WSDL includes a complete schema, WSDL2Perl can generate serializable objects for you.
Listing 8: GridBlast.pm is code generated from GridBlast.wsdl
my $service = GridBlast->new();
my $soapService = $service->getGridBlastSoap($HOST); #HOST is the endpoint
my %input;
$input{"messageBody"} ="<xml> ..."; #the input doc goes here
my $s_result = $soapService->start_blast(\%input);
If WSDL-generated objects are not used, then it is possible to hard-code the tag names in the class, as shown in Listing 9:
Listing 9
my $XML = <<EOF;
<Bioseq-set_id> $id </Bioseq-set_id>
<Bioseq-set_coll> $coll </Bioseq-set_coll>
<Bioseq-set_level> $level </Bioseq-set_level>
EOF
The only other alternative is to use DOM objects to create the XML message. This is considerably more code to write, but it might be useful for a simple schema or for changing the name and order of elements at run time. Once you have the XML input, simply pass the XML to the InvokeX method and return the BioPerl object as the result.
In this article, we have presented a step-by-step description of the process of developing and deploying a web service. We discussed how to select a DTD/document structure to contain the input and output data, create a WSDL, write a wrapper class around the executable program, and deploy the wrapper class.
This paper describes the joint work of the Extreme Blue team Summer 2003, Fungal Genomics Lab at NC State University and the North Carolina Biogrid. Our team has set up a framework for deploying bioinformatics applications as high-throughput Web Services on the North Carolina BioGrid. The intern team consists of: Mine Altunay (maltuna@unity.ncsu.edu), Daniel Colonnese (dcolonn@ncsu.edu), Chetna Warade (warade@us.ibm.com), and Lindsay Wilber (WilberL04@darden.virginia.edu). The team was advised by members of the IBM Life Sciences Group, including Virinder Batra (batra@us.ibm.com), Madhu Gombar (mgombar@us.ibm.com), Rick Runyan (runyan@us.ibm.com), Prasad Vadlamudi (prasadv@us.ibm.com) and Doug Brown (debrown@unity.ncsu.edu).
- Get more information on Apache AXIS.
- Read Part 1 and Part 2 of the "Web services for bioinformatics" series (developerWorks, May 2004).
- Read over the BioPerl 1.2 Module Documentation.
- Check out the JAX-RPC Specification v1.0.
- Visit the NC BioGrid.
- Take a look at the "Web Service for Bioinformatic Analysis Workflow" on alphaWorks.
- See a demo or download the "Bioinformatic Workflow Builder Interface" on alphaWorks.
- Read the article Web Services for Life Sciences, which has an example set of web services that offers standard bioinformatics applications and demonstrates the technology (alphaWorks, February 2003).
- Download the latest version of the ETTK from alphaWorks.
- Find the data compression library, zlib Canonical, at the zlib homepage.
- Download the Globus Toolkit from Globus.org.
- Find out what the Globus Commodity Grid Kits are all about.
- Store your Grid credentials in the MyProxy Online Credential Repository.
- Find the MyProxy related software at the Partnership for Advanced Computational Infrastructure web site and the NSF Middleware Initiative site.
- Read the article "Reap the benefits of the document-style web services" (developerWorks, June 2002).
- Browse through the PDF by I. Foster, C. Kesselman, G. Tsudik, and S. Tuecke, "A Security Architecture for Computational Grids." In Proceedings of the 5th ACM Conference on Computer and Communications Security, pages 83-92, November 1998.
- Check out the Globus Security Infrastructure (GSI).
Mine Altunay: Mine is currently pursuing her PhD at the Computer Engineering Department of North Carolina State University. Her studies focus on grid computing and workflow management in OGSA, with a strong emphasis on authorization and trust management issues. She is also a member of the Fungal Genomics Laboratory, where she has worked on several bioinformatics projects, as well as the establishment and integration of their computational and data grids with North Carolina BioGrid. You can contact Mine at maltuna@unity.ncsu.edu.
Daniel Colonnese: Daniel has recently completed his master’s degree in computer science from NC State University. He has worked on a number of projects in ecommerce, life sciences, and grid computing. His interests include software reliability and service-oriented architectures. He will be joining Lotus/Portal technical sales in June 2004. You can contact Daniel at dcolonn@ncsu.edu.
Chetna Warade: Since 1999, Chetna has worked on a wide range of projects varying from systems programming to bioinformatics. She has a strong interest and aptitude in software architecture and development, systems programming, and various emerging technologies such as web services, life sciences, and the new breed of Internet technologies. You can contact Chetna at warade@us.ibm.com.