Web services for bioinformatics, Part 3

Deploying and consuming bioinformatics web services

Mine Altunay (maltuna@unity.ncsu.edu), Student, North Carolina State University
Mine Altunay: Mine is currently pursuing her PhD at the Computer Engineering Department of North Carolina State University. Her studies focus on grid computing and workflow management in OGSA, with a strong emphasis on authorization and trust management issues. She is also a member of the Fungal Genomics Laboratory, where she has worked on several bioinformatics projects, as well as the establishment and integration of their computational and data grids with North Carolina BioGrid. You can contact Mine at maltuna@unity.ncsu.edu.
Daniel Colonnese (dcolonn@ncsu.edu), Student, North Carolina State University
Daniel Colonnese: Daniel has recently completed his master’s degree in computer science from NC State University. He has worked on a number of projects in ecommerce, life sciences, and grid computing. His interests include software reliability and service-oriented architectures. He will be joining Lotus/Portal technical sales in June 2004. You can contact Daniel at dcolonn@ncsu.edu.
Chetna Warade (warade@us.ibm.com), Developer, IBM Healthcare & Life Sciences
Chetna Warade: Since 1999, Chetna has worked on a wide range of projects varying from systems programming to bioinformatics. She has a strong interest and aptitude in software architecture and development, systems programming, and various emerging technologies such as web services, life sciences, and the new breed of Internet technologies. You can contact Chetna at warade@us.ibm.com.

Summary:  This article describes the process of deploying and consuming high-throughput web services for bioinformatics applications. It provides directions for deploying a BLAST application web service and consuming a BLAST web service from BioPerl.

Date:  08 Jun 2004
Level:  Intermediate

Deploy and consume a web service

There is a large collection of bioinformatics applications, based on well-established algorithms, that are in widespread use. Most of these applications are invoked through command-line input, and both the input and output are plain text files. BLAST, ClustalW, and GeneScan are all such applications.

The IBM Web Services for Life Sciences (WS4LS) project (written in Java) and our Grid-enabled services (written in Perl) are both wrappers around existing executables. Eventually, these applications may be reimplemented with a high-throughput web services framework in mind, but until such versions exist, wrapper services fill the gap.

Deploy a web service

Suppose you have another generic application that accepts a text file as input and produces a text file as output. The following steps describe how to turn this application into a high-throughput web service that can be invoked from BioPerl. To deploy such a service:

  1. Choose a document structure to contain the input and output documents.
  2. Create a WSDL.
  3. Write a wrapper class around the executable program.
  4. Deploy the wrapper class.

Choose a request message

First, you must decide on the document that serves as input to the service. One approach, which the National Center for Biotechnology Information (NCBI) uses in its BLAST document, is to separate the BLAST parameters from the program parameters, as shown in Listing 1:


Listing 1: First approach
  <BlastOutput_db>nr</BlastOutput_db>
  <BlastOutput_query-ID>lcl|1</BlastOutput_query-ID>
  <BlastOutput_query-def>pA262 </BlastOutput_query-def>
  <BlastOutput_query-len>1091</BlastOutput_query-len>

  <BlastOutput_param>
    <Parameters>
      <Parameters_matrix>BLOSUM62</Parameters_matrix>
      <Parameters_expect>10</Parameters_expect>

   

Another approach is to include all the parameters at the same level in the documents, as shown in Listing 2:


Listing 2: Second approach
  <BlastOutput_db>nr</BlastOutput_db>
  <Parameters_matrix>BLOSUM62</Parameters_matrix>

The easiest solution is simply to represent the input file as a literal string in the XML document. If you need a FASTA file and you put the entire FASTA file between two tags, then you will not need to be concerned with representing the gene structure within the XML message.
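
The literal-string approach can be sketched in a few lines. The example below is in Python for brevity; the same logic applies to a Perl or Java wrapper. It is a minimal sketch, not the WS4LS implementation: the element names are borrowed from the listings in this article, while the function name `build_request` and the program and database values are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def build_request(fasta_text, program, database):
    """Wrap a whole FASTA file, untouched, inside a BLAST request document."""
    root = ET.Element("BlastOutput")
    ET.SubElement(root, "BlastOutput_program").text = program
    ET.SubElement(root, "BlastOutput_db").text = database
    # The FASTA text goes in as one literal string. The serializer escapes
    # the leading ">" of each header line, so the document stays well-formed.
    ET.SubElement(root, "BlastOutput_query-seq").text = fasta_text
    return ET.tostring(root, encoding="unicode")


# Illustrative input: two small sequences in one FASTA file.
fasta = ">file-1\nACGTCGTTGGGTT\n\n>file-2\nACGTTTT\n"
request_xml = build_request(fasta, "blastx", "nr")

# Parsing the request gives back the original file, character for character.
recovered = ET.fromstring(request_xml).findtext("BlastOutput_query-seq")
```

Because the service treats the FASTA text as opaque, neither side needs an XML vocabulary for sequences; the wrapper only has to write the string back to a file before calling the executable.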

Choose a response message

The same philosophy of wrapping an entire plain-text document between two tags also applies to the response document. For example, the ClustalW service produces three outputs: it prints text to the console, writes an ALN file, and writes a DND file. The WS4LS solution therefore defines three tags (message, aln, dnd) and places each output, in full, as a string between the corresponding tags. These three tags (and a few others) are encapsulated in one response element, so it is possible to build an output document containing several runs of the program.

For example, Listing 3 shows some small FASTA files embedded in a document:


Listing 3
<BlastOutput_query-seq>

>file-1
ACGTCGTTGGGTT

>file-2
ACGTTTT

</BlastOutput_query-seq>
   

One benefit of this approach is that the service consumer receives each output intact; otherwise, the consumer would have to reconstruct the original output from a structured representation. Another benefit is that when the executable runs, the various output documents can be constructed quickly and then concatenated into a single large SOAP message.
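
As a rough illustration of a multi-run response, the Python sketch below collects the three ClustalW outputs of each run under the message, aln, and dnd tags described above. The wrapper element names `response` and `run`, the function name, and the placeholder output strings are assumptions made for this example; the actual WS4LS schema may differ.

```python
import xml.etree.ElementTree as ET


def build_response(runs):
    """Combine the outputs of several ClustalW runs into one response document.

    `runs` is a list of dicts with the keys "message", "aln", and "dnd",
    each holding the full text of one output.
    """
    root = ET.Element("response")          # hypothetical wrapper element
    for outputs in runs:
        run = ET.SubElement(root, "run")   # hypothetical per-run element
        for tag in ("message", "aln", "dnd"):
            ET.SubElement(run, tag).text = outputs[tag]
    return ET.tostring(root, encoding="unicode")


# Placeholder strings stand in for real program output.
response_xml = build_response([
    {"message": "console text of run 1", "aln": "ALN text of run 1",
     "dnd": "DND text of run 1"},
    {"message": "console text of run 2", "aln": "ALN text of run 2",
     "dnd": "DND text of run 2"},
])

# The consumer pulls each output back out as a plain string.
alignments = [run.findtext("aln") for run in ET.fromstring(response_xml).iter("run")]
```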

Create a WSDL

Once you have defined the input and output messages, write a WSDL for the service with the XSD schema embedded under the <types> tag.

Since there is little to no tool support for generating WSDL for document-style services, you might have to write some of this WSDL by hand. One shortcut is to use existing DTDs and a DTD2XSD tool to build the types. We took this shortcut by using DTDs provided by the NCBI. Another shortcut is to use the <xsd:any> element, so that any well-formed document you send is valid.

Create a wrapper class

Building a wrapper around an existing executable is rather straightforward. The easiest way to do this is simply to extract the parameters from the input XML document and build a string that is a system call.
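
The direct approach might look like the following Python sketch, shown in Python only for brevity. It assumes a request whose root element is BlastOutput, with the program and database as child elements, and it reuses the blastall flags that appear in Listing 4. The function name `build_command` and the file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_command(request_xml, input_path, output_path):
    """Extract the BLAST parameters from a request and build the system call."""
    root = ET.fromstring(request_xml)
    program = root.findtext("BlastOutput_program")
    database = root.findtext("BlastOutput_db")
    if not program or not database:
        raise ValueError("request must name a program and a database")
    # An argument list (rather than one shell string) keeps values taken
    # from the request from being interpreted by the shell.
    return ["./blastall", "-p", program, "-d", database,
            "-i", input_path, "-m", "7", "-o", output_path]


request = ("<BlastOutput>"
           "<BlastOutput_program>blastx</BlastOutput_program>"
           "<BlastOutput_db>nr</BlastOutput_db>"
           "</BlastOutput>")
command = build_command(request, "input.FASTA", "output.txt")
print(" ".join(command))
```

The returned list can be handed to `subprocess.run(command, check=True)` once the FASTA text from the request has been written to the input file.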

However, there is a shortcut that lets you parse the input document and build the system call in one step: write an XSL stylesheet that transforms the input document into a script. As an example, the stylesheet shown in Listing 4 transforms the input BLAST request document into a Perl script:


Listing 4
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Emit plain text so that the generated Perl is not XML-escaped -->
<xsl:output method="text"/>

<xsl:template match="/">
use strict;
use warnings;
use Cwd;

# Write the FASTA text from the request to an input file
my $FASTA_TEXT = "<xsl:value-of select="//BlastOutput_query-seq"/>";
my $OUTPUT = cwd() . "/output.txt";
my $INPUT = cwd() . "/input.FASTA";
open(my $info, ">", $INPUT) or die "Cannot write $INPUT: $!";
print $info $FASTA_TEXT;
close $info;

<xsl:apply-templates select="BlastOutput"/>

# Run BLAST with XML output (-m 7)
my $COMMAND = "./blastall -p $P -d $DB -i $INPUT -m 7 -o $OUTPUT";
system($COMMAND) == 0 or die "blastall failed: $?";
</xsl:template>

<xsl:template match="BlastOutput">
my $P = "<xsl:value-of select="./BlastOutput_program"/>";
my $DB = "<xsl:value-of select="./BlastOutput_db"/>";
</xsl:template>
</xsl:stylesheet>

This allows you to use a single wrapper for multiple applications. The application invoked can be changed by swapping out the stylesheet. There are several tools for building web services that may aid you, such as the lsutil.jar package from the WS4LS project, as well as the IBM Emerging Technologies Toolkit (ETTK).

Deploy the wrapper class

Deploying an existing class as a web service is straightforward with current tools. Take your pick from IBM® WebSphere® Application Server, Apache AXIS, SOAP::Lite, and others.


Consume a Service from BioPerl

Extending BioPerl to consume a document-style service requires some planning so that the implementation goes smoothly. BioPerl is capable of creating and parsing many different types of text files, whereas the web service sends and receives XML documents. At a high level, you are simply using XSLT to change these text files into XML files and vice versa.

The specific steps are:

  1. Add the WSDL to the config file.
  2. Read in the new entry from the config file.
  3. Create a method that handles parameters specific to the web service, such as the username and password.
  4. Create a BioPerl class that produces the request XML.

A good approach is to work backwards from the web service invocation by creating a dummy request to test the service. After calling the web service, results are submitted into BioPerl. Finally, BioPerl is extended to produce the XML request.

Edit the BioWS_config.xml file

This file contains the information specific to the web service. To make the conversion of text files to XML files easier, specify a path to the stylesheets.

Create a new element under the <BioWS> element and put the location of the WSDL in this element. If the web service is secured with basic authentication, put the username and password in the file as well. For example, see Listing 5:


Listing 5
<?xml version="1.0"?>
<BioWS>
<user>Daniel</user>
<password>danpass</password>
<requestXSL>0</requestXSL>
<responseXSL>0</responseXSL>
<myWSDL>http://localhost/myservice.wsdl</myWSDL>
</BioWS>

Change the BWSconfig.pm file to recognize the new tag

The BWSconfig.pm file reads in the BioWS_config.xml file from the current working directory and parses it with DOM. Add a line to read the new tag, as in the example in Listing 6:


Listing 6
my $myWSDL = $doc->getElementsByTagName("myWSDL")
                                          ->item(0)->getFirstChild->getNodeValue;

Add an invoke method

The process of calling a web service can be broken up into two methods: a "write" method that gathers the information specific to the service and a generic "invoke" method that makes the call. The WSDL location and the username/password are both information specific to the service. The method might look like the one in Listing 7:


Listing 7
sub invokeMyService {
    my ($xml) = @_;    # the request XML document

    my %CONFIG = BioWS::Config->getHash();
    my $wsdl = find_WSDL($CONFIG{"myWSDL"});
    my $user = $CONFIG{"user"};
    my $pass = $CONFIG{"pass"};
    my $response = invoke($xml, $wsdl, $user, $pass);
    my $bioperl_object = resp2BioPl($response);
    return $bioperl_object;
}

The invoke method uses the WSDL2Perl code generation routine to create an executable stub from the WSDL. The find_WSDL method is a layer of abstraction in case you want to use a service discovery mechanism instead of simply reading the WSDL location from a configuration file. The invoke method transforms the request document (its first argument), invokes the web service, and transforms the response.

The result of the invoke method should be the relevant data extracted from the service response. The next step is to implement a "response to BioPerl" method (for example, resp2BioPl) that uses a BioPerl function to read in the response.

Create a method that instantiates a BioPerl object

Next, BioPerl needs to read the results of the XSLT transformation. In most cases the results of the XSLT are in a text document that BioPerl knows how to parse. BioPerl has several input mechanisms available, such as the SeqIO, SearchIO, and AlignIO packages. All such packages work by constructing an IO object for a certain file format and sending the results file to the IO object.

Create a BioPerl class

The final step is to decide on the BioPerl method signature. For established applications such as BLAST and ClustalW, simply copy the existing method signature so that BioPerl code has to change as little as possible. For example, we replace

Bio::Tools::Run::StandAloneBlast

with

Bio::Tools::Run::WebServiceBlast

The purpose of this class is only to arrange the parameters into an XML document. There are three ways to build the XML message. First, if the WSDL includes a complete schema, WSDL2Perl can generate serializable objects for you, as in Listing 8.


Listing 8: GridBlast.pm is code generated from GridBlast.wsdl
my $service = GridBlast->new();
my $soapService = $service->getGridBlastSoap($HOST); #HOST is the endpoint
my %input;
$input{"messageBody"} ="<xml> ..."; #the input doc goes here
my $s_result = $soapService->start_blast(\%input);

If WSDL-generated objects are not used, you can hard-code the tag names in the class, as shown in Listing 9:


Listing 9

my $XML = <<EOF;

<Bioseq-set_id> $id </Bioseq-set_id> 
<Bioseq-set_coll> $coll </Bioseq-set_coll> 
<Bioseq-set_level> $level </Bioseq-set_level>

EOF

The third alternative is to use DOM objects to create the XML message. This takes considerably more code, but it can be useful for a simple schema or for changing the names and order of elements at run time. Once you have the XML input, pass it to the invoke method for your service (such as invokeMyService in Listing 7) and return the resulting BioPerl object.
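
To give a feel for the DOM approach, here is a minimal sketch in Python, whose standard library exposes the same W3C DOM methods (createElement, createTextNode, appendChild). The root element name Bioseq-set, the function name, and the sample values are assumptions for illustration; only the three child tag names come from Listing 9.

```python
from xml.dom.minidom import getDOMImplementation


def build_bioseq_set(fields):
    """Build a request with DOM calls; `fields` is a list of (tag, value) pairs."""
    # "Bioseq-set" as the root element is an assumption for this sketch.
    doc = getDOMImplementation().createDocument(None, "Bioseq-set", None)
    root = doc.documentElement
    for tag, value in fields:
        element = doc.createElement(tag)
        element.appendChild(doc.createTextNode(str(value)))
        root.appendChild(element)
    return root.toxml()


# Because the tags are plain data, their names and order can change at run time.
message_xml = build_bioseq_set([
    ("Bioseq-set_id", "example-id"),
    ("Bioseq-set_coll", "example-coll"),
    ("Bioseq-set_level", 1),
])
```

Compared with the here-document in Listing 9, the element list is ordinary data, which is what makes reordering or renaming elements at run time practical.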

Summary

In this article we walked step by step through the process of developing and deploying a web service. We discussed how to select a DTD/document structure to contain the input and output data, create a WSDL, write a wrapper class around the executable program, and deploy the wrapper class. We then described how to extend BioPerl to consume the deployed service.

Acknowledgements

This paper describes the joint work of the Extreme Blue team Summer 2003, Fungal Genomics Lab at NC State University and the North Carolina Biogrid. Our team has set up a framework for deploying bioinformatics applications as high-throughput Web Services on the North Carolina BioGrid. The intern team consists of: Mine Altunay (maltuna@unity.ncsu.edu), Daniel Colonnese (dcolonn@ncsu.edu), Chetna Warade (warade@us.ibm.com), and Lindsay Wilber (WilberL04@darden.virginia.edu). The team was advised by members of the IBM Life Sciences Group, including Virinder Batra (batra@us.ibm.com), Madhu Gombar (mgombar@us.ibm.com), Rick Runyan (runyan@us.ibm.com), Prasad Vadlamudi (prasadv@us.ibm.com) and Doug Brown (debrown@unity.ncsu.edu).

