Squeezing SOAP

GZIP enabling Apache Axis

Brian Goodman (bgoodman@us.ibm.com), IT Architect, IBM Intranet Technology, IBM
Brian D. Goodman wants what he wants when he wants it and is willing to do whatever he needs to do to get it. You can contact Brian at bgoodman@us.ibm.com. Questions, comments and suggestions are always welcome.

Summary:  GZIP encoding over HTTP is pretty much old school, and "been there, done that" is the attitude of most developers. However, if you have been working with a few of the current SOAP implementations, you'll find that they don't take advantage of it. They will eventually come around, but if you are building real-world Web service solutions and want a performance boost now, GZIP is for you.

Date:  01 Mar 2003
Level:  Introductory


GZIP encoding over an HTTP transport is a well-known technique for improving the performance of Web applications. High-traffic Web sites use GZIP to make their users' experience faster, and compression is widely used to make files smaller for download and exchange. In fact, as far as XML goes, GZIP is not even the latest, cool thing to be doing. Newer technology, like AT&T's XMill, claims twice the compression of GZIP in roughly the same amount of time. GZIP, however, is a core component of the Java platform, and many Web servers can compress content independently of the files or applications they serve up. For that reason, this article looks at what it takes to use GZIP in conjunction with the Axis SOAP implementation. This approach has proven useful for projects that need the extra performance now and whose teams are willing to spend time later integrating the changes with follow-on releases of the SOAP implementation. Furthermore, this article looks at encoding at the servlet level, which lets you plug in a different content encoding scheme.

Getting started

To make any of these changes, you must have the source code to Apache Axis, although you should expect that a similar modification can be made to other SOAP implementations. I will be modifying the underlying HTTPSender and AxisServlet code. The servlet change is not necessary if your server handles compression for you; however, once the servlet is modified, the framework is in place for you to drop in other encoding solutions.

The goal is to add GZIP compression without having to functionally change the underlying SOAP implementation. This is relatively easy because the java.util.zip package is stream based. Additionally, the underlying code for making HTTP connections in Axis and other solutions is also stream based.


Quick overview of Java technology and GZIP

In this article, I will not cover the details of using the java.util.zip package. Sun hosts relevant content, and the package has been documented in many places (see Resources).

The java.util.zip package provides everything you need to encode other streams. Specifically, you will be working with the GZIPInputStream and GZIPOutputStream classes. Both take a stream as a constructor argument, so as long as you can check the content encoding, you can wrap the stream appropriately. The magic of compression and decompression is handled for you. If you wanted to encode and decode in some other format, I would suggest extending the input and output stream classes.
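As a minimal sketch of that wrapping, the round trip below pushes a made-up SOAP-like payload through GZIPOutputStream and reads it back through GZIPInputStream. It assumes a modern JDK (it uses try-with-resources) and uses in-memory byte streams in place of the network streams; the class name GzipRoundTrip and the sample XML are hypothetical and not part of Axis.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not part of Axis.
final class GzipRoundTrip {

    // Compress by writing the bytes through a GZIPOutputStream.
    static byte[] compress(byte[] plain) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(plain);
        } // closing the stream writes the remaining data and the GZIP trailer
        return sink.toByteArray();
    }

    // Decompress by reading the bytes through a GZIPInputStream.
    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Made-up, repetitive XML payload standing in for a SOAP envelope.
        StringBuilder xml = new StringBuilder("<Envelope><Body>");
        for (int i = 0; i < 50; i++) {
            xml.append("<item><name>widget</name><qty>").append(i).append("</qty></item>");
        }
        xml.append("</Body></Envelope>");

        byte[] plain = xml.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(plain);
        byte[] restored = decompress(packed);

        System.out.println("plain bytes:      " + plain.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + Arrays.equals(plain, restored));
    }
}
```

The calling code only ever sees an OutputStream or an InputStream, which is why the same wrapping can be dropped into existing stream-based code without changing its logic.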

On the server side, there are two ways to deal with the content encoding. The first, and preferable, way is to let the Web server fronting the application server handle the compression. You can configure the Web server to identify clients that support compressed content encoding and to encode the requested content in that format. It does this dynamically and, for the most part, adds a trivial amount of overhead on today's hardware. The second way is to encode the content within the application itself. This method is probably more expensive than the first in terms of performance, amount of code required, and necessary maintenance. You can easily use a CGI or servlet to determine the accepted encoding types and, in turn, send the appropriately encoded response. This is useful if you have an encoding type that is not common among Web servers. Either way, the burden shifts from the network to the server and the client. It is arguable, however, that this burden is smaller than the performance gained by transferring about half as much data across the wire.

The goal is to achieve transparency: intercept the input and output streams and funnel them through the appropriate GZIP stream. It's simple enough, so I'll get started!


Attacking the server: Modifying AxisServlet

This section covers how to modify the AxisServlet to enable compression. You will want to use this method if you don't have access to the HTTP server configuration or if the server cannot support compression. This technique can easily be the starting point for encoding with other styles.

Encoding at the application layer, instead of the Web server layer, is simple enough. This method is particularly useful if you plan on using an encoding scheme that is not supported by the popular servers and modules. As with all of the code modifications in this article, these changes can be implemented in different ways. For example, you might extend AxisServlet and implement an HttpServletResponse to intercept the outgoing output stream.

First, modify the doPost method to test the Accept-Encoding header. I am going to trust the client if it indicates that it accepts GZIP encoding. You could also check the User-Agent header to make sure that compression is done only for clients that can really handle it, but the concern is minimal because most browsers don't hit RPCRouters. If GZIP encoding is accepted, I set the Boolean supportsGzip to true. The code can be reviewed in Listing 1.
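To illustrate how other encoding schemes could slot into the same place, here is a minimal sketch of a factory that picks a wrapping stream by content-coding name. The class ContentCodings and its method names are hypothetical, and the set of codings (gzip, deflate, identity) is only an assumption for the example; a custom scheme would add its own stream subclass in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper for illustration only; not part of Axis.
final class ContentCodings {

    // Wrap the raw output stream in an encoder for the named coding.
    static OutputStream encoder(String coding, OutputStream raw) throws IOException {
        switch (coding.toLowerCase(Locale.ROOT)) {
            case "gzip":     return new GZIPOutputStream(raw);
            case "deflate":  return new DeflaterOutputStream(raw); // zlib format
            case "identity": return raw;                           // no encoding
            default: throw new IllegalArgumentException("Unsupported coding: " + coding);
        }
    }

    // Wrap the raw input stream in the matching decoder.
    static InputStream decoder(String coding, InputStream raw) throws IOException {
        switch (coding.toLowerCase(Locale.ROOT)) {
            case "gzip":     return new GZIPInputStream(raw);
            case "deflate":  return new InflaterInputStream(raw);
            case "identity": return raw;
            default: throw new IllegalArgumentException("Unsupported coding: " + coding);
        }
    }

    // Encode and decode a payload in memory to show the two halves agree.
    static byte[] roundTrip(String coding, byte[] plain) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (OutputStream out = encoder(coding, wire)) {
            out.write(plain);
        }
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = decoder(coding, new ByteArrayInputStream(wire.toByteArray()))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        }
        return result.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "<Envelope><Body>hello</Body></Envelope>"
                .getBytes(StandardCharsets.UTF_8);
        for (String coding : new String[] {"gzip", "deflate", "identity"}) {
            boolean same = Arrays.equals(plain, roundTrip(coding, plain));
            System.out.println(coding + " round trip ok: " + same);
        }
    }
}
```

Because the rest of the servlet only writes to an OutputStream, swapping the coding is a one-line change at the point where the stream is wrapped.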


Listing 1. AxisServlet(doPost)
		
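String encoding = req.getHeader("Accept-Encoding");

// supportsGzip is assumed to be declared earlier and initialized to false.
// Keep it per request (for example, a local variable handed on to
// sendResponse); a servlet field would be shared by concurrent requests.
if (encoding != null && encoding.toLowerCase().indexOf("gzip") > -1) {
    supportsGzip = true;
}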
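The substring test in Listing 1 is deliberately simple. As a hedged sketch of a stricter check, the helper below splits the header into its comma-separated codings and honors a quality value, so that a client sending gzip;q=0 (which means "do not send gzip") is not served compressed content. The class AcceptEncoding and the method acceptsGzip are hypothetical names, and the sketch ignores the * wildcard for brevity.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Axis.
final class AcceptEncoding {

    // Returns true if the Accept-Encoding header value lists gzip
    // (or its legacy alias x-gzip) with a quality value above zero.
    static boolean acceptsGzip(String header) {
        if (header == null) {
            return false;
        }
        for (String part : header.split(",")) {
            String[] pieces = part.trim().split(";");
            if (pieces.length == 0) {
                continue; // element was empty apart from separators
            }
            String coding = pieces[0].trim().toLowerCase(Locale.ROOT);
            if (!coding.equals("gzip") && !coding.equals("x-gzip")) {
                continue;
            }
            double q = 1.0; // quality defaults to 1 when no q parameter is given
            for (int i = 1; i < pieces.length; i++) {
                String param = pieces[i].trim().toLowerCase(Locale.ROOT);
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2).trim());
                    } catch (NumberFormatException e) {
                        q = 0.0; // treat a malformed quality as "not acceptable"
                    }
                }
            }
            return q > 0.0;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] samples = {"gzip, deflate", "deflate, gzip;q=0.5", "gzip;q=0", "identity"};
        for (String sample : samples) {
            System.out.println("\"" + sample + "\" accepts gzip: " + acceptsGzip(sample));
        }
    }
}
```

In the servlet, the result of such a helper could be assigned to supportsGzip in place of the indexOf test.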

Second, navigate to the sendResponse method. I will add some code to check whether the current connection supports GZIP. If it does, I set the response header Content-Encoding to gzip and wrap the response output stream in a GZIP output stream. From that point everything is similar, except that you need to flush and close the GZIP output stream; otherwise the last part of the compressed data may never be sent. This modification provides GZIP encoding regardless of the underlying Web service being called. It should be pretty straightforward to enable specialized content encoding in the same way. See the code in Listing 2.


Listing 2. AxisServlet(sendResponse)
		
// Set the content type for both the compressed and the uncompressed case.
// Note that res is an HttpServletResponse instance.
res.setContentType(contentType);

if (supportsGzip) {

        // Tell the client that the body is GZIP encoded.
        res.setHeader("Content-Encoding", "gzip");

        GZIPOutputStream gzos = new GZIPOutputStream(res.getOutputStream());

        responseMsg.writeTo(gzos);
        gzos.flush();
        // close() finishes the GZIP stream and writes its trailer.
        gzos.close();

        //gzip code end

} else {

        /* My understanding of Content-Length
         * HTTP 1.0
         *   - Required for requests, but optional for responses.
         * HTTP 1.1
         *   - Either Content-Length or HTTP Chunking is required.
         *     Most servlet engines will do chunking
         *     if content-length is not specified.
         */

        responseMsg.writeTo(res.getOutputStream());
}
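To see why finishing the GZIP stream matters, the following sketch compares what reaches the underlying stream when the stream is only flushed with what reaches it when the stream is finished. It uses a ByteArrayOutputStream in place of the servlet output stream, and the class GzipFinish and its method names are hypothetical. GZIPOutputStream.finish() completes the compressed data without closing the underlying stream; close() does the same and then closes it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demonstration for illustration only; not part of Axis.
final class GzipFinish {

    // Write and flush only. The stream is deliberately left unfinished
    // to show what a client would receive in that case.
    static byte[] writeWithoutFinish(byte[] body) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GZIPOutputStream gzos = new GZIPOutputStream(sink);
        gzos.write(body);
        gzos.flush(); // by default this does not force out pending compressed data
        return sink.toByteArray();
    }

    // Write, then finish so the remaining data and the trailer are emitted.
    static byte[] writeWithFinish(byte[] body) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GZIPOutputStream gzos = new GZIPOutputStream(sink);
        gzos.write(body);
        gzos.finish();
        gzos.flush();
        return sink.toByteArray();
    }

    // Returns true only if the bytes can be read to the end as GZIP data.
    static boolean isCompleteGzip(byte[] data) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[1024];
            while (in.read(buffer) != -1) {
                // drain the stream
            }
            return true;
        } catch (IOException e) {
            return false; // truncated or otherwise invalid
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "<Envelope><Body>ping</Body></Envelope>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] unfinished = writeWithoutFinish(body);
        byte[] finished = writeWithFinish(body);
        System.out.println("unfinished: " + unfinished.length
                + " bytes, readable: " + isCompleteGzip(unfinished));
        System.out.println("finished:   " + finished.length
                + " bytes, readable: " + isCompleteGzip(finished));
    }
}
```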


Attacking the client: Modifying HTTPSender

One thing you will notice if you are unfamiliar with the Axis source code is that it's not like the developers just extended someone else's HTTP client object. They are actually doing their own socket-level implementation of an HTTP client. I'm sure there are lots of reasons for doing this, but an obvious one is that it makes changes like the one I am about to make trivial. The two methods I am interested in are writeToSocket and readFromSocket.

Hello world, I accept GZIP!

The first thing I need to do is modify the client to tell the server that it can handle compression. The caveat here is that not all Web clients can handle the encodings that they advertise, so servers are often configured to compress only for specific user agents, ones that I am not going to try to imitate. If you are working with a server whose encoding rules you cannot verify, I suggest that you test against a servlet that encodes at the application level. If you do have access, make sure that the server is encoding for your user agent. A brief look at the logs or a SOAP trace will probably tell you that the User-Agent header is not being set. To make life easier it should be; however, I leave this up to you.

The header I need to set is Accept-Encoding. This HTTP header stores a string value that is often a list of values (see Listing 3); I am using the gzip value. To add the header, I navigate to the part of the HTTPSender code where headers are dealt with. You can see from a quick read that there is a framework for adding headers, but right now Accept-Encoding is not a header the developers deal with. For this article I will just hard code the encoding. How you integrate or set the HTTP header is entirely up to you, and there are many ways to tackle the problem. I insert the code in Listing 4 immediately after the Axis code pulls the user-defined request headers from the message context and casts them to a Hashtable. I am adding one header that will trigger compression on the server side, whether that is handled by the Web server or the application.


Listing 3. Sample HTTP GET from Internet Explorer to www.ibm.com.
		
GET / HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)
Host: www.ibm.com:80
Connection: Keep-Alive  



Listing 4. Code snippet from HTTPSender.java (writeToSocket)
		
//process user defined headers for information.
Hashtable userHeaderTable =
    (Hashtable) msgContext.getProperty(HTTPConstants.REQUEST_HEADERS);

//Adding Accept-Encoding header to the hashtable.
if (userHeaderTable == null) {
    userHeaderTable = new Hashtable();
}
userHeaderTable.put("Accept-Encoding", "gzip");

Getting what you ask for!

Having modified the HTTP request to include the Accept-Encoding header with a value of gzip, I need to modify the method used to read the response. I check the HTTP Content-Encoding header to see whether the server responded with GZIP. If it did, I wrap the input stream in a GZIP input stream. If, on the other hand, GZIP encoding is not detected, I leave the input stream alone. Placement of this code is important because you don't want to disturb the existing flow of the Axis code. The input stream must be wrapped just before the body of the HTTP message is decoded, but after the headers have been parsed, because you need to know the content encoding before you wrap the stream. The code in Listing 5 shows where the GZIP code gets inserted in the readFromSocket method. Notice that I declare an InputStream. It refers either to a GZIPInputStream or to the InputStream passed in as a parameter. From that point on, Axis doesn't know whether it is reading from a compressed stream, nor should it. You now have an Axis client that requests and handles GZIP encoding transparently to the underlying code.


Listing 5. Code fragment from HTTPSender.java (readFromSocket)
		
if (null != transferEncoding &&
    transferEncoding.trim().equals(HTTPConstants.HEADER_TRANSFER_ENCODING_CHUNKED)) {
	inp = new ChunkedInputStream(inp);
}


//Check the content encoding. If it is gzip, then
//wrap the input stream in a GZIPInputStream.
//The header is absent when the server did not encode
//the body, so guard against null.

String zip = (String)headers.get("content-encoding");
InputStream is = null;

if(zip != null && zip.toLowerCase().indexOf("gzip") != -1){
	is = new GZIPInputStream(inp);
} else {
	is = inp;
}

//end gzip code

outMsg =
    new Message( new SocketInputStream(is, sock), false, contentType, contentLocation);
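The decision made in Listing 5 can also be exercised on its own, outside of Axis. The sketch below is a minimal, self-contained version under stated assumptions: the class ContentDecoding and the method decode are hypothetical names, and in-memory streams stand in for the socket. The caller reads from whatever stream comes back and never needs to know whether it was compressed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not part of Axis.
final class ContentDecoding {

    // Wrap the body stream only when the Content-Encoding value names gzip.
    // A null value means the header was absent, so the body is left as is.
    static InputStream decode(InputStream body, String contentEncoding) throws IOException {
        if (contentEncoding != null
                && contentEncoding.toLowerCase(Locale.ROOT).indexOf("gzip") != -1) {
            return new GZIPInputStream(body);
        }
        return body;
    }

    // Read a stream to the end and return its bytes.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            sink.write(buffer, 0, n);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String xml = "<Envelope><Body>pong</Body></Envelope>";

        // Simulate a compressed response body.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(wire)) {
            gzip.write(xml.getBytes(StandardCharsets.UTF_8));
        }

        InputStream compressed = decode(new ByteArrayInputStream(wire.toByteArray()), "gzip");
        InputStream plain = decode(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), null);

        System.out.println(new String(readAll(compressed), StandardCharsets.UTF_8));
        System.out.println(new String(readAll(plain), StandardCharsets.UTF_8));
    }
}
```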


Conclusion

GZIP encoding over HTTP is part of Web technology as we know it, and using it in the existing Web service frameworks is a logical next step. In the meantime, solutions are being designed, built, and deployed every day on these SOAP implementations. In many instances, being able to GZIP encode the SOAP envelope results in faster transaction times with relatively low overhead. This performance upgrade can be realized today with some simple code modifications. Enabling GZIP encoding in your SOAP environment lets you take advantage of compression now, while you patiently wait for it to be integrated into your favorite implementations.


Resources
