Speed Web delivery with HTTP compression

A look at the page-delivery effects of data compression in HTTP 1.1

HTTP compression, a recommendation of the HTTP 1.1 protocol specification for improved page download time, requires a compression feature implemented at the Web server and a decompression feature implemented at the browser. While popular browsers were able to receive the compressed data as early as three years ago, Web servers were not ready to deliver compressed content. The situation is changing, though, as server compression modules are introduced. Dr. S. Radhakrishnan dissects Web compression, examines the benefits of HTTP compression, offers several compression tools, and highlights the effectiveness of the technology in a case study.

Radhakrishnan Srinivasan (radhakr@onebox.com), Senior Architect, eBusiness, TATA Consultancy

Radhakrishnan Srinivasan is a senior architect in Chennai, India in the eBusiness practice of TATA Consultancy services, a global software and services company based in India. His primary focus is to analyze, define, and implement architectures for enterprise applications, and he has been responsible for creating high-volume, mission-critical applications and for providing guidance on architecture issues to a multitude of international customers. The author holds a PhD in computer vision/image processing from Indian Institute of Technology, Chennai. His active professional interests include Web and distributed architectures and application integration. The opinions expressed in this article are his own and are not a reflection of those of his employer. He can be contacted at sradhakrishnan@chennai.tcs.co.in or radhakr@onebox.com.



22 July 2003

Many Internet applications deliver data and content in the form of dynamically generated HTML; this dynamic content is generated by a Web or application server using such technologies as Java Servlets, JavaServer Pages (JSP), PHP, Perl scripts, or Active Server Pages (ASP). The speed with which these Web pages reach the client browser on request depends mainly on two things:

  • The Web or application server's ability to generate the content. This is related to the general performance characteristics of the application and the servers.
  • The network bandwidth.

The performance of the Web application is determined by good design, tuning the application for performance, and if needed, by providing more hardware power for the servers. The network bandwidth available to the user, directly related to the page-download time, is normally taken for granted. But for the user, it is the speed of Web page delivery that indicates the performance level, not how fast the application is executed on the server.

Therefore, to ensure a good user experience, the performance of the network and its bandwidth is considered an important part of the overall performance of the application. This becomes even more important when network speed is low, network traffic is high, or the size of the Web pages is large.

In the case of the Internet, the traffic may not be controllable, but the user's network segment (modem or other technology) and the server's connection to the Internet can be augmented. In the case of Web applications hosted and accessed on the same premises through Local Area Networks (LANs), the bandwidth is usually sufficient for fast page downloads. In the case of Wide Area Networks (WANs), segments of the network may have low speed and high traffic, and users accessing the application over those segments may experience poor page download times.

Ideally, the network would simply have more bandwidth; practically, that means additional cost. However, you can get the effect of increased bandwidth without a substantial cash investment. If Web pages (consisting mainly of plain-text documents and images) are compressed before being sent to the browser on request, page downloads become faster regardless of the traffic or speed of the network, and the user sees a faster response to each HTTP request.

In this article, I explore the intricacies of Web-based compression technology, detail how to improve Web page download times by compressing the Web pages from the Web server, highlight the current status of the technology, and provide a real-world case study that examines the particular requirements of a project. (Throughout the article, the term Web application refers to an application generating dynamic content -- for instance, any content created on the fly.)

Now, look at the specifics of Web-related compression technology.

Types of compression

I first examine the following types and attributes of compression:

  • HTTP compression. Compressing content from a Web server
  • Gzip compression. A lossless compressed-data format
  • Static compression. Pre-compression, used when the delivered pages are static
  • Content and transfer encoding. IETF's two-level standard for compressing HTTP contents

HTTP compression

HTTP compression is the technology used to compress content served from a Web server (also known as an HTTP server). The Web server content may be in any of the many available MIME types: HTML, plain text, image formats, PDF files, and more. HTML and image formats are the most widely used MIME types in a Web application.

Most images used in Web applications (for example, GIF and JPG) are already in a compressed format and do not compress much further; certainly no discernible performance is gained by compressing these files again. However, HTML content, whether static or created on the fly, is plain text and is ideal for compression.

The focus of HTTP compression is to enable the Web site to serve fewer bytes of data. For this to work effectively, a couple of things are required:

  • The Web server should compress the data
  • The browser should decompress the data and display the pages in the usual manner

This is obvious. Of course, the process of compression and decompression should not consume a significant amount of time or resources.

So what's the hold-up in this seemingly simple process? The recommendations for HTTP compression were stipulated by the IETF (Internet Engineering Task Force) as part of the HTTP 1.1 protocol specification, with the publicly available gzip format intended as the compression algorithm. Popular browsers implemented the decompression feature early on and were ready to receive the encoded data (as per the HTTP 1.1 protocol specification), but HTTP compression on the Web server side was neither implemented as quickly nor taken as seriously.
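
For illustration, here is roughly what the negotiation looks like on the wire; the URL, host, and header values are hypothetical examples. The browser advertises what it can decompress in an Accept-Encoding request header, and the server labels a compressed response with a Content-Encoding response header:

GET /catalog/index.html HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip, deflate

HTTP/1.1 200 OK
Content-Type: text/html
Content-Encoding: gzip

(gzip-compressed HTML body follows)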

Gzip compression

Gzip is a lossless compressed-data format. The deflation algorithm used by gzip (also zip and zlib) is an open-source, patent-free variation of the LZ77 (Lempel-Ziv 1977) algorithm.

The algorithm finds duplicated strings in the input data. The second occurrence of a string is replaced by a pointer (in the form of a pair -- distance and length) to the previous string. Distances are limited to 32 KB and lengths are limited to 258 bytes. When a string does not occur anywhere in the previous 32 KB, it is emitted as a sequence of literal bytes. (In this description, string is defined as an arbitrary sequence of bytes and is not restricted to printable characters.)
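
The java.util.zip package that ships with the JDK exposes this same deflate/gzip machinery, which makes it convenient for a quick, informal check of how well repetitive HTML compresses. The following minimal sketch (the sample markup is made up for illustration) compresses a string in memory and prints the ratio:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// A quick, informal check of how well repetitive HTML compresses with gzip.
public class GzipRatioDemo {
    public static void main(String[] args) throws IOException {
        // Build a made-up page full of repeated markup, the kind of output a JSP produces.
        StringBuffer page = new StringBuffer();
        for (int i = 0; i < 200; i++) {
            page.append("<tr><td>Order status</td><td>SHIPPED</td></tr>\n");
        }
        byte[] original = page.toString().getBytes("UTF-8");

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(buffer);
        gzip.write(original);   // deflate replaces the repeated strings with back-references
        gzip.close();           // finish the stream and write the gzip trailer
        byte[] compressed = buffer.toByteArray();

        System.out.println("Original:   " + original.length + " bytes");
        System.out.println("Compressed: " + compressed.length + " bytes");
        System.out.println("Saved:      "
                + (100 - (100 * compressed.length / original.length)) + " percent");
    }
}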

Static compression

If the Web content is pre-generated and requires no server-side dynamic interaction with other systems, the content can be pre-compressed and placed in the Web server, with these compressed pages being delivered to the user. Publicly available compression tools (gzip, Unix compress) can be used to compress the static files.

Static compression, though, is not useful when the content has to be generated dynamically, such as on e-commerce sites or on sites which are driven by applications and databases. The better solution is to compress the data on the fly.
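
As a sketch of the pre-compression idea, the following code writes a .gz sibling for every .html file in a document directory, which a suitably configured server could then hand to gzip-capable browsers. The directory path is hypothetical, and the standalone gzip command-line tool accomplishes exactly the same thing:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Pre-compress every .html file in a document directory into a .gz sibling.
public class PreCompressStaticPages {
    public static void main(String[] args) throws IOException {
        File docRoot = new File("/opt/www/htdocs");   // hypothetical document root
        File[] files = docRoot.listFiles();
        for (int i = 0; files != null && i < files.length; i++) {
            if (!files[i].getName().endsWith(".html")) {
                continue;   // leave images and other already-compressed content alone
            }
            FileInputStream in = new FileInputStream(files[i]);
            GZIPOutputStream out = new GZIPOutputStream(
                    new FileOutputStream(files[i].getPath() + ".gz"));
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
            in.close();
            out.close();   // finish the gzip stream for this file
        }
    }
}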

Content and transfer encoding

The IETF's standard for compressing HTTP contents includes two levels of encoding: content encoding and transfer encoding. Content encoding applies to methods of encoding and compression that have already been applied to documents before the Web user requests them. This is also known as pre-compressing pages, or static compression. The concept never really caught on because of the complex file-maintenance burden it represents, and few Internet sites use pre-compressed pages.

On the other hand, transfer encoding applies to methods of encoding during the actual transmission of the data.

In modern practice the difference between content and transfer encoding is blurred, since the requested pages often do not exist until after they are requested (they are created in real time). Therefore the encoding always has to happen in real time.

The browsers, taking their cue from the IETF recommendations, implemented the Accept-Encoding feature by 1998-99. This allows browsers to receive and decompress files compressed using the public algorithms. In this case, the HTTP request header fields sent from the browser indicate that the browser is capable of receiving encoded information. When the Web server receives such a request, it can:

  1. Send pre-compressed files as requested; if none are available, it can
  2. Compress the requested static files, send the compressed data, and keep the compressed files in a temporary directory for further requests; or
  3. If transfer encoding is implemented, compress the Web server output on the fly.

As I mentioned, pre-compressing files, as well as real-time compression of static files by the Web server (the first two points, above) never caught on because of the complexities of file maintenance, though some Web servers supported these functions to an extent.

The feature of compressing Web server dynamic output on the fly wasn't seriously considered until recently, since its importance is only now being realized. So, sending dynamically compressed HTTP data over the network has remained a dream even though many browsers were ready to receive the compressed formats.
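
To make the on-the-fly case concrete, here is a minimal sketch of what a server-side component does for each request: check whether the browser advertised gzip support and, if so, label and compress the response body. In practice a server module such as mod_gzip (or a reusable servlet filter) does this transparently for every response; the servlet class and page content below are hypothetical and shown only to illustrate the mechanism.

import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.util.zip.GZIPOutputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// A hypothetical servlet that compresses its own output when the browser allows it.
public class CompressedPageServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html");

        String acceptEncoding = request.getHeader("Accept-Encoding");
        OutputStream out = response.getOutputStream();

        if (acceptEncoding != null && acceptEncoding.indexOf("gzip") != -1) {
            // The browser advertised gzip support: label the response and compress the body.
            response.setHeader("Content-Encoding", "gzip");
            out = new GZIPOutputStream(out);
        }

        PrintWriter writer = new PrintWriter(new OutputStreamWriter(out, "UTF-8"));
        writer.println("<html><body><h1>Order status</h1>...</body></html>");
        writer.close();   // also finishes the gzip stream, writing its trailer
    }
}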


The benefits of HTTP compression

Three independent studies -- two conducted by the WWW Consortium (W3C) and one conducted for the Mozilla organization -- highlight the benefits of HTTP compression. The first W3C study, reported in 1997, focused on testing the effects of HTTP 1.1 persistent connections, pipelining, and link-level document compression. The second W3C study, reported in 2000, looked at the possible benefits for performance using compression of HTML files over a LAN with composite HTML data (compressed) and image content (uncompressed). The Mozilla study, reported in 1998, observes the performance of content-encoded compression.

Following are brief summaries of the results of these studies, offered to highlight the benefits of HTTP compression. (The study results are not completely discussed in this article; readers may refer to the original study for full discussion. For further details, check Resources for links to the original studies.)

W3C: On performance of HTTP 1.1

This study employed two Web servers, Jigsaw and Apache, and reports the savings in the number of packets sent (Pa) and download time in seconds (Sec). The study was conducted using a 28.8 kbps modem and an HTML file containing no images.

Table 1 illustrates the compression ratios and download times achieved.

Table 1. Compression ratios and download times
                                      Jigsaw Pa   Jigsaw Sec   Apache Pa   Apache Sec
Uncompressed HTML                     67          12.21        67          12.13
Compressed HTML                       21.0        4.35         4.35        4.43
Saved using compression (percent)     68.7        64.4         68.7        64.5

W3C: Effect of compression in a LAN

This study involves a mix of images and HTML content. The overall payload that is transferred in the uncompressed version of the download is a 42 KB HTML file with 41 inline GIF images for a grand total of 125 KB. The compression decreases the size of the HTML page from 42 KB to 11 KB (73.8 percent compression), but the images are untouched. This means that the overall payload is decreased 31 KB, or approximately 19 percent.

Table 2 reports the following:

Table 2. Compression ratios and download times with image/HTML mix
                                      Jigsaw Pa   Jigsaw Sec   Apache Pa   Apache Sec
Pipelining                            167.4       0.85         161.6       0.64
Pipelining and HTML compression       140.6       0.62         137.4       0.49
Saved using compression (percent)     16          27           15          23

The study author notes that,

The table shows that, for the Jigsaw server, compression provides a net gain of 15 percent less packets but as much as a 27 percent gain in time. Likewise, for Apache a packet gain of 16 percent is seen, but a time gain of 23 percent. The interesting thing is that the overall payload is decreased by 19 percent, which is more than the gain in TCP packets. From this perspective, compression gives a slightly worse "TCP packet usage". However, the gain in time is relatively better than the gain in payload. This indicates that the relationship between payload, TCP packets, and transfer time is non-linear and that the first packets on a connection are relatively more expensive than the rest.

Mozilla: Performance of content-encoded compression

The third study, reported for Mozilla, uses the Apache Web server version 1.3, which is capable of parsing the Accept-Encoding: gzip request header and sending pre-compressed HTML files to the browser.

Table 3 illustrates what happens when only plain HTML is sent with no images. It's clear that an improvement in download time is achieved with a slower network.

Table 3. Mozilla and Apache with plain HTML
 
                 ISDN 64 kbits/sec         28.8 kbits/sec
                 No GZIP      GZIP         No GZIP      GZIP
Download time    105.1        83.2         327.9        121.8
                              21% faster                63% faster

The results for a mix of images and HTML are given in Table 4.

Table 4. A mix of HTML and images
 
                 ISDN 128 kbits/sec        28.8 kbits/sec
                 No GZIP      GZIP         No GZIP      GZIP
Download time    82.1         77.6         264.7        184.4
                              5.5% faster               30% faster

Reading the results

It is clear from these studies that good compression ratios are possible and that the download time of Web content can be accelerated using HTTP compression. Where the studies used a mixture of HTML and images, with images occupying a significant portion of the payload, they reported a 20 to 30 percent improvement in download time. When the payload consists only of HTML content, approximately a 65 percent improvement in download time is reported.

It follows that for Web applications containing few images (mostly a few buttons) and mostly HTML content, the overall improvement in download time is closer to 65 percent than to 20 or 30 percent. These studies indicate strongly that employing HTTP compression in Web applications is beneficial to download time, and thus to a good user experience.

Another indirect benefit of HTTP compression is that the data passing between the Web server and the browser is no longer readable as plain text on the wire, by virtue of the compression algorithm. This is obfuscation rather than real encryption, but it adds a small measure of protection to the data. Of course, data sent from the browser to the server is not compressed and therefore does not gain even this.


Tools for compression

While the benefits of HTTP compression have long been suspected, and the capability has been implemented in popular browsers as early as 1998, implementation of this technology in Web servers has truly lagged.

The Apache Web server 1.3 can deliver pre-compressed static data to the browser. The Microsoft Internet Information Server 5.0 (IIS) compresses a static page when it is requested for the first time and stores the compressed content in a cache directory. When the same page is requested again, the server delivers it from that directory instead of from the Web server document directory. If a newer version of static content whose compressed form is already in the cache directory is placed in the Web server, it will be automatically compressed and the cache directory will be updated with the latest content. IIS 5.0 can also be configured to compress dynamic content.

But with most Web server vendors being more or less silent about introducing dynamic compression, other companies have started producing compression plug-ins for popular Web servers. Following is a list of some of the promising products.

mod_gzip

Remote Communications has introduced the first publicly available compression module for the Apache Web server, the most widely used Web server on the Internet. The module was built to Apache's add-module specification, by which third-party modules can be incorporated into Apache products. This module, named mod_gzip, uses the publicly available gzip algorithm to compress data in transit from the Web server.

Since the introduction of this module, which received widespread approval from the open-source community of Web server users, newer versions and fixes have appeared. Many devotees of the Apache Web server report good compression ratios. Benchmark results for this product are also available.

Hyperspace

This is a commercial compression module from the creators of mod_gzip. Unlike mod_gzip, Hyperspace need not be integrated with the Web server and can be used with any Web server. The product interacts with the base Web server through an additional port on which both the Web server and the compression product listen.

Following are some of the features of the Hyperspace module:

  • The product can be installed in a remote host, separated from the Web server host
  • Customizable log entries for HTTP access and compression statistics
  • A separate admin server for displaying real-time compression statistics, such as total bytes sent and saved
  • Ability to specify the content type to be compressed
  • Image compression

An SSL version of this product is also available.

Vigos Web site accelerator

A commercial product from Vigos AG, this software tool (the company also offers a hardware version) likewise compresses Web server responses on the fly. Based on a proprietary SmartShrink technique, the Vigos accelerator can determine whether the browser is capable of accepting compressed data and sends compressed or uncompressed data accordingly. This product, too, acts as a standalone unit and can therefore be used with any Web server. Benchmark results are available.

Some of the main features of the Vigos accelerator:

  • The product can be used as a remote host, separated from the Web server
  • Automatic determination of whether the browser accepts compressed files or not
  • Customizable log entries for Access and Error logs and compression statistics

An SSL version of the product is available.

WebWarper

This is a free Web service through which the contents of a Web site can be accessed. While the service sounds interesting, the potential delay in IP forwarding and the need for a client-side plug-in (to change the URL entries in the Web page so they are forwarded through the accelerator) make it unsuitable for a Web application. Still, general Internet users may benefit from the service.

The company also has a pay-ware module written in Perl, designed for HTTP compression with both the IIS and Apache Web servers.

Note: HTTP compression for Apache is achievable using mod_gzip, an open-source offering. However, for other Web servers which do not implement HTTP compression, a commercial product might be needed.

The following discussion presents a real-world case study using mod_gzip.


A detailed look at real-world compression

A major division of a large company (an important client for TCS) has a legacy application for which a browser-based user interface needs to be developed. The existing application logic resides in an OS/390-based system. Corporate IT chose a WebSphere application server with the IBM HTTP Server (and other load-balancing and security products) for all the company's Web-based applications. This environment will be used to host any Web-based applications developed by each division of the company.

This particular application under development is a critical online module to be used by the company's sales and customer-care representatives, who are distributed all over the world. The representatives, while talking to customers over the phone, need access to the application to retrieve and update information pertaining to the customer (such as order status, history, or ID). The application's response time needs to be very short: as stipulated, on the order of three seconds.

Exploiting more muscle power in the servers could enhance the application's performance: More servers and load balancing, more CPU power, and increased RAM. Similarly the application design can be tuned for performance: Fewer object creations, ongoing database refinements, and using database connection pooling. Let's assume that these considerations will be optimally handled by the server farm infrastructure and the application design.

However, the application will be accessed over a WAN, some segments of which have bandwidth as low as 8 kbits/second, so for the user the slow transfer of the Web pages over the network offsets any performance improvements made in the server.

What are the best options?

Given the necessity for a Web-based UI and a fast response time over somewhat uncontrollable network bandwidth and traffic conditions, the following available choices were winnowed down in this order:

  1. Because a Web page is an HTML file in which data and formatting information are interlaced, with the formatting usually larger than the data, one option is to send only the data downstream to the browser and let the client render it. This can be achieved by employing applets at the browser. However, the use of applets was discouraged for the following reasons:
    • Many client browsers run behind local firewalls which can restrict outside access. Configuring these locations for applets is beyond the authority held by the organization.
    • Many users do not like the look and feel of a thick client application.
    • Java support is required for executing the applets and should reside with the client machines, requiring installation and maintenance of appropriate JVMs.
  2. Since applets were undesirable, the remaining way to improve page download speed was to send less data over the network -- in other words, HTTP compression.
  3. Even HTTP compression won't help if the bandwidth is hopelessly low, so the company decided to upgrade or discard the network segments with speeds below a particular limit, set at 32 kbits/sec. Clients on the discarded segments would be advised to access the application directly from the Internet.

Simple arithmetic

The Web application under consideration typically serves pages of 8 to 15 KB (excluding images). Some information pages might reach 25 KB, but this is a rare occurrence in the application. Taking as an example an 8 KB Web page, a single user, and a 32 kbps line, we find

Download time = (8*1024 bytes) / (32*1000/8 bytes/seconds) = 2.048 seconds
(ignoring network latencies and any delay introduced by the Web server and browser)

Assuming that the processing done by the application server and the mainframe system takes about 1.5 seconds, the Web page cannot be delivered to the browser within the 3-second target. In addition, if many users share the same line, traffic rises and little capacity remains, resulting in slower response times for all users.

If the same page could be compressed by a factor of 50 percent, then the download time drops to half. Furthermore, other users can use the saved bandwidth.
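
A quick sketch of that arithmetic (network latency and server processing time ignored, figures as assumed above):

// Back-of-the-envelope download-time estimate for the figures above.
public class DownloadTimeEstimate {
    public static void main(String[] args) {
        double pageBytes = 8 * 1024;                   // an 8 KB page
        double lineBytesPerSecond = 32 * 1000 / 8.0;   // a 32 kbps line

        double uncompressed = pageBytes / lineBytesPerSecond;              // 2.048 s
        double compressedByHalf = (pageBytes * 0.5) / lineBytesPerSecond;  // 1.024 s

        System.out.println("Uncompressed page:  " + uncompressed + " seconds");
        System.out.println("Compressed by 50%:  " + compressedByHalf + " seconds");
    }
}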

Clearly, applying HTTP compression for this application will boost the performance from the user's perspective.

A list of desired behaviors

If HTTP compression is enabled in a Web server that is hosted in a complex networked server farm environment and accessed by established users, the following behaviors are desired from the compression product:

  • The product should not demand any browser side plug-ins.
  • The product should have features to allow and disallow specific MIME types. Not all types of content should be compressed automatically; for example, some browsers may not properly interpret compressed JavaScript and Cascading Style Sheet (CSS) files. Similarly, PDF documents and HTML help files should not be compressed.
  • The product should not consume significant computational power and time from the server environment. A smaller footprint is always desired.
  • The product should allow compression of files delivered from specific directories and URLs. This feature is important when more than one application is hosted in a networked environment and only content from specific applications needs to be compressed.
  • The product should allow for a dynamic health check to determine whether the compression feature is behaving properly or not. Apart from log files, the ability to get a browser-based display of run-time statistics is necessary.
  • The product should offer additional image-compression features, even if the gain is small, because the images used in the Web pages are already in a compressed format.

And it goes without saying, your product manufacturer should provide good support.
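
To illustrate the first few behaviors above, the decision such a product makes for every response can be sketched as a small rule check. This is a hypothetical helper, not the API of any of the products discussed; the MIME types, URL prefix, and size limits are example values only.

// Hypothetical sketch of the per-response decision a compression product makes.
public class CompressionPolicy {
    private static final String[] ALLOWED_TYPES = { "text/html", "text/plain" };
    private static final String APPLICATION_PREFIX = "/orders/";   // example URL restriction
    private static final int MIN_SIZE = 512;      // too small to be worth compressing
    private static final int MAX_SIZE = 1000000;  // avoid tying up the CPU on huge responses

    public static boolean shouldCompress(String acceptEncoding, String uri,
                                         String contentType, int contentLength) {
        if (acceptEncoding == null || acceptEncoding.indexOf("gzip") == -1) {
            return false;                        // browser did not advertise gzip support
        }
        if (!uri.startsWith(APPLICATION_PREFIX)) {
            return false;                        // compress only this application's pages
        }
        if (contentLength < MIN_SIZE || contentLength > MAX_SIZE) {
            return false;                        // outside the configured size window
        }
        for (int i = 0; i < ALLOWED_TYPES.length; i++) {
            if (contentType != null && contentType.startsWith(ALLOWED_TYPES[i])) {
                return true;                     // an allowed MIME type
            }
        }
        return false;                            // images, PDF, CSS, and so on pass through
    }

    public static void main(String[] args) {
        System.out.println(shouldCompress("gzip, deflate", "/orders/status.jsp",
                "text/html", 8192));             // true
        System.out.println(shouldCompress("gzip, deflate", "/orders/logo.gif",
                "image/gif", 4096));             // false
    }
}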

Next step: Finding the right product

Weighing the pros and cons, the company decided to use HTTP compression with the Web server for the Web application. The server of choice, the IBM HTTP Server in this case, comes with the WebSphere environment. But one problem arose: by itself, the IBM HTTP Server does not support HTTP compression.

Since the IBM HTTP Server is an Apache clone, it was thought possible to use the freely available mod_gzip module. This didn't work, apparently because a header file (core.h) used in the compiled binary of the IBM HTTP Server differs from the original Apache header file. Because of this incompatibility, the mod_gzip binary does not work with this HTTP server. (Further along in the article, though, you'll find a workaround for this problem.)

A features study is a handy tool when trying to decide which product or version of a technology to implement for your project. I carried out a features study weighing the comparative benefits of the mod_gzip module (with an Apache Web server) against two commercial products (using the IBM HTTP Server). I found that the commercial products offer comparable compression ratios but no overwhelming benefit for the project's goals, so I present the details of the study in terms of the mod_gzip module's features (with observations on the commercial products where relevant).

First, I made a list of available features. Table 5 lists the mod_gzip features:

Table 5. Features study for mod_gzip
Desired feature                                          Supported?
What Web servers are supported?                          Apache.
Can it listen to a remote Web server?                    No.
Is SSL support available?                                Yes.
What's the source?                                       Free download.
What's the footprint?                                    Small.
What platforms are supported?                            Win9x, NT, 2000, Linux, FreeBSD, Unix, and others.
What additional browser plug-ins are required?           None.
What browser settings are required?                      IE: Set "Use HTTP 1.1."
Can it compress contents from a specific Web server directory?    Yes.
Can it compress contents coming from specific URL strings?    Yes.
Does it include/exclude specific MIME types?             Yes.
Does it include/exclude specific browser types?          Yes.
Can minimum and maximum file sizes be specified for content that should be compressed?    Yes.
Can an additional port be specified in the Web server configuration?    No.
Where are compression statistics available?              Log file/HTML screens.
What are the logging details?                            Custom log format.
Are error logs present?                                  Yes (part of the Apache error log).
Does it have a disable-compression feature?              Yes.
Does it offer image compression?                         No.
Is an HTML-based screen available for a quick glance at run-time compression statistics?    Yes.
Can you specify numbers for threads/pools (simultaneous users)?    No (this can be done in the Apache Web server).
Number of concurrent users supported?                    A feature of the Web server.
Can you specify the compression level?                   No.
Can you start and stop the compression product?          Only by stopping the Web server.
What is the ease of installation and configuration?      Good.

Note: The versions of commercial products studied were found to offer many of the above features and some variations. Some important variations to note:

  • Many commercial products essentially stand outside the HTTP server and therefore the compression features can be turned on and off without restarting the Web server.
  • Administrators can specify various compression levels.
  • One important desired feature missing from the commercial products studied was compression of content from a specific URL only; either all the content from the server is compressed or none is. However, vendors may implement such features in their next versions, or for an additional fee, if the requirements are stated.

Next I looked at the compression ratios. To evaluate and present them, I set up a separate environment (a Windows NT workstation with 256 MB RAM and a 1 GHz Pentium 4 processor). The products were installed with the default settings provided by the vendors.

The following types of files were used for testing:

  • HTML files resembling some typical application output served by the application server.
  • Dynamic output (Servlet and JSP output) from WebSphere sample applications.

The files used are indicated in Table 6. (To view a few of the screens used in the samples, see the first entry in Resources.)

Table 6. Files used for testing
Sample number    Size (bytes)*    File type
1                822              WebSphere Dynamic output
2                864              WebSphere Dynamic output
3                1370             WebSphere Dynamic output
4                1523             WebSphere Dynamic output
5                4588             WebSphere Dynamic output
6                5248             WebSphere Dynamic output
7                6201             A typical application JSP output
8                6443             A typical application JSP output
9                6760             A typical application JSP output
10               7915             WebSphere Dynamic output
11               9563             WebSphere Dynamic output
12               13382            A typical application JSP output
13               14717            WebSphere Dynamic output
14               15211            A typical application JSP output
15               27815            A typical application JSP output

*The file size was determined from the Web server log entry for output data sent; it represents only the HTML content, not any images.

The graph in Figure 1 depicts the compression ratios observed from the log files.

Figure 1. Compression ratios observed from the log files

I made the following observations from the data on the compression ratios:

  • The compression ratios are quite good for the application. The benefit is more pronounced when the size of the file is larger.
  • White space elimination is not included; however, this is likely to be incorporated in the commercial versions on request.
  • The compression ratios were noted from individual products' log files. No attempt is made to limit the bandwidth between the client and server machines and observe (or simulate) the download time.
  • All three products show almost equal compression ratios, probably because the underlying algorithm used was the same.
  • The computational efficiency (speed and amount of server resources used) and the products' behavior under multiple concurrent accesses were not studied, since this was not the goal. I did observe the products' general behavior with multiple users in the development environment, in which about 20 people used the application while it was under development. No abnormal utilization of CPU or time was noticed.
  • The compressed sample application files contained some inline JavaScript, which executed normally in the browser.
  • Testing with SSL was not done.
  • None of the products compress pages cached by the Web server, since they cannot access the server's cache regions. For typical static Web pages, then, a careful evaluation is needed of whether to enable caching in the Web server or to enable compression.

About that last point: choosing compression for large Web sites serving static pages is not likely to produce any performance improvement beyond the first time a page is accessed, since these sites may maintain dedicated caching servers. However, when the same static pages are delivered dynamically from a Servlet engine, caching does not occur and compression becomes possible. Since most pages of a Web application are generated only on request, caching does not apply, and compression suits these types of applications well.

The recommendation

I observed that all the compression products investigated render more or less equal compression ratios. In a server farm environment, therefore, other features become the more important aspects of the decision (such as the ability to compress only specific types of content, or content from specific URLs).

Since HTTP compression technology (at least on the server side) is still in its nascent stage, the probability of unforeseen problems occurring in a complex networked environment is still high, making post-sale vendor support of such technologies crucial for the successful implementation.

In this case, because the IT department chose IBM as the vendor providing most of the hardware, software, installation, and support for this project, it only makes sense to evaluate a compression solution from IBM before looking to other products. But what about that support problem?

IBM's Web server -- a close cousin of the Apache Web server, for which mod_gzip compression support is available -- does not officially support mod_gzip, as mentioned earlier. But IBM research teams have developed a patch that adds mod_gzip support. The patch is not likely to be incorporated in the current version of the Web server, given that this type of compression is slated to be incorporated in later versions of the server.

With this in mind, the company made a request of IBM to support compression for its server farm as a special case, which the IBM team agreed to. And the company went with mod_gzip.


In conclusion

I hope this article has demonstrated clearly that HTTP compression for Web applications is a must for a satisfying user experience. Whenever users and servers are connected to the Internet over low-speed connections or on high-traffic routes, HTTP compression can keep the lines of effective communication open.

In addition, when compression is integrated with the Web server, as in the case of mod_gzip for Apache, it provides a more pleasant user experience: the Web server directly serves less data, and hence its overall throughput improves.

Add-on compression products that take the output from the Web server and compress the data in software or hardware may not directly improve Web server performance. But these products offer other benefits, such as serving content from caching servers and serving multiple Web servers in load-balanced configurations.

The comparative study offered a set of steps to approach integrating compression into your existing systems, including identifying features, their usability in a server farm environment, and the general benefits of HTTP compression. The author would like to note that the comparative study:

... was carried out with specific requirements in mind and with evaluation copies obtained from respective vendors. The results presented are as observed by the author at the time of trial runs. The author acknowledges that the products presented herein performed well during trial runs and will not be responsible for failing to present any additional features or any minor features presented inaccurately. The author has no motives, financial or otherwise, other than technical, while evaluating the tools. The information provided in this paper is for knowledge sharing only and any commercial gain/loss for decisions made based on this study may not be attributed to the author.

Resources

  • View screens from the following sample files from the tests in this article: Sample 5; Sample 7; Sample 9; Sample 12; Sample 14.
  • Learn from IBM IT architect Brian Goodman as he details GZIP encoding over an HTTP transport for improving the performance of Web applications in "Squeezing SOAP" (developerWorks, March 2003).
  • For more on the protocol specifications of HTTP 1.1 (including HTTP compression), see "Hypertext Transfer Protocol: HTTP/1.1" (RFC 2616, Network Working Group, R. Fielding et al, June 1999).
  • Read P. Deutsch's "GZIP file format specification version 4.3" (RFC 1952, Network Working Group, May 1996) for details on the publicly available gzip compression format.
  • Check out "HTTP Compression Speeds up the Web" as Peter Cranstone deals with the IETF's standard for compressing HTTP contents through two levels of encoding.
  • Find the 1997 W3C study that focused on testing persistent connections, pipelining, and link-level document compression in HTTP 1.1 in "Network Performance Effects of HTTP/1.1, CSS1, and PNG" (June 1997, W3C NOTE-pipelining-970624, Henrik Frystyk Nielsen et al).
  • Read the 2000 W3C study's examination of possible performance benefits with compression of HTML files over a LAN; the data is available in "The Effect of HTML Compression on a LAN" (June 2000, Henrik Frystyk Nielsen).
  • Find the results of the 1998 Mozilla study (that observes the performance of content-encoded compression) in "Performance: HTTP Compression" (September 1998, John Giannandrea and Eric Bina).
  • Get all the information on the mod_gzip compression module.
  • Review Benchmark results for mod_gzip on Apache.
  • Check out the commercial Hyperspace compression module, a stand-between that you need not integrate into a Web server.
  • Look at the commercial compression module from Vigos AG. Based on proprietary technology, it can also be a stand-between, and comes in software and hardware versions.
  • Try WebWarper, a free Web service designed to enhance download speeds.
  • Try the IBM HTTP Server version 2.0 with its new enhancements, including a Fast Response Cache Accelerator for Windows and support for HTTP compression.
