Improve the performance of Web 2.0 applications

Explore different browser-side cache mechanisms

With the emergence and popularity of Web 2.0 applications, the way people use the Internet has slowly changed. These Web 2.0 applications share many typical traits, including a rich client, a large page size, lots of small items on a page, and heavy use of JavaScript. With current browser technology, most of these traits can cause browser-side performance issues, especially over long-distance networks. This article analyzes the key characteristics of typical Web 2.0 applications and describes how they affect browser-side performance. It also takes a look at a very important part of browser-side performance: the browser-side cache.


Jian Qiao Sun, Software Engineer, IBM

Jian Qiao Sun is a WPLC performance engineer.



Hua Pin Shen, Advisory Software Engineer, IBM

Hua Pin Shen is a WPLC senior performance engineer.



08 December 2009


Introduction

With the emergence and popularity of Web 2.0 applications, the way people use the Internet has changed, and a trend has begun toward a more user-centric approach to content management, information sharing, communication, and teamwork. From a technical point of view, Web 2.0 applications do not bring many new technology breakthroughs. However, these applications do bring a new pattern to the use of the Internet. Web 2.0 applications share many typical features, including a rich client, a large page size, lots of small items on a page, and heavy use of JavaScript. These features can cause browser-side performance issues, especially over long-distance networks. The performance issues can negatively impact the user's experience, and you may not even be aware of the problems: developers usually work under excellent network conditions, which makes it difficult to expose these issues during development.

This article begins with an analysis of the key characteristics of typical Web 2.0 applications and explains how they affect browser-side performance. After that, it describes a very important part of browser-side performance, the browser cache. By using proper cache settings, you can give your users a good experience with your application. Without an overall cache policy design, you can end up not only with poor performance but also with functional defects.

Many settings affect the browser cache. In brief, they include Cache-Control, ETag, Expires, Last-Modified, and Vary. Each of these settings has its own meaning and best-use situation. The difficulty is that popular browsers do not all behave the same way for the same settings, so you should know exactly how each browser works before you decide which settings to use. This article looks at the behaviors of the most popular browsers currently on the market: Internet Explorer, Firefox, Chrome, and Safari.

In this article, we also use IBM® Mashups and the open source "Roller Weblogger" to give examples of how to apply different directives to best use the browser cache.

Background

In today's Internet environment, Web 2.0 applications are becoming more popular. A lot of Web sites are built on Web 2.0 technology, such as Facebook and YouTube. IBM also has Web 2.0 applications, such as Lotus Connections and Lotus Mashups.

Browser response time can be broken down with a basic model:

  • Browser Response Time = Server Side Time + Page Load Time + Browser Rendering Time
  • Page Load Time = (Number of Requests / Concurrency) * Latency + Page Total Size / Bandwidth

In the above equations:

  • "Server Side Time" is the time spent on the server-side processes such as authentication by LDAP or retrieving information from a database.
  • "Browser Rendering Time" is the time spent on the browser rendering the page and includes activities such as executing JavaScript and parsing the DOM tree.
  • "Number of Requests" is the number of HTTP requests.
  • "Concurrency" is the number of parallel connections the browser has to the server.
  • "Page Total Size" is the total size of one page.
  • "Latency" and "Bandwidth" are measures of the network's status. In a common long-distance network environment, the bandwidth is about 1M and the latency is about 100 milliseconds. Therefore, reducing the page size by 100 KBytes or removing one request can each save about 0.1 seconds in the response time (taking the 1M figure as 1 MByte per second and assuming the removed request would not have overlapped with others).

Please note that because of the complexity of real-world situations, this equation may not be able to cover all circumstances.
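The two equations above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names, the example page (60 requests, 6 parallel connections, 500 KBytes), and the reading of "1M" as 1 MByte per second are assumptions, not figures from a measured application.

```python
def page_load_time(num_requests, concurrency, latency_s, page_size_bytes, bandwidth_bytes_per_s):
    """Page Load Time = (Number of Requests / Concurrency) * Latency + Page Total Size / Bandwidth."""
    return (num_requests / concurrency) * latency_s + page_size_bytes / bandwidth_bytes_per_s


def browser_response_time(server_side_s, page_load_s, rendering_s):
    """Browser Response Time = Server Side Time + Page Load Time + Browser Rendering Time."""
    return server_side_s + page_load_s + rendering_s


# Hypothetical long-distance network: 100 ms latency, 1 MByte/s bandwidth (assumed).
LATENCY_S = 0.1
BANDWIDTH = 1_000_000

# Hypothetical first visit: 60 requests over 6 parallel connections, 500 KBytes in total.
first_visit = page_load_time(60, 6, LATENCY_S, 500_000, BANDWIDTH)

# Hypothetical repeat visit with a warm cache: fewer requests and fewer bytes on the wire.
repeat_visit = page_load_time(20, 6, LATENCY_S, 200_000, BANDWIDTH)

print(f"first visit:  {first_visit:.2f} s")
print(f"repeat visit: {repeat_visit:.2f} s")
```

Both terms shrink when more resources are served from the browser cache, which is why the rest of the article concentrates on cache settings.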

In one typical Web 2.0 rich Internet application (for example, the Lotus Mashup Maker), the browser sends the format definition request to the server first. After receiving the definition response data, the browser sends the data request to the server. Then the browser renders the page for the user. In this pattern, there are a lot of requests for small items such as JavaScript files and CSS files. In a long-distance environment, this can cause client performance issues that seriously impact the user's experience. Most of these files are static files that can be cached, so if you add the correct Cache-Control and Expires headers, along with the other header metadata that affects the browser cache, you can noticeably improve the user's experience.

Browser cache mechanism

There are several rules that impact the browser cache. In this section, we'll discuss them individually.

Cache-Control

Cache-Control is the most important rule. This field is used to specify directives that must be obeyed by all caching mechanisms along the request/response chain. The directives specify behaviors intended to prevent caches from adversely interfering with the request or response. These directives typically override the default caching algorithms. Cache directives are unidirectional, in that the presence of a directive in a request does not imply that the same directive is to be given in the response.

The Cache-Control definition is: Cache-Control = "Cache-Control" ":" 1#cache-directive, that is, the header name followed by one or more comma-separated cache directives. Table 1 shows applicable values.

Table 1. Common cache-directive values

| Cache-directive | Description |
|---|---|
| public | The response may be stored by any cache, including shared (proxy) caches. |
| private | The response may be stored only in a private cache, such as the browser cache. |
| no-cache | A cached copy must not be reused without first revalidating it with the server. |
| no-store | The content must not be stored in any cache or in an Internet temporary file. |
| must-revalidate/proxy-revalidate | If the cached content is stale, the request must be sent to the server/proxy to revalidate it. |
| max-age=xxx (xxx is numeric) | After xxx seconds, the cached content is stale. |
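To make the directive syntax concrete, here is a minimal Python sketch that splits a Cache-Control value into its directives. The function name is hypothetical, and the parser is deliberately simplified: it splits on every comma, so a quoted argument that itself contains commas would not be handled correctly.

```python
def parse_cache_control(header_value):
    """Split a Cache-Control value into {directive: argument or None}.

    Simplified sketch: splits on every comma, so quoted arguments that
    contain commas are not supported.
    """
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


parsed = parse_cache_control("public, must-revalidate, max-age=5")
print(parsed)

# Directives without an argument map to None; max-age keeps its numeric text.
max_age = int(parsed["max-age"]) if "max-age" in parsed else None
print(max_age)
```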

Table 2 indicates whether the browser will resend the request to the server or will use cached content in different situations.

Table 2. Browser response to cache-directive values

| Cache-directive | Open a fresh browser window | Press Enter in the original window | Refresh | Press the Back button |
|---|---|---|---|---|
| public | The browser renders the page from cache. | The browser renders the page from cache. | The browser resends the request to the server. | The browser renders the page from cache. |
| private | The browser resends the request to the server. | On the first occurrence, the browser sends the request to the server. After that, it renders the page from cache. | The browser resends the request to the server. | The browser renders the page from cache. |
| no-cache/no-store | The browser resends the request to the server. | The browser resends the request to the server. | The browser resends the request to the server. | The browser resends the request to the server. |
| must-revalidate/proxy-revalidate | The browser resends the request to the server. | On the first occurrence, the browser sends the request to the server. After that, it renders the page from cache. | The browser resends the request to the server. | The browser renders the page from cache. |
| max-age=xxx (xxx is numeric) | After xxx seconds, the browser resends the request to the server. | After xxx seconds, the browser resends the request to the server. | The browser resends the request to the server. | After xxx seconds, the browser resends the request to the server. |

Cache-Control is the most important setting regarding browser cache because it overrides other settings such as Expires and Last-Modified. Additionally, because the browsers handle Cache-Control in basically the same way, this header is the most efficient way to deal with cross-browser cache issues.

Expires

The Expires entity header field gives the date and time after which the response is considered stale. A stale cache entry may not normally be returned by a cache (either a proxy cache or a user agent cache) unless it is first validated with the origin server (or with an intermediate cache that has a fresh copy of the entity). (Note: The cache-control max-age and s-maxage will override the Expires header.)

Expires takes values in the following format: "Expires: Sun, 08 Nov 2009 03:37:26 GMT". If the content is viewed before the date given, the content is deemed not expired and is served from the cache. If the date has passed, the content is deemed expired, and the cache will take some action. Tables 3-6 indicate the different browser behaviors for different user operations.

Table 3. Expiration actions when user opens a new browser window

| Content state | Firefox 3.5 | IE 8 | Chrome 3 | Safari 4 |
|---|---|---|---|---|
| Content is not stale. | The browser renders the page from cache. | The browser resends the request to the server. The return code is 200. | The browser renders the page from cache. | The browser renders the page from cache. |
| Content is stale. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. |

Table 4. Expiration actions when the user presses Enter in the original browser window

| Content state | Firefox 3.5 | IE 8 | Chrome 3 | Safari 4 |
|---|---|---|---|---|
| Content is not stale. | The browser renders the page from cache. | The browser renders the page from cache. | The browser resends the request to the server. The return code is 304. | The browser resends the request to the server. The return code is 304. |
| Content is stale. | The browser resends the request to the server. The return code is 200. | The browser renders the page from cache. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. |

Table 5. Expiration actions when the user presses F5 to refresh the page

| Content state | Firefox 3.5 | IE 8 | Chrome 3 | Safari 4 |
|---|---|---|---|---|
| Content is not stale. | The browser resends the request to the server. The return code is 304. | The browser resends the request to the server. The return code is 304. | The browser resends the request to the server. The return code is 304. | The browser resends the request to the server. The return code is 304. |
| Content is stale. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. |

Table 6. Expiration actions when the user presses Back or Forward

| Content state | Firefox 3.5 | IE 8 | Chrome 3 | Safari 4 |
|---|---|---|---|---|
| Content is not stale. | The browser renders the page from cache. | The browser renders the page from cache. | The browser renders the page from cache. | The browser renders the page from cache. |
| Content is stale. | The browser renders the page from cache. | The browser renders the page from cache. | The browser renders the page from cache. | The browser resends the request to the server. The return code is 200. |

Note: All browsers are assumed to run with the default settings.
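The Expires format and the freshness rule described in this section can be sketched in Python with the standard library. The function names and the one-hour lifetime are assumptions made for illustration; the sketch also reflects the earlier note that max-age, when present, takes precedence over Expires. It does not try to reproduce the browser-specific behaviors listed in the tables.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_header(now, lifetime_seconds):
    """Format an Expires value that lies lifetime_seconds after now (an aware UTC datetime)."""
    return format_datetime(now + timedelta(seconds=lifetime_seconds), usegmt=True)


def is_fresh(now, response_date, expires_value=None, max_age=None):
    """Return True if a cached response is still fresh at time now.

    max-age, when given, overrides Expires.
    """
    if max_age is not None:
        return (now - response_date).total_seconds() < max_age
    if expires_value is not None:
        return now < parsedate_to_datetime(expires_value)
    return False  # no explicit freshness information: treated as stale in this sketch


# Hypothetical response served at 02:37:26 GMT with a one-hour lifetime.
served_at = datetime(2009, 11, 8, 2, 37, 26, tzinfo=timezone.utc)
expires = expires_header(served_at, 3600)
print("Expires:", expires)

print(is_fresh(served_at + timedelta(minutes=30), served_at, expires_value=expires))
print(is_fresh(served_at + timedelta(hours=2), served_at, expires_value=expires))
```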

Last-Modified/ETag

The Last-Modified entity header field value is often used as a cache validator. In simple terms, a cache entry is considered to be valid if the entity has not been modified since the Last-Modified value. The ETag response-header field value, an entity tag, provides for an "opaque" cache validator. This might allow more reliable validation in situations where it is inconvenient to store modification dates, where the one-second resolution of HTTP date values is not sufficient, or where the origin server wishes to avoid certain paradoxes that might arise from the use of modification dates.
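As a rough server-side illustration of these two validators, the following Python sketch answers a request with 304 when the validator sent by the browser still matches, and with 200 otherwise. Everything here is hypothetical: the function names, the hash-based ETag scheme, and the simplification that Last-Modified values are compared as exact strings rather than as dates. When both validators are sent, the sketch checks the entity tag first.

```python
import hashlib


def make_etag(body):
    """Derive an opaque entity tag from the response body (one possible scheme, assumed here)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, last_modified, if_none_match=None, if_modified_since=None):
    """Return (status, headers, payload) for a GET that may carry cache validators."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Last-Modified": last_modified}
    if if_none_match is not None:
        unchanged = if_none_match == etag
    elif if_modified_since is not None:
        # Simplification: exact string match instead of a date comparison.
        unchanged = if_modified_since == last_modified
    else:
        unchanged = False
    if unchanged:
        return 304, headers, b""  # the browser keeps using its cached copy
    return 200, headers, body


stamp = "Tue, 13 Oct 2009 05:48:08 GMT"
status, headers, _ = respond(b"body v1", stamp)
print(status)

# The browser revalidates with the entity tag it received earlier.
status_again, _, payload = respond(b"body v1", stamp, if_none_match=headers["ETag"])
print(status_again, len(payload))
```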

Browsers behave differently when these validators are used. Tables 7-10 indicate the different browser behaviors for different user operations.

Table 7. Last-Modified/ETag action when the user opens a new browser window

| Content state | Firefox 3.5 | IE 8 | Chrome 3 | Safari 4 |
|---|---|---|---|---|
| Content has not been changed since the last access. | The browser resends the request to the server. The return code is 304. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 304. | The browser resends the request to the server. The return code is 304. |
| Content has been changed since the last access. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. |

Table 8. Last-Modified/ETag action when the user presses Enter in the original browser window

| Content state | Firefox 3.5 | IE 8 | Chrome 3 | Safari 4 |
|---|---|---|---|---|
| Content has not been changed since the last access. | The browser renders the page from cache. | The browser renders the page from cache. | The browser resends the request to the server. The return code is 304. | The browser resends the request to the server. The return code is 304. |
| Content has been changed since the last access. | The browser resends the request to the server. The return code is 200. | The browser renders the page from cache. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. |

Table 9. Last-Modified/ETag action when the user presses F5 to refresh the page

| Content state | Firefox 3.5 | IE 8 | Chrome 3 | Safari 4 |
|---|---|---|---|---|
| Content has not been changed since the last access. | The browser resends the request to the server. The return code is 304. | The browser resends the request to the server. The return code is 304. | The browser resends the request to the server. The return code is 304. | The browser resends the request to the server. The return code is 304. |
| Content has been changed since the last access. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. |

Table 10. Last-Modified/ETag action when the user presses Back or Forward

| Content state | Firefox 3.5 | IE 8 | Chrome 3 | Safari 4 |
|---|---|---|---|---|
| Content has not been changed since the last access. | The browser renders the page from cache. | The browser renders the page from cache. | The browser renders the page from cache. | The browser renders the page from cache. |
| Content has been changed since the last access. | The browser renders the page from cache. | The browser renders the page from cache. | The browser renders the page from cache. | The browser resends the request to the server. The return code is 200. |

Note: All browsers are assumed to run with the default settings.

No cache-related settings

If you don't define any cache-related settings, the different browsers have different behaviors, and, sometimes, the behaviors of the same browser are different when run several times in the same situation. It can get complicated. Also, some content that should not be cached will be cached, which can create security issues.

Table 11 shows how each browser behaves for each user operation.

Table 11. Browser behavior when no cache settings are used

| User operation | Firefox 3.5 | IE 8 | Chrome 3 | Safari 4 |
|---|---|---|---|---|
| Opens a fresh page. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. |
| Presses Enter in the original window. | The browser resends the request to the server. The return code is 200. | The browser renders the page from cache. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. |
| Presses F5 to refresh. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. |
| Presses Back or Forward. | The browser renders the page from cache. | The browser renders the page from cache. | The browser resends the request to the server. The return code is 200. | The browser resends the request to the server. The return code is 200. |

Note: All browsers are assumed to run with the default settings.

Application example

This section provides examples of Web site analyses to determine the correct caching behavior, using both IBM commercial and open source tools.

Apache Roller Weblogger

Apache Roller Weblogger is an open source Web 2.0 Web application. It is the open source Java™ blog server that drives blogs.sun.com, blog.usa.gov, IBM Lotus Connections, IBM developerWorks blogs, and numerous others.

In this article, we choose the IBM My developerWorks blogs as an example to explain the detailed cache settings. Figure 1 shows a screenshot of the My developerWorks blog page.

Figure 1. My developerWorks blog page
A control panel with tags on the left, a list of available blogs in the center, a list of featured blogs with photos of authors on the right

This page issues 62 requests, most of them for png, gif, js, or other static file types. When the user accesses this page for the first time, it takes about 16 seconds to finish displaying the whole page in the browser. If you define the correct cache settings, most of the resources will be cached on the browser side. Therefore, when the user accesses this page again, the number of requests for this page drops to 22, and the page takes only about 6 seconds to load. The user experience improves significantly.

Now we'll analyze some important request cache settings. The relevant Weblogger output is seen in Figure 2.

Figure 2. My developerWorks blogs home Response Header 1
Outlined in red are two lines: 'Last-Modified Tue, 13 Oct 2009 05:48:08 GMT' and 'Cache-Control public, must-revalidate, max-age=5'

First, the Cache-Control overrides the Last-Modified settings, so the page can be cached locally for 5 seconds, but must revalidate if the content is stale. When a user accesses this page, the browser first checks the local cache to determine if the local files have expired. If the content is stale, the browser sends a request to the server to compare the Last-Modified time stamp. If the response file has the same Last-Modified time stamp, the server returns code 304 to the browser to tell it that the response file is the same.
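The browser-side flow just described can be written down as a small decision function. This is a sketch of the described steps only, not of any browser's actual implementation; the function name and return strings are made up for the example, and the 5-second lifetime comes from the max-age value shown in Figure 2.

```python
def cache_decision(age_seconds, max_age, cached_last_modified, server_last_modified):
    """Follow the flow described for 'public, must-revalidate, max-age=5'."""
    if age_seconds < max_age:
        # Still fresh: the local copy is used and no request is sent.
        return "use cache, no request"
    # Stale: the browser revalidates by sending its cached Last-Modified value.
    if server_last_modified == cached_last_modified:
        return "304, reuse cached copy"
    return "200, replace cached copy"


stamp = "Tue, 13 Oct 2009 05:48:08 GMT"
print(cache_decision(2, 5, stamp, stamp))
print(cache_decision(30, 5, stamp, stamp))
print(cache_decision(30, 5, stamp, "Wed, 14 Oct 2009 09:00:00 GMT"))
```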

Figure 3. My developerWorks blogs home Response Header 2
Details from the My developerWorks Blog page with the following highlighted in red: 'Cache-Control no-cache, max-age=0'

This Cache-Control setting indicates that the browser must not reuse a cached copy of this response without checking with the server first. From a business perspective, this request is used to check the user authentication and authorization, so its result should not be served from cache.

Figure 4. My developerWorks blogs home Response Header 3
Shows the following lines highlighted in red: 'Cache-Control public, max-age=86400' and 'Last-Modified Sun, 15 Feb 2009 21:48:46 GMT'

This response file is a JavaScript library that is rarely modified, so its max-age is set to 86400 seconds (one day).

Mashup Center

Mashup Center is designed to provide an easy-to-use business mashup solution that supports line-of-business assembly of dynamic situational applications, with the security and governance capabilities IT requires. It includes Lotus Mashups and InfoSphere MashupHub. Figure 5 shows a snapshot of Lotus Mashups in action.

Figure 5. Mashup homepage
The Lotus Mashup analysis of a page with a data viewer of park information in the top and three example charts side by side

Figures 6 and 7 show selected HTTP headers.

Figure 6. Mashup home Response Header 1
Feedback from the Mashup Home Response Header 1 with 'Expires Wed, 14 Oct 2009 07:56:36 GMT' and 'Cache-Control public, max-age=86400' in red

This request retrieves theme information from the server, and the response can be cached.

Figure 7. Mashup home Response Header 2
'Set-Cookie JSESSIONID=0000Fqxf-SY3wIX3UbLOD-Mv0_7:~1; Path=/' 'Expires Thu, 01 Dec 1994 16:00:00 GMT' and 'Cache-Control no-cache=set-cookie, set-cookie2'

This is the personal main page, which should not be cached. Note that the Expires date value is set to a date in the distant past so that the content is always treated as stale and refreshed.

Summary

Because browsers differ in how they handle caching, proper cache settings are very important. In this article, we described the following best practices:

  • Cache as many files as you can to reduce loading times and improve performance.
  • Use Cache-Control to define the cache behavior as much as possible, especially for IE. This reduces the inconsistencies between different browsers and is the best way to improve performance.
  • Do not leave cache-related settings undefined.
  • With the default settings, when freshly opened, the IE browser almost always sends a request to the server side to retrieve data.
  • If a page should not be cached, use "Cache-Control: no-cache, no-store" to make sure the page will not be cached, especially when the data involves security or sensitive information.
  • Unless it is necessary, do not use POST requests, because browsers generally do not cache their responses.
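One way to apply these practices is to decide the Cache-Control value in a single place on the server. The Python sketch below is an assumption-laden illustration: the function name, the resource categories, the list of static extensions, and the example paths are hypothetical, while the header values are the ones shown in the figures and recommended in the list above.

```python
# Header values taken from the article's examples; the category names are assumed.
CACHE_POLICY = {
    "static": "public, max-age=86400",             # rarely modified files
    "page": "public, must-revalidate, max-age=5",  # pages that change often
    "sensitive": "no-cache, no-store",             # security-related or personal data
}

# Hypothetical list of static file types.
STATIC_EXTENSIONS = (".js", ".css", ".png", ".gif")


def cache_control_for(path, sensitive=False):
    """Pick a Cache-Control value so that every response carries an explicit policy."""
    if sensitive:
        return CACHE_POLICY["sensitive"]
    if path.lower().endswith(STATIC_EXTENSIONS):
        return CACHE_POLICY["static"]
    return CACHE_POLICY["page"]


for example_path, is_sensitive in [("/js/lib.js", False), ("/blogs/home", False), ("/auth/check", True)]:
    print(example_path, "->", cache_control_for(example_path, is_sensitive))
```

Keeping the policy in one table makes it harder to ship a response with no cache settings at all.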
