It's a bird, it's a plane...
Comet is a term describing a client-server interaction that uses long-lived HTTP connections to enable event-driven communication while the connection remains open. Ever since Alex Russell defined the term in Comet: Low Latency Data for the Browser (see Resources), Comet has been one of the Web 2.0 buzzwords. Comet style applications stretch connection timeouts and infrastructure to their limits to deliver updates to the browser more quickly than other solutions and with less data sent, which sounds great. But there are drawbacks to Comet style connections that you should know about before you consider using them.
Problems solved (and created)
The most common problem that developers in a Web 2.0 world face is streaming events generated on the server to clients. There are three popular ways to solve this problem:
- Polling
Figure 1. Polling
- Comet long-polling or hybrid polling
Figure 2. Hybrid polling
- Comet streaming or forever frame
In a Comet streaming style application, the client opens a connection and sends its data using chunked transfer encoding, and the server responds with chunked-encoded data as well. After the initial connection is created, there is very little overhead to transmit data in either direction. The connection stays open as long as possible, and each set of bytes carries only the chunked encoding overhead: a hexadecimal number giving the size of the data being sent plus carriage return/line feed delimiters, typically under 10 bytes in total.
Figure 3. Comet streaming
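To make the size of that overhead concrete, here is a minimal sketch of how a single chunk is framed under HTTP/1.1 chunked transfer encoding. The function names `encode_chunk` and `chunk_overhead` and the sample event are illustrative assumptions, not part of any particular server or library.

```python
def encode_chunk(payload: bytes) -> bytes:
    """Frame one payload as a chunk: hex size, CRLF, data, CRLF."""
    size_line = format(len(payload), "x").encode("ascii")
    return size_line + b"\r\n" + payload + b"\r\n"


def chunk_overhead(payload: bytes) -> int:
    """Return the number of framing bytes added around the payload."""
    return len(encode_chunk(payload)) - len(payload)


# Hypothetical event payload, used only for illustration.
event = b'{"symbol": "IBM", "price": 101.25}'
frame = encode_chunk(event)
print(frame)
print("framing bytes:", chunk_overhead(event))
```

The framing grows only with the number of hexadecimal digits needed for the size, so even a payload of several kilobytes adds just a handful of bytes per event.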
Advantages of Comet streaming and hybrid polling
- Browser support without a plug-in.
- Immediate notification of clients; no need to cache events.
- Reduced CPU load on the server.
- Much less overhead with respect to number of bytes to send events and information.
Disadvantages of Comet streaming
Sadly, most of the Internet infrastructure is not ready for widespread use of Comet streaming style applications today. Here are some of the reasons why I claim this:
- Standards are behind
HTTP was designed so that once a request begins to be sent, the connection carrying it is effectively locked until the response has been sent back in its entirety. This means that for the duration of a Comet streaming style communication, at least one connection is tied up on each system from the client device to the backend server.
- Synchronous vs. asynchronous
Many pieces of the Internet infrastructure are synchronous in nature. This includes servers (such as various firewalls) and programming models (such as HTTP servlets). Accordingly, today's Internet infrastructure is generally not built for Comet streaming style applications to be pervasive, since most of these systems have a limited number of threads and each Comet connection ties up a thread.
- Limited connections
Proxy servers and firewalls will struggle with large-scale Comet streaming applications. Why? Today's operating systems are generally not prepared for pervasive use of Comet streaming. Because HTTP ties up a connection until the response is complete, as described above, each user of a Comet streaming application occupies two connections and file descriptors in each firewall and proxy that the HTTP request flows through. Many operating systems, and the software built on them, are limited to fewer than 65,000 connections across the entire machine at one time. These systems could therefore run out of available connections before they run out of CPU or memory, requiring additional systems even though the existing ones have plenty of CPU and memory to spare.
- Working as designed
Some systems are not designed to handle this sort of application at all. These systems do things such as buffer HTTP chunks until they reach a certain size before sending them on to clients. There are firewalls that will not pass an HTTP request through to the server until the complete request has been received and can be inspected in its entirety. Sometimes there are settings to change this behavior; other times it just will not work.
- Browsers are not really prepared for streaming
The APIs that exist today are not geared toward sending a simple chunk of data and receiving a simple chunk of data. This is sometimes achieved by implementing a new client-side applet or something similar, but that approach is very difficult to support today. Many people fall back on the hybrid polling mechanism and send the data in a separate connection.
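The connection limit described above lends itself to a quick back-of-the-envelope calculation. The sketch below assumes the figures from the text (a ceiling of 65,000 connections per machine and two connections per streaming user on each intermediary); the function name `max_streaming_users` is hypothetical, and real limits vary by operating system and product.

```python
def max_streaming_users(connection_limit: int, connections_per_user: int = 2) -> int:
    """Upper bound on simultaneous streaming users through one intermediary.

    Assumes every user holds `connections_per_user` connections open for the
    whole session, as a proxy or firewall would (one inbound, one outbound).
    """
    if connections_per_user <= 0:
        raise ValueError("connections_per_user must be positive")
    return connection_limit // connections_per_user


# Figures taken from the text; treat them as rough illustrations.
print(max_streaming_users(65_000))
```

Under these assumptions a single proxy tops out at 32,500 concurrent streaming users regardless of how much CPU or memory it has left.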
What needs to happen to better support Comet streaming?
I think there are many things we can do to better support Comet streaming style applications. First, we need to fix the problems above.
- Standards are behind
HTTP pipelining is broken in many people's eyes because it does not permit out-of-order chunks across a single connection. With sequence numbers and request identification, a browser, firewall, proxy, or application server could stream more than one Comet streaming request over a single connection.
- Synchronous vs. asynchronous
Our applications and infrastructure need to be updated to handle everything asynchronously. This includes things such as the HTTP Servlet specification, which could learn from the Continuations support in Jetty or from the asynchronous nature of SIP servlets. Most of us expect that the Servlet 3.0 specification will be asynchronous.
- Limited connections
If more advanced HTTP pipelining support arrives, this may not be needed. But absent that change, systems need to expect a higher number of connections than they do today.
- Browsers are not really prepared for streaming
Better APIs are needed to send and receive chunked messages to and from a Comet streaming style server-side application.
When these issues are addressed, Comet streaming will be more widely accepted, as it helps increase the interactivity of Web-based interfaces. Stock quotes, chat, sports scores, and many other applications will work without pain throughout the infrastructure. Until then, I am afraid that every piece of server software on the Internet will need to be reviewed for whether it can handle this new Comet streaming-based world.
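The idea of carrying several Comet streams over one connection by tagging chunks with a request identifier and a sequence number can be sketched as follows. This is purely an illustration of the concept: the `Frame` layout and the `demultiplex` function are assumptions made for this example and do not correspond to anything defined by HTTP.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Frame(NamedTuple):
    """Hypothetical tagged chunk: which request it belongs to and its order."""
    request_id: int
    seq: int
    payload: bytes


def demultiplex(frames: Iterable[Frame]) -> Dict[int, bytes]:
    """Group interleaved frames by request and restore each request's order."""
    by_request: Dict[int, List[Frame]] = defaultdict(list)
    for frame in frames:
        by_request[frame.request_id].append(frame)
    return {
        request_id: b"".join(f.payload for f in sorted(group, key=lambda f: f.seq))
        for request_id, group in by_request.items()
    }


# Two streams interleaved and out of order on one hypothetical connection.
wire = [
    Frame(2, 1, b"score "),
    Frame(1, 2, b"101.25"),
    Frame(1, 1, b"IBM "),
    Frame(2, 2, b"3-1"),
]
print(demultiplex(wire))
```

With tagging of this kind, an intermediary would need only one connection for many logical streams, which is the property that plain pipelining lacks.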
Should we use hybrid polling?
Hybrid polling enables fast -- but not necessarily immediate -- notifications. This methodology is preferable with today's infrastructure for these reasons:
- Hybrid polling is easier on the server infrastructure because there is a break between requests.
Two intervals determine the resource usage that hybrid polling incurs across the infrastructure: the client polling interval and how long the server holds the connection without responding. Keep the server hold time to a minimum and the client poll time to a maximum, if at all possible.
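A minimal sketch of how these two intervals interact is shown below. It assumes a server that holds each request until an event arrives or a hold timeout expires, and a client that pauses for a fixed interval between requests. The names `hold_for_event` and `connection_duty_cycle` are hypothetical, and the duty-cycle figure covers only the worst case in which no event arrives during the hold.

```python
import queue
from typing import Optional


def hold_for_event(events: "queue.Queue[str]", hold_seconds: float) -> Optional[str]:
    """Server side: wait up to hold_seconds for an event, else answer empty."""
    try:
        return events.get(timeout=hold_seconds)
    except queue.Empty:
        return None  # Respond with no event so the connection can close.


def connection_duty_cycle(hold_seconds: float, poll_interval_seconds: float) -> float:
    """Fraction of time a connection is open when no events arrive.

    poll_interval_seconds is the pause the client takes between the end of
    one request and the start of the next.
    """
    cycle = hold_seconds + poll_interval_seconds
    if cycle <= 0:
        raise ValueError("hold and poll interval must sum to a positive value")
    return hold_seconds / cycle


pending: "queue.Queue[str]" = queue.Queue()
pending.put("price-change")
print(hold_for_event(pending, hold_seconds=0.05))  # An event is already waiting.
print(hold_for_event(pending, hold_seconds=0.05))  # Nothing arrives; hold expires.
print(connection_duty_cycle(hold_seconds=5, poll_interval_seconds=15))
```

Shortening the hold or lengthening the client pause lowers the fraction of time each connection stays open, at the cost of slower notification.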
What can be done now?
- Limit your usage of Comet streaming to only those applications that require immediate event notifications.
- Use methodologies, such as hybrid polling, that enable faster event notification without a large toll on the infrastructure. Tune your hybrid polling times to keep connections closed as much as possible.
- Understand the limitations of the new infrastructure you are purchasing. For example, if you are buying a firewall, it helps to know the maximum number of connections it can handle, whether it supports Comet style applications, and whether the function you want (like intrusion detection) requires the buffering of messages.
Resources
- Comet: Low Latency Data for the Browser
- Ajax for Java developers: Write scalable Comet applications with Jetty and Direct Web Remoting
- IBM WebSphere Application Server Feature Pack for Web 2.0