When someone mentions optimizing cloud application delivery, I immediately think of the three features that are associated with an application delivery controller (ADC) — acceleration, high availability (HA), and intelligent control. In this article, I propose a "traffic management" model that will emphasize the acceleration and high availability aspects.
- Acceleration: Understand the different load balancing algorithm options and content caching. Get insight into why IP transparency can negatively impact the acceleration of web applications.
- HA: Learn how to achieve it through clustering and how to gracefully handle faults within your application infrastructure.
An application delivery controller (ADC) is a device or technology that, as just mentioned, is concerned with acceleration, HA, and intelligent control. To gain a basic understanding of what an ADC actually does, it helps to position it against commonly known networking technologies:
- A network switch is responsible for forwarding Ethernet frames within a LAN segment.
- A router is responsible for forwarding IP traffic between networks toward its destination address.
- A NAT/firewall is used to implement policies that affect TCP- or UDP-layer traffic.
That is a necessarily vague description, but as the name suggests, an ADC is used to control and manage the way that application traffic is delivered. It typically lives between a firewall and a number of application servers.
Now let's talk about the acceleration aspect of optimization.
The acceleration factor
In a well-managed cloud traffic system, a key aspect of application acceleration is the load balancing algorithm you use to distribute traffic. In the case of web applications, HTTP multiplexing can provide an enormous speed boost when used in conjunction with certain web servers.
There are also a number of factors that can undermine this acceleration; IP transparency, for example, completely negates the benefits of HTTP multiplexing.
Content caching is an extremely effective way of reducing load times while taking pressure off your web servers.
To handle the acceleration issue, this article covers:
- The pros and cons of different load balancing algorithms.
- Locality-aware request distribution.
- Web server scalability and HTTP multiplexing.
- IP transparency.
- HTTP caching.
Let's discuss each of these ideas in more detail.
A well-chosen load balancing algorithm
There is a fairly wide variety of load balancing algorithms available. Round-robin is probably the most well-known and best-understood method of load balancing because its operation is the simplest. Other, more complicated algorithms take a bit more explanation. For instance, consider an algorithm that sends more requests to the server with the fastest response time, or to the server with the fewest connections.
To show you what the options are, Table 1 is a breakdown of some common load balancing algorithms:
Table 1. Common load balancing algorithms
| Algorithm | Description |
| --- | --- |
| Round-robin | Requests are distributed to each application server in turn. Simple, deterministic, usually the default choice. |
| Weighted round-robin | Same as round-robin, but with manual weighting added to prevent overloading machines with relatively less processing power than others. |
| Random node | Chooses an application server at random. |
| Least connections | Tends to forward requests to the application servers that have fewer concurrent connections than others. |
| Weighted least connections | Same as least connections, but with some manual weighting added to prevent overloading machines with relatively less processing power. |
| Fastest response time | Tends to forward requests to the application servers that have a better response time than others; this is usually a good indication that a machine can handle more traffic than others. |
| Perceptive | A combination of least connections and fastest response time, but also incorporates logic to gradually increase the volume of traffic sent to newly added or recently recovered application servers. |
The least connections algorithm is usually better for traffic where each request tends to have the same impact on the application server. If the application servers do not all have equal compute power, a weighted least connections algorithm can account for the difference.
The least connections algorithm is not appropriate for traffic where any two requests can have a significantly different impact on the application server (such as HTTP). Fastest response time is usually better suited for web applications.
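To make the weighting concrete, here is a minimal Python sketch of weighted least connections selection. The server names, weights, and connection counts are hypothetical; a real ADC tracks this state internally for each backend.

```python
# A minimal sketch of weighted least connections. The server names,
# weights, and connection counts are hypothetical placeholders.
servers = [
    {"name": "app1", "weight": 4, "connections": 12},  # most powerful machine
    {"name": "app2", "weight": 2, "connections": 5},
    {"name": "app3", "weight": 1, "connections": 2},   # least powerful machine
]

def pick_server(servers):
    # Normalize each server's load by its weight: a machine with twice
    # the weight is expected to carry twice the concurrent connections.
    return min(servers, key=lambda s: s["connections"] / s["weight"])

chosen = pick_server(servers)
chosen["connections"] += 1  # the chosen server now holds the new request
print(chosen["name"])       # app3: 2/1 is the lowest weighted load
```

Dividing each server's connection count by its weight means the more powerful machines do not look overloaded until they are carrying proportionally more connections.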
Locality-aware request distribution
Depending on the exact ADC that you're using, more sophisticated load balancing algorithms may incorporate a feature called locality-aware request distribution (LARD). A web server's underlying operating system will cache files that were recently read from disk in physical memory. A web server that is storing an object in memory will likely respond to a request faster than a web server that needs to read the file from disk. LARD increases the overall performance of the system by introducing a subtle tendency to forward requests for the same URLs to the same servers.
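A hash over the request URL is one simple way to get that tendency. Here is a minimal Python sketch of just the locality part; the backend names are hypothetical, and real LARD also folds server load into the decision.

```python
import hashlib

# A minimal sketch of the locality part of LARD: requests for the same
# URL consistently land on the same backend, so that server's in-memory
# file cache stays hot for those objects. Backend names are hypothetical;
# real LARD also considers how loaded each backend is.
backends = ["app1", "app2", "app3"]

def lard_pick(url_path):
    digest = hashlib.md5(url_path.encode("utf-8")).hexdigest()
    return backends[int(digest, 16) % len(backends)]

# The same URL always maps to the same backend, which will likely already
# have the file cached in physical memory.
print(lard_pick("/images/logo.png"))
print(lard_pick("/images/logo.png"))  # same backend again
```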
Web server scalability and HTTP multiplexing
Web servers that have been designed so that each concurrent connection requires its own process are not scalable for real-world applications; at least, not without help.
In such a system, the overhead that it takes for a web server's underlying kernel to switch between each process (to see if they have any work for the CPU to do) becomes a bottleneck. This is why (for example) the Apache web server places limits on the number of concurrent processes that it will spawn (both soft limits and hard limits).
This problem only tends to manifest in real-world conditions in which client connections:
- are many,
- are somewhat lossy, and
- exhibit varying degrees of latency.
You may find yourself bitterly disappointed when production delivers a tenth (or less) of the requests per second that you measured during benchmarks in your lab environment.
To rise to the scalability challenge, you can employ a technique called HTTP multiplexing. An ADC that is designed so that many connections are handled by a single process does not have the same scalability issues as the Apache server. By terminating and buffering a large number of concurrent connections on the client side, the ADC can aggregate HTTP requests through a smaller number of persistent connections that it keeps open to the web server. Because the ADC buffers the client-side connections before it forwards them to the application server, the request on the server side completes in a very short period of time, assuming that the network connecting the ADC to the application servers is decent.
When using this technique, you shouldn't be surprised to realize real-world performance gains of 2000 percent on busy sites, even with just a single application server.
Note that this technique requires the ADC to operate in layer 7 mode because it must be able to determine whether the clients and the servers both support HTTP keep-alive connections (which requires parsing the headers). A layer 4 load balancer does not usually benefit from this method; indeed, it may actually reduce performance!
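To illustrate the server-side half of the technique, here is a minimal Python sketch that funnels several requests over one persistent keep-alive connection, which is what the ADC does at scale with a small pool of such connections. The backend host name and port are hypothetical.

```python
import http.client

# A minimal sketch of the server side of HTTP multiplexing: many client
# requests are funneled over one persistent keep-alive connection to the
# application server. "app-server.internal" is a hypothetical backend host.
backend = http.client.HTTPConnection("app-server.internal", 8080)

def forward(path):
    # Each request reuses the same TCP connection; an ADC keeps a small
    # pool of these instead of one server-side connection per client.
    backend.request("GET", path, headers={"Connection": "keep-alive"})
    response = backend.getresponse()
    return response.status, response.read()  # read fully before reusing

# Three client requests, one server-side connection.
for path in ("/", "/styles.css", "/logo.png"):
    status, body = forward(path)
    print(path, status, len(body))
```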
IP transparency
Many operators find themselves wanting (for one reason or another) to record the origin IP address of their users in their web servers' access logs. To do this, they instinctively configure the ADC to operate in transparent mode.
In transparent mode, an ADC uses the IP address of each client as the source IP address in each packet that it sends to the application server. This process is sometimes referred to as spoofing.
While this is a perfectly logical course of action, turning on IP transparency can carry a performance penalty. With transparency enabled, you now have a situation where there is a 1:1 ratio of client-to-ADC and ADC-to-server connections, instead of an n:1 ratio. This makes HTTP multiplexing completely ineffective.
Before you insist on using transparency, seriously consider other ways to get that client IP information. You can, for example, configure the ADC to insert the client address into each request, in a cookie or in a header such as X-Forwarded-For. Or you might be able to perform your access logging on the ADC itself.
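As an illustration of the header approach, here is a minimal WSGI sketch of an application server recovering the client IP from an X-Forwarded-For header. X-Forwarded-For is the de facto standard name, but confirm what your ADC actually inserts.

```python
# A minimal sketch of recovering the client IP on the application server
# when the ADC inserts an X-Forwarded-For header instead of spoofing the
# source address. Check which header your ADC is configured to insert.
def app(environ, start_response):
    # Fall back to the ADC's own address if the header is absent.
    client_ip = environ.get("HTTP_X_FORWARDED_FOR",
                            environ.get("REMOTE_ADDR", "unknown"))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"logged client ip: {client_ip}".encode("utf-8")]

if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    make_server("", 8000, app).serve_forever()
```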
HTTP caching
One of the most effective and frequently used methods of speeding up a web application is to drop a caching reverse proxy in front of it. The most important part of performing HTTP caching is determining which objects can be stored in the cache, when they expire, and when they must be revalidated. A caching device needs to parse all of the request and response headers in order to answer those questions.
An ADC is ideally poised to perform content caching because it is usually parsing the headers anyway — assuming that it is configured to operate in layer 7 mode rather than layer 4 mode.
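Here is a minimal Python sketch of the kind of cacheability decision that this header parsing enables, loosely following the HTTP/1.1 rules. A real cache honors many more directives than this.

```python
# A minimal sketch of the cacheability decision an ADC can make while it
# is already parsing headers in layer 7 mode, loosely following HTTP/1.1
# (RFC 2616). A real cache honors many more directives and edge cases.
def is_cacheable(method, status, headers):
    if method != "GET" or status != 200:
        return False
    cache_control = headers.get("Cache-Control", "").lower()
    if "no-store" in cache_control or "private" in cache_control:
        return False
    # Something must bound the object's freshness lifetime.
    return "max-age" in cache_control or "Expires" in headers

print(is_cacheable("GET", 200, {"Cache-Control": "public, max-age=3600"}))  # True
print(is_cacheable("GET", 200, {"Cache-Control": "no-store"}))              # False
```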
The high availability issue
Different cloud providers have different solutions for high availability. Some opt to migrate a running instance from one physical resource to another when something goes wrong; others simply move the IP addresses from one instance to another. Moving just the IP address is always going to be the faster option.
ADCs have some method of IP failover built into them. It usually involves an Address Resolution Protocol (ARP) broadcast that informs the network infrastructure that an IP address has moved. You may also have the option of distributing a single IP address across several ADC instances using multicast media access control (MAC) addressing.
IP addresses that can move around or otherwise be shared by members of a cluster are often referred to as virtual IP addresses. For an Internet service to be considered HA, it must be configured to listen on at least one of these virtual IP addresses.
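The failover announcement itself is small. Here is a minimal sketch, using the scapy packet library, of the gratuitous ARP broadcast that claims a virtual IP for a new MAC address. The addresses and interface name are hypothetical, and sending it requires root privileges.

```python
from scapy.all import ARP, Ether, sendp  # requires scapy and root privileges

# A minimal sketch of the gratuitous ARP an ADC broadcasts during IP
# failover, telling the LAN that the virtual IP now lives at the surviving
# machine's MAC address. Addresses and interface are hypothetical.
VIP = "192.0.2.10"
NEW_MAC = "00:11:22:33:44:55"

packet = Ether(dst="ff:ff:ff:ff:ff:ff", src=NEW_MAC) / ARP(
    op=2,           # ARP reply ("is-at")
    psrc=VIP,       # the virtual IP being claimed
    hwsrc=NEW_MAC,  # the MAC address that now owns it
    pdst=VIP,       # gratuitous: target IP equals sender IP
)
sendp(packet, iface="eth0")
```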
For more detail on high availability, I'm going to cover:
- ADC fault tolerance
- Application health monitoring
Good practice dictates that you should never introduce a single point of failure into a critical infrastructure. All ADCs have some kind of clustering mechanism. In a clustered deployment, a group of ADCs shares configuration and can share some information about the state of the application sessions that they are handling. Depending on the product you're using, the configuration management can be centralized or distributed. Some ADC configuration systems follow a master-slave model while others follow a multi-master model.
ADC fault tolerance
Clustered ADCs use some kind of heartbeat mechanism to detect failures and a failover mechanism to achieve HA. In the event that one of the clustered ADCs fails, the traffic workload that it was handling should get divided up among the remaining healthy members of the cluster.
It might also be desirable for services to automatically fail back to the failed machine (after it recovers, of course).
Application health monitoring
There are certain things that you would expect to happen should one of your web servers fail; at the very least, you want to avoid attempting to send it any new traffic.
Incorrect expectations can lead to problems; to avoid that, here are some things to consider when it comes to fault tolerance:
- Are you monitoring the actual traffic that is passing through the delivery system?
- Are you performing out-of-band tests against the application?
- How often are you checking?
- How thoroughly and to what extent are you testing the application?
- How many consecutive failures should result in a failure event?
- How are faults detected?
- What automatically happens when a fault is detected (if anything)?
- How are notifications sent?
These considerations should be reviewed in detail because they affect the time that it takes to detect a failure. Failures can never be detected instantly. You need to find a compromise between failover speed and over-sensitivity. If your health monitoring is too sensitive, the smallest blip in the network might set off alarms.
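To make that sensitivity trade-off concrete, here is a minimal Python sketch of an out-of-band health monitor with a consecutive-failure threshold. The URL, interval, and threshold are hypothetical knobs you would tune.

```python
import time
import urllib.request

# A minimal sketch of out-of-band application health monitoring with a
# consecutive-failure threshold. The URL, interval, and threshold are
# hypothetical; tuning them trades failover speed against over-sensitivity.
CHECK_URL = "http://app-server.internal/health"
INTERVAL_SECONDS = 5     # how often to check
FAILURE_THRESHOLD = 3    # consecutive failures before declaring a fault

failures = 0
while True:
    try:
        with urllib.request.urlopen(CHECK_URL, timeout=2) as response:
            healthy = response.status == 200
    except OSError:      # connection refused, timeout, HTTP error, etc.
        healthy = False

    failures = 0 if healthy else failures + 1
    if failures >= FAILURE_THRESHOLD:
        print("fault detected: stop sending new traffic to this server")
        # a real ADC would remove the node from the pool and notify here
        failures = 0
    time.sleep(INTERVAL_SECONDS)
```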
The next section uses a real-world system I'm involved with to showcase some of the previous concepts.
Example of a real-world traffic manager
Finally, to illustrate the concepts in this article, I'm going to showcase the following traffic-management techniques with Zeus Technology's Traffic Manager:
- High availability through clustering.
- Traffic IP address groups, which allow traffic managers to control IP addresses in a single-hosted or multi-hosted mode.
- How easy it is to deploy new services.
- How easy it is to change a load balancing algorithm.
- How easy it is to use content caching.
High availability through clustering
Zeus' management console follows a distributed model in which each traffic manager runs an admin process. This admin process includes a web-based user interface. Any number of operators can be logged in to different members of the cluster at a time and can be making configuration changes. All configuration changes are pushed out to the other cluster members when they are submitted.
Everything will run smoothly as long as none of those changes conflict with each other. If they do, the operators are prompted to resolve the conflict.
Assuming that you have a pair of machines ready to cluster, you can run the cluster-joining wizard:
- Select Join a Cluster from the wizard's drop-down box in the top-right corner of the Zeus user interface (UI). Click Next as needed.
- The traffic manager sends out a cluster detection message to the broadcast domains for each interface on the machine. Any other traffic managers within those broadcast domains send a response. All responding machines appear in the list of joinable clusters.
- Select the established cluster host from the list of clusters and click Next.
- Enter the credentials for an administrative user on the established cluster. At this point, you also have the opportunity to tell the joining cluster member that it should either start handling traffic right away or wait for you to configure it to do so later. Unless you have configured at least one traffic IP address group, changing this option has no effect.
- A short summary of what you're about to do is displayed. Click Finish when you're ready to proceed.
Now that you have a working cluster, you can create traffic IP address groups (and have them fail over if one of the machines in the cluster fails). Traffic IP address groups are used to make the services that you're going to configure highly available. It's worth noting that, from a technical point of view, you don't need to create a cluster before creating a traffic IP group or a service; I've just listed the steps in a workflow that makes sense for the purpose of this article.
Traffic IP address groups
At the heart of Zeus Traffic Manager's high availability features is the traffic IP address (which is Zeus' terminology for a VIP, or virtual IP address). Organized into logical units called traffic IP address groups (TIP group), traffic IP addresses are hosted by clusters of Zeus Traffic Managers. There is no upper limit to the number of traffic managers that can be joined in a cluster or to the number of traffic IP addresses that a cluster can host. A single traffic IP address can be hosted by many Traffic Managers in an Active-to-the-Nth-degree configuration.
Zeus Traffic Manager is typically deployed in an active-active pair. As the demand for resources increases on the active traffic managers, new machines can be added to the cluster as needed.
A single traffic manager may host many traffic IP groups that can be controlled individually. Assuming that there is at least one functioning traffic manager in a cluster, all traffic IP addresses remain reachable. Traffic IP Groups can be operated in either a single-hosted or multi-hosted mode.
In single-hosted mode, each traffic IP address is hosted on one of the traffic managers in the cluster. If there are multiple IP addresses in the group, they are hosted on different traffic managers and distributed as evenly as possible.
For example, if you have a cluster of two Zeus machines and a TIP group with three IP addresses in it, one machine would host two addresses and the other would host one.
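That distribution is simple round-robin assignment. Here is a minimal Python sketch of it, with hypothetical machine names and documentation-range addresses; Zeus performs this internally.

```python
# A minimal sketch of how a single-hosted traffic IP group might spread
# its addresses across cluster members as evenly as possible. Machine
# names and addresses are hypothetical; Zeus does this internally.
machines = ["ztm1", "ztm2"]
traffic_ips = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

assignment = {m: [] for m in machines}
for i, ip in enumerate(traffic_ips):
    assignment[machines[i % len(machines)]].append(ip)

print(assignment)  # ztm1 hosts two addresses, ztm2 hosts one
```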
In multi-hosted mode, each traffic IP address can be hosted on all of the traffic managers in the cluster simultaneously. To deploy traffic IP addresses in multi-hosted mode, you need to be running either the ZTM or ZLB virtual appliance, or you need to have installed the Zeus Kernel Modules for Linux®, which can be downloaded from the Zeus Community site. If the correct module is running, you will see the option to raise each address on every machine in the group (Figure 1).
Figure 1. Multi-hosted mode lets you raise each address on every machine in the group
At this point, you should have a cluster of traffic managers and at least one traffic IP group. If you stick a packet sniffer on the network, you will notice that each traffic manager has started to announce its health using IGMP/multicast heartbeat messages. By default, these are sent every 500 ms. If one traffic manager's heartbeats are not seen by the others for 5 seconds, the rest of the cluster assumes that it has failed and initiates a failover. Now that you've got a traffic IP group, any services that are running on the cluster can be made highly available.
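Here is a minimal Python sketch of that failure-detection logic, using the 500 ms interval and 5-second timeout just described. The peer names are hypothetical, and a real implementation would record arrivals from the multicast socket.

```python
import time

# A minimal sketch of heartbeat-based failure detection: peers announce
# themselves every 500 ms, and a peer unseen for 5 seconds is presumed
# failed, matching the defaults described above. Peer names are hypothetical.
HEARTBEAT_INTERVAL = 0.5   # seconds between announcements
FAILURE_TIMEOUT = 5.0      # seconds of silence before declaring failure

last_seen = {"ztm1": time.monotonic(), "ztm2": time.monotonic()}

def record_heartbeat(peer):
    # Called whenever a multicast heartbeat arrives from a peer.
    last_seen[peer] = time.monotonic()

def failed_peers():
    now = time.monotonic()
    return [p for p, t in last_seen.items() if now - t > FAILURE_TIMEOUT]

record_heartbeat("ztm1")
print(failed_peers())  # after 5 s of silence, "ztm2" would appear here
```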
Easy to deploy new services
Deploying new services with Zeus Traffic Manager is remarkably civilized. As with clustering, there is a wizard that guides you through the creation of a new service.
Suppose you want to create a new service called "Web" that load balances HTTP to a number of web servers:
- Select Wizards > Manage a new service from the drop-down list in the top right corner of the UI.
- Create a new service called Web.
- Set the internal protocol to HTTP.
- Set it to listen on port 80.
- Click Next.
- One at a time, enter the host names (or IP addresses) of the application servers, then click Add Node. Note that the default port field is the same as the one specified in Step 2 of the wizard.
- Click the Next button when you're done adding the application servers.
- Check the summary page over before completing the wizard by clicking Finish.
Once you complete the wizard, you should see a virtual server appear in the Services section of the home page (Figure 2):
Figure 2. There's your virtual server
By default, services that you create using the Manage a new service wizard listen on every IP address available on each cluster member, including traffic IP groups, meaning that they are basically highly available out of the box. This is fine if you're only running one service, but chances are that if you're using a traffic manager, you've got more than one. To configure a service to listen on specific traffic IP address groups:
- Navigate to Services > Virtual Servers > Web.
- Click the radio button next to Listen on specific Traffic IP address groups.
- Check the box corresponding to each Traffic IP group that you want the service to listen on.
- Click Update.
Easy to change a pool's load balancing algorithm
It's also pretty easy to change a pool's load balancing algorithm:
- In the Zeus UI, click Services.
- Click the Pools tab.
- Click the Edit link for the pool you want to edit.
- Click the Load Balancing Edit link.
- Select the desired algorithm from the load_balancing!algorithm list.
- Optionally, if you're using one of the weighted algorithms, fill in the weighting fields.
- Click Update.
Figure 3. Round-robin's at the top!
Zeus Traffic Manager's HTTP multiplexer is an integral part of its load-balancing engine (for HTTP services, anyway) and it can't be turned off (there just isn't any use-case for turning it off). If you want to benefit from LARD and you're managing a web application, try either the fastest response time or the perceptive algorithm.
Easy to use content caching
To turn on the traffic manager's cache feature, take the following steps:
- Click Services.
- Click the Virtual Servers tab.
- Click the Edit link for the virtual server of your choice.
- Click the Content Caching link.
- Set the webcache!enabled option to Yes.
- Scroll down to the bottom of the page and click Update.
With the content caching feature turned on, the traffic manager now acts like a caching proxy. It intelligently adheres to all policies relating to caching proxies as per RFC 2616. You should observe a dramatic drop in load on your web servers and a reduction in load and response times in your web browser. Best of all, there is no need to add extra complexity to the infrastructure by installing separate caching devices!
The features and methods discussed in this article really only scratch the surface of what you can accomplish with an ADC. The clever ways that certain ADCs are designed to handle connections (beyond simply performing load balancing) are fundamental to application acceleration, so it's important that you have a reasonable level of understanding of how they work under the hood. Content caching is a relatively cheap and simple way to make your website load times a lot more snappy. I haven't covered any sort of traffic valuation or prioritization in this article, so that would be a good area to spend some time researching. Investigate topics such as:
- Network-side scripting
- Bandwidth management
- Request rate shaping
Zeus Technology has a great community site, on which they freely publish all of their official product documentation. If you're interested to know more about application delivery, check out the Zeus community.