memcached is a general-purpose distributed memory caching system that was developed by Danga Interactive and is distributed under the BSD license.
Danga Interactive developed memcached because it badly needed a memory caching system capable of handling the high-volume traffic on its Web site, LiveJournal.com. More than 20 million page views per day were placing tremendous strain on LiveJournal's databases, so Danga's Brad Fitzpatrick came up with memcached. memcached not only reduced the site's database load; it has since become the caching solution used by many of the most heavily trafficked Web sites around the world.
In this article, I first provide an overview of memcached, then walk you through the process of installing memcached and building it from source in your development environment. I also introduce the memcached client commands (nine of them in all) and show you how to use them for standard and advanced memcached operations. I conclude with a couple of tricks for using the memcached commands to measure the performance and effectiveness of your cache.
Where does memcached fit into your environment?
Before I get you started with installing and using memcached, let's talk a little about where memcached fits into your environment. While you can use memcached in any number of places, I have found it most useful when I have several long-running queries in my database layer. I will often set up a collection of memcached instances between my database and my application servers and follow a simple pattern of reading and writing to each of these servers. The diagram in Figure 1 should give you an idea of how I set up my application architectures:
Figure 1. Sample application architecture with memcached
The architecture is pretty easy to understand. I have a Web tier, which includes my Apache instances. My next layer is the application itself. This will most often be running on Apache Tomcat or some other open-source application server. The next layer is where I configure my memcached instances — between the application servers and the database servers. When I am using this type of configuration, I have to perform my database reads and writes a little differently.
The sequence I follow when performing reads is to take a request (that requires a database query) from the Web tier and check the cache for previously stored results of that query. If I find the value I am looking for, I return it. If I do not, I perform the query and store the results in the cache prior to returning the results to the Web tier.
When performing writes to the database, I have to first perform the database write and then invalidate any previously cached results that would be affected by this write. This procedure helps to prevent data inconsistencies between the cache and database.
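The read and write sequence described above is commonly called the cache-aside pattern. The following is a minimal sketch in Python, not the author's code. It assumes a plain dict stands in for the memcached client and that two hypothetical callables, query_db and write_db, stand in for the database layer:

```python
from typing import Callable, Dict, List


class CacheAsideRepository:
    """Cache-aside reads and invalidate-on-write.

    `cache` is a plain dict standing in for a memcached client (an assumption
    for illustration); a real client would make network calls instead.
    """

    def __init__(self, cache: Dict[str, str],
                 query_db: Callable[[str], str],
                 write_db: Callable[[str, str], None]) -> None:
        self.cache = cache
        self.query_db = query_db
        self.write_db = write_db

    def read(self, key: str) -> str:
        cached = self.cache.get(key)
        if cached is not None:
            return cached                 # cache hit: skip the database
        value = self.query_db(key)        # cache miss: run the query
        self.cache[key] = value           # store the result before returning it
        return value

    def write(self, key: str, value: str) -> None:
        self.write_db(key, value)         # 1. write to the database first
        self.cache.pop(key, None)         # 2. then invalidate the stale entry


# Demo: an in-memory "database" plus a log of the queries that actually ran.
database = {"user:1": "alice"}
queries: List[str] = []


def query_db(key: str) -> str:
    queries.append(key)
    return database[key]


def write_db(key: str, value: str) -> None:
    database[key] = value


repo = CacheAsideRepository({}, query_db, write_db)
first = repo.read("user:1")    # miss: queries the database
second = repo.read("user:1")   # hit: served from the cache
repo.write("user:1", "bob")    # database updated, cache entry dropped
third = repo.read("user:1")    # miss again: picks up the new value
```

Deleting the cached entry rather than overwriting it keeps the write path simple: the next read repopulates the cache from the database, which is the source of truth.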
memcached supports several operating systems, including Linux®, Windows®, Mac OS, and Solaris. In this article, I'll walk through the steps required to build and install memcached from source. I like building from source primarily because it enables me to look at the source if I have questions.
The only prerequisite to installing memcached is libevent, the asynchronous event-notification library that memcached depends upon. You will find libevent's source at monkey.org. Go ahead and pick the latest source distribution. For this article, I am using the stable version 1.4.11. Once you have the archive file, extract it to a convenient location and execute the commands in Listing 1:
Listing 1. Building and installing libevent
cd libevent-1.4.11-stable/
./configure
make
make install
Find the memcached source at Danga Interactive and, once again, choose the latest distribution. At the time of writing, the latest version is 1.4.0. Extract the tar.gz to a convenient location and execute the commands in Listing 2:
Listing 2. Building and installing memcached
cd memcached-1.4.0/
./configure
make
make install
With those steps complete, you should have a working copy of memcached installed and ready for use. Let's crank it up and start playing around.
To begin using memcached, you need to first start the memcached server and then connect to it using a telnet client.
To start memcached, execute the command in Listing 3:
Listing 3. Starting memcached
./memcached -d -m 2048 -l 10.0.0.40 -p 11211
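This starts memcached as a daemon (-d) with 2GB of memory (-m 2048), listening on IP address 10.0.0.40 (-l 10.0.0.40) and port 11211 (-p 11211). Because -l restricts memcached to the address you give it, use -l 127.0.0.1 instead if you want to connect through localhost as shown in Listing 4. These values will change according to your needs, but they'll serve well for this exercise. Your next step is to connect to memcached. You will connect to the memcached server using a simple telnet client.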
Most operating systems have a telnet client built in, but if you are using a Windows-based OS, you may need to download a third-party client. I recommend using PuTTY.
Once you have a telnet client installed, execute the command in Listing 4:
Listing 4. Connecting to memcached
telnet localhost 11211
If everything went okay, you should get a telnet response indicating that you are Connected to localhost. If you don't get this response, go back through the previous steps and make sure the source for both libevent and memcached was successfully built.
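If you would rather script a session than type into telnet, the exchange is just lines of text terminated by \r\n. Here is a minimal sketch using Python's standard socket module; send_command is a hypothetical helper, and it reads only the first reply line, so multi-line replies such as get or stats would need more reading. The demo uses a socketpair as a stand-in server so it runs without memcached:

```python
import socket
from typing import Optional


def send_command(sock: socket.socket, command: str,
                 data: Optional[bytes] = None) -> str:
    """Send one text-protocol command and return the first reply line."""
    payload = command.encode("ascii") + b"\r\n"
    if data is not None:
        payload += data + b"\r\n"       # storage commands carry a data line
    sock.sendall(payload)
    reply = bytearray()
    while not reply.endswith(b"\r\n"):  # one byte at a time: simple, not fast
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("server closed the connection")
        reply += chunk
    return reply[:-2].decode("ascii")


# Offline demo: one end of a socketpair plays the server.
client, fake_server = socket.socketpair()
fake_server.sendall(b"STORED\r\n")      # canned reply, buffered until read
reply = send_command(client, "set userId 0 0 5", b"12345")
received = fake_server.recv(1024)
client.close()
fake_server.close()
```

Against a real server you would open the connection with socket.create_connection(("localhost", 11211)) instead of the socketpair.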
You are now logged into the memcached server. From this point, you will be able to communicate with memcached through a series of simple commands. The nine memcached client-side commands can be grouped into three classifications: basic, advanced, and cache management.
Basic memcached client commands
You will use the five basic memcached commands for the simplest of operations. These commands are set, add, replace, get, and delete.
The first three commands are your standard modifying commands used to manipulate the key/value pairs stored in memcached. They are all pretty straightforward and share the syntax in Listing 5:
Listing 5. Modifying command syntax
command <key> <flags> <expiration time> <bytes>
<value>
Table 1 defines the parameters for and usage of the memcached modifying commands.
Table 1. memcached modifying command parameters
| Parameter | Usage |
|---|---|
| key | The key used to look up the cached value |
| flags | An integer parameter stored with the key/value pair; the client can use it to keep additional information about the pair, and memcached returns it unchanged |
| expiration time | The length of time in seconds that the key/value pair should be kept in the cache (0 means the pair never expires, though it can still be evicted when memory runs out) |
| bytes | The number of bytes in the value being stored |
| value | The value being stored (always on the second line) |
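To make the two-line format concrete, here is a small Python sketch that builds a request in the shape of Listing 5. storage_command is a hypothetical helper, not part of any client library. It encodes two protocol rules: keys may contain no whitespace or control characters and are limited to 250 bytes, and <bytes> counts only the data, not the trailing \r\n:

```python
def storage_command(command: str, key: str, value: bytes,
                    flags: int = 0, exptime: int = 0) -> bytes:
    """Build a set/add/replace request in the two-line format of Listing 5."""
    if command not in ("set", "add", "replace"):
        raise ValueError(f"not a basic storage command: {command!r}")
    # Keys may not contain whitespace or control characters (max 250 bytes).
    if not key or len(key.encode("utf-8")) > 250 or any(
            c.isspace() or ord(c) < 32 for c in key):
        raise ValueError(f"invalid key: {key!r}")
    # <bytes> is the length of the data block only, excluding its \r\n.
    header = f"{command} {key} {flags} {exptime} {len(value)}"
    return header.encode("utf-8") + b"\r\n" + value + b"\r\n"


request = storage_command("set", "userId", b"12345")
print(request)  # b'set userId 0 0 5\r\n12345\r\n'
```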
Now let's see these commands in action.
The set command adds a new key/value pair to the cache. If the key already exists, the previous value is replaced.
Note the following interaction, using the set command:
set userId 0 0 5
12345
STORED
If the key/value pair was set correctly, the server responds with the word STORED. This example added a key/value pair to the cache with a key of userId and a value of 12345; the 5 in the command line is the length of that value in bytes. The expiration time was set to 0, which tells memcached that you want this value to stay in the cache until you remove it (memcached can still evict it if it runs out of memory).
The add command adds a new key/value pair to the cache only if the key does not already exist in the cache. If the key already exists, the previous value remains unchanged and you get the response NOT_STORED.
Here is a standard interaction using the add command:
set userId 0 0 5
12345
STORED
add userId 0 0 5
55555
NOT_STORED
add companyId 0 0 3
564
STORED
The replace command replaces a key/value pair in the cache only if that key already exists. If the key is not already in the cache, you receive a NOT_STORED response from the memcached server.
Here's a standard interaction using the replace command:
replace accountId 0 0 5
67890
NOT_STORED
set accountId 0 0 5
67890
STORED
replace accountId 0 0 5
55555
STORED
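The three storage rules can be summarized with a toy, single-process model. StorageModel below is hypothetical and not a memcached client; it ignores flags, expiration, and eviction, and only reproduces the STORED and NOT_STORED responses shown in the interactions above:

```python
from typing import Dict


class StorageModel:
    """Toy model of how memcached answers set, add, and replace."""

    def __init__(self) -> None:
        self.items: Dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> str:
        self.items[key] = value           # always stores, replacing any old value
        return "STORED"

    def add(self, key: str, value: bytes) -> str:
        if key in self.items:
            return "NOT_STORED"           # existing value is left untouched
        self.items[key] = value
        return "STORED"

    def replace(self, key: str, value: bytes) -> str:
        if key not in self.items:
            return "NOT_STORED"           # nothing to replace
        self.items[key] = value
        return "STORED"


# Replay the set/add/replace interactions from the text.
m = StorageModel()
responses = [
    m.set("userId", b"12345"),
    m.add("userId", b"55555"),
    m.add("companyId", b"564"),
    m.replace("accountId", b"67890"),
    m.set("accountId", b"67890"),
    m.replace("accountId", b"55555"),
]
```

In short, set is unconditional, add succeeds only for a missing key, and replace succeeds only for an existing one.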
The last of the basic commands are get and delete. These commands are pretty obvious and share a similar syntax, shown here:

command <key>
Let's see these commands at work.
The get command is used to retrieve the value associated with a previously added key/value pair. You will use get for most data retrieval operations.
Here is a typical interaction using the get command:
set userId 0 0 5
12345
STORED
get userId
VALUE userId 0 5
12345
END
get bob
END
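As you can see, the get command is pretty simple. You invoke get with a key, and if that key exists in the cache, the server returns a VALUE line (the key, its flags, and the length of the data in bytes), the value itself, and END. If the key does not exist, the server returns only END.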
The last basic command is delete, which is used to remove an existing value from memcached. You invoke delete with a key, and if that key exists in the cache, the value is deleted and the server responds with DELETED. If it does not, the server responds with NOT_FOUND.
Here's a client-server interaction using the delete command:
set userId 0 0 5
98765
STORED
delete bob
NOT_FOUND
delete userId
DELETED
get userId
END
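A client has to turn the get reply back into data. Here is a minimal Python sketch of that parsing; parse_retrieval is a hypothetical helper. It reads exactly <bytes> bytes of data after each VALUE line rather than splitting on line breaks, because a stored value may itself contain \r\n. A miss produces no VALUE block, so a bare END parses to an empty result:

```python
from typing import Dict, Tuple


def parse_retrieval(response: bytes) -> Dict[str, Tuple[int, bytes]]:
    """Parse a get reply: zero or more VALUE blocks followed by END."""
    items: Dict[str, Tuple[int, bytes]] = {}
    pos = 0
    while True:
        eol = response.index(b"\r\n", pos)
        line = response[pos:eol].decode("ascii")
        pos = eol + 2
        if line == "END":
            return items
        parts = line.split()
        if parts[0] != "VALUE":
            raise ValueError(f"unexpected line: {line!r}")
        key, flags, length = parts[1], int(parts[2]), int(parts[3])
        data = response[pos:pos + length]   # exactly <bytes> bytes of data
        pos += length + 2                   # skip the data and its trailing \r\n
        items[key] = (flags, data)


hit = parse_retrieval(b"VALUE userId 0 5\r\n12345\r\nEND\r\n")
miss = parse_retrieval(b"END\r\n")
```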
Advanced memcached client commands
The two advanced commands you can use with memcached are gets and cas. The gets and cas commands are intended to be used together: they ensure that you do not overwrite a name/value pair with a new value if it has already been updated since you read it. Let's take a look at each of these commands.
The gets command functions much like the basic get command. The difference between the two commands is that gets returns an extra piece of information: a 64-bit integer that acts much like a "version" identifier of a name/value pair.
Here's a client-server interaction using the gets command:
set userId 0 0 5
12345
STORED
get userId
VALUE userId 0 5
12345
END
gets userId
VALUE userId 0 5 4
12345
END
Consider the differences between the get and gets commands. The gets command returned an extra value, in this case the integer 4, which identifies the current version of the name/value pair. If you perform another set on this name/value pair, the extra value returned by gets will change, signifying that the name/value pair has been updated. Listing 6 shows an example:
Listing 6. set updating the version specifier
set userId 0 0 5
33333
STORED
gets userId
VALUE userId 0 5 5
33333
END
See the trailing value returned by gets? It has been updated to 5. This value changes each time the name/value pair is modified.
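The only structural difference in a gets reply is a fifth field on the VALUE line. Here is a minimal Python sketch that extracts it from a single-key reply; parse_gets and GetsItem are hypothetical names, and a miss (a bare END) returns None:

```python
from typing import NamedTuple, Optional


class GetsItem(NamedTuple):
    key: str
    flags: int
    data: bytes
    cas_unique: int    # the "version" value that gets adds to the VALUE line


def parse_gets(response: bytes) -> Optional[GetsItem]:
    """Parse 'VALUE <key> <flags> <bytes> <cas unique>\\r\\n<data>\\r\\nEND\\r\\n'."""
    header, rest = response.split(b"\r\n", 1)
    if header == b"END":
        return None                      # key not found
    fields = header.decode("ascii").split()
    if fields[0] != "VALUE" or len(fields) != 5:
        raise ValueError(f"not a gets VALUE line: {header!r}")
    length = int(fields[3])
    if rest[length:] != b"\r\nEND\r\n":
        raise ValueError("expected exactly one item followed by END")
    return GetsItem(fields[1], int(fields[2]), rest[:length], int(fields[4]))


before = parse_gets(b"VALUE userId 0 5 4\r\n12345\r\nEND\r\n")
after = parse_gets(b"VALUE userId 0 5 5\r\n33333\r\nEND\r\n")
```

Comparing before.cas_unique (4) with after.cas_unique (5) is exactly the change Listing 6 illustrates.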
cas (check and set) is a handy memcached command that sets the value of a name/value pair only if that name/value pair has not been updated since the last time you performed a gets. It uses syntax similar to the set command, but includes one additional value: the extra value returned by gets.

cas <key> <flags> <expiration time> <bytes> <cas unique>
<value>

Note the following interaction using the cas command:
set userId 0 0 5
55555
STORED
gets userId
VALUE userId 0 5 6
55555
END
cas userId 0 0 5 6
33333
STORED
As you can see, I passed the integer 6 returned by gets to the cas command, and the operation worked perfectly. Now take a look at the series of commands in Listing 7:
Listing 7. The cas command with an old version specifier
set userId 0 0 5
55555
STORED
gets userId
VALUE userId 0 5 8
55555
END
cas userId 0 0 5 6
33333
EXISTS
Notice that I did not use the most recent integer returned from gets (8), so the cas failed and the server returned EXISTS. In essence, using the gets and cas commands together prevents you from "stepping on" a name/value pair that has been updated since your last read.
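The usual way to put gets and cas to work is an optimistic read-modify-write loop: read the value and its unique, compute the new value, attempt cas, and start over on EXISTS. The sketch below is hypothetical. CasStore is a toy in-memory model rather than a real client, and it stamps every write with the next value of a single counter to play the role of the cas unique:

```python
from typing import Callable, Dict, Optional, Tuple


class CasStore:
    """Toy model of gets/cas semantics (not a memcached client)."""

    def __init__(self) -> None:
        self.items: Dict[str, Tuple[bytes, int]] = {}
        self.counter = 0

    def set(self, key: str, value: bytes) -> str:
        self.counter += 1                 # every write gets a fresh unique
        self.items[key] = (value, self.counter)
        return "STORED"

    def gets(self, key: str) -> Optional[Tuple[bytes, int]]:
        return self.items.get(key)        # (value, cas unique) or None

    def cas(self, key: str, value: bytes, unique: int) -> str:
        if key not in self.items:
            return "NOT_FOUND"            # nothing to check against
        if self.items[key][1] != unique:
            return "EXISTS"               # modified since our gets
        return self.set(key, value)


def update_with_retry(store: CasStore, key: str,
                      transform: Callable[[bytes], bytes],
                      attempts: int = 5) -> bool:
    """Optimistic update: gets, compute, cas; retry when cas says EXISTS."""
    for _ in range(attempts):
        current = store.gets(key)
        if current is None:
            return False
        value, unique = current
        if store.cas(key, transform(value), unique) == "STORED":
            return True
    return False


# Replay Listing 7: another write sneaks in between gets and cas.
store = CasStore()
store.set("userId", b"55555")
_, stale_unique = store.gets("userId")
store.set("userId", b"55555")
result = store.cas("userId", b"33333", stale_unique)   # "EXISTS"

# The retry loop re-reads the current unique and succeeds.
ok = update_with_retry(store, "userId", lambda v: str(int(v) + 1).encode())
```

Capping the number of attempts keeps a heavily contended key from spinning forever; what to do when the loop gives up is an application decision.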
Cache management commands
The final two memcached commands are used to monitor and clean up an instance of memcached. These are the stats and flush_all commands.

The stats command does exactly what it sounds like it should do: it dumps the current statistics of the memcached instance you are connected to. In the following example, stats displays this information about the current memcached instance:
stats
STAT pid 63
STAT uptime 101758
STAT time 1248643186
STAT version 1.4.11
STAT pointer_size 32
STAT rusage_user 1.177192
STAT rusage_system 2.365370
STAT curr_items 2
STAT total_items 8
STAT bytes 119
STAT curr_connections 6
STAT total_connections 7
STAT connection_structures 7
STAT cmd_get 12
STAT cmd_set 12
STAT get_hits 12
STAT get_misses 0
STAT evictions 0
STAT bytes_read 471
STAT bytes_written 535
STAT limit_maxbytes 67108864
STAT threads 4
END
Most of the output here is pretty self-explanatory, and I'll spend more time on the meaning of these values when I discuss cache performance later in the article. For now, take a quick look at the output, then run a few set commands with new keys and run the stats command again, noticing the changes.
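Because every stats line has the same STAT <name> <value> shape, the output is easy to collect for monitoring. Here is a minimal Python sketch; parse_stats is a hypothetical helper, and values are kept as strings because some of them, such as version and rusage_user, are not integers. The sample input is an excerpt of the stats output shown later in Listing 8:

```python
from typing import Dict


def parse_stats(response: str) -> Dict[str, str]:
    """Turn 'STAT <name> <value>' lines, terminated by END, into a dict."""
    stats: Dict[str, str] = {}
    for line in response.splitlines():   # handles the \r\n line endings
        if line == "END":
            break
        prefix, name, value = line.split(" ", 2)
        if prefix != "STAT":
            raise ValueError(f"unexpected line: {line!r}")
        stats[name] = value
    return stats


sample = ("STAT pid 6825\r\nSTAT version 1.2.6\r\nSTAT cmd_get 2688\r\n"
          "STAT get_hits 1908\r\nSTAT get_misses 780\r\nEND\r\n")
stats = parse_stats(sample)
```

Converting a value is left to the caller, for example int(stats["get_hits"]).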
flush_all is the final memcached command to be learned. This simplest of commands invalidates all the name/value pairs in the cache, which can be very helpful if you need to reset the cache to a clean state. Here's an example of using flush_all:
set userId 0 0 5
55555
STORED
get userId
VALUE userId 0 5
55555
END
flush_all
OK
get userId
END
I'll conclude this article with a lesson in using the stats command to find out how your cache is performing. stats is invaluable for tuning your cache usage. Two of the most important statistics to keep an eye on are get_hits and get_misses. These values tell you how many times a requested name/value pair was found (get_hits) versus how many times it was not found (get_misses).
The combination of these values can indicate how effectively you are using your cache. When you first start up the cache, it is natural to see get_misses climb, but after a certain amount of usage the rate at which get_misses grows should level off, indicating that the cache is primed with the most common reads. If get_misses continues to rise quickly while get_hits levels off, take a look at what you are caching. You may be caching the wrong things.
Another method of determining your caching effectiveness is to look at your cache hit ratio. The cache hit ratio tells you what percentage of your get requests were answered from the cache rather than missing. To determine this percentage, go ahead and run the stats command again, as shown in Listing 8:
Listing 8. Calculating the cache hit ratio
stats
STAT pid 6825
STAT uptime 540692
STAT time 1249252262
STAT version 1.2.6
STAT pointer_size 32
STAT rusage_user 0.056003
STAT rusage_system 0.180011
STAT curr_items 595
STAT total_items 961
STAT bytes 4587415
STAT curr_connections 3
STAT total_connections 22
STAT connection_structures 4
STAT cmd_get 2688
STAT cmd_set 961
STAT get_hits 1908
STAT get_misses 780
STAT evictions 0
STAT bytes_read 5770762
STAT bytes_written 7421373
STAT limit_maxbytes 536870912
STAT threads 1
END
Now divide get_hits by cmd_get. In this example, 1908 / 2688 gives a ratio of roughly 71 percent (here cmd_get is simply get_hits plus get_misses: 1908 + 780 = 2688). Ideally, you would like this percentage to be a lot higher; the higher the ratio, the better. Watching your stats and measuring them over time will give you a very good indication of the effectiveness of your caching strategies.
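The arithmetic is simple enough to automate. Here is a minimal Python sketch with two hypothetical helpers: hit_ratio computes the lifetime ratio from a single stats reading, and interval_hit_ratio differences two readings. Because the stats counters accumulate from server startup, the interval version reflects recent behavior, which makes it better suited to the "measure over time" advice above. It assumes the server was not restarted between the two readings:

```python
from typing import Mapping


def hit_ratio(get_hits: int, cmd_get: int) -> float:
    """Fraction of get requests answered from the cache."""
    if cmd_get == 0:
        return 0.0                      # no reads yet: nothing to measure
    return get_hits / cmd_get


def interval_hit_ratio(before: Mapping[str, int],
                       after: Mapping[str, int]) -> float:
    """Hit ratio for only the requests made between two stats readings."""
    return hit_ratio(after["get_hits"] - before["get_hits"],
                     after["cmd_get"] - before["cmd_get"])


# Figures from Listing 8.
ratio = hit_ratio(get_hits=1908, cmd_get=2688)
print(f"{ratio:.1%}")                   # 71.0%
```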
Conclusion to Part 1
Caching is an essential part of any high-volume Web application, and memcached is a great caching option. I have personally had a ton of success using it. If you choose to leverage memcached as your caching solution, I am sure you will see just how effective it is.
In the second part of this article, you will learn how to integrate memcached into a Grails application. This will be an opportunity to explore an exciting, viable stack for scalable Web application development and also do some really cool stuff. Until then, what you've learned in this article is a great starting point for doing more with memcached. I encourage you to install your own instance of memcached and start playing around with it.
- "Distributed Caching with Memcached" (Brad Fitzpatrick, Linux Journal, August 2004): Danga Interactive's Brad Fitzpatrick introduces memcached.
- "Server load-balancing architectures: Transport-level architectures" (Gregor Roth, JavaWorld, October 2008): Introduces a memcached load-balancing solution based on caching HttpResponse messages across multiple machines.
- "Performance tuning considerations in your application server environment" (Sean Walberg, developerWorks, January 2009): An overview of how the various components of a Web application interact, including common performance bottlenecks and solutions such as caching.
- "Is Memcached a Good or Bad Sign for MySQL?" (Gary Orenstein, Gigaom.com, May 2009): An overview of where memcached fits into the application stack and how lightweight caching alternatives are challenging the RDBMS.
- developerWorks Java technology zone: Find hundreds of articles about every aspect of Java programming.
Get products and technologies
- Download memcached: A distributed memory caching system.
- Download libevent: An asynchronous event-notification library.
- Download PuTTY: A Windows telnet client.