memcached is a general-purpose distributed memory caching system that was developed by Danga Interactive and is distributed under the BSD license.
Danga Interactive developed memcached because it desperately needed a memory caching system capable of handling the high-volume traffic on its Web site, LiveJournal.com. More than 20 million page views per day were placing tremendous strain on LiveJournal's databases, so Danga's Brad Fitzpatrick came up with memcached. memcached not only reduced the site's database load; it has since become the caching solution used by many of the most heavily trafficked Web sites around the world.
In this article, I first provide an overview of memcached, then walk you through the process of installing memcached and building it from source in your development environment. I also introduce the memcached client commands (nine of them in all) and show you how to use them for standard and advanced memcached operations. I conclude with a couple of tricks for using the memcached commands to measure the performance and effectiveness of your cache.
Where does memcached fit into your environment?
Before I get you started with installing and using memcached, let's talk a little about where memcached fits into your environment. While you can use memcached in any number of places, I have found it most useful when I have several long-running queries in my database layer. I will often set up a collection of memcached instances between my database and my application servers and follow a simple pattern of reading and writing to each of these servers. The diagram in Figure 1 should give you an idea of how I set up my application architectures:
Figure 1. Sample application architecture with memcached
The architecture is pretty easy to understand. I have a Web tier, which includes my Apache instances. My next layer is the application itself. This will most often be running on Apache Tomcat or some other open-source application server. The next layer is where I configure my memcached instances — between the application servers and the database servers. When I am using this type of configuration, I have to perform my database reads and writes a little differently.
The sequence I follow when performing reads is to take a request (that requires a database query) from the Web tier and check the cache for previously stored results of that query. If I find the value I am looking for, I return it. If I do not, I perform the query and store the results in the cache prior to returning the results to the Web tier.
When performing writes to the database, I have to first perform the database write and then invalidate any previously cached results that would be affected by this write. This procedure helps to prevent data inconsistencies between the cache and database.
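The read and write sequences above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code taken from any particular client library: `InMemoryCache` is a hypothetical dictionary-backed stand-in for a real memcached client (a real client would also accept flags and an expiration time), and `run_query` and `run_write` are hypothetical callbacks standing in for your database access code.

```python
from typing import Callable, Dict, Iterable, Optional


class InMemoryCache:
    """Hypothetical stand-in for a memcached client (get/set/delete only)."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._items.get(key)

    def set(self, key: str, value: str) -> None:
        self._items[key] = value

    def delete(self, key: str) -> None:
        self._items.pop(key, None)


def cached_read(cache: InMemoryCache, key: str,
                run_query: Callable[[], str]) -> str:
    """Read path: return the cached result, or run the query and cache it."""
    value = cache.get(key)
    if value is None:
        value = run_query()    # cache miss: go to the database
        cache.set(key, value)  # store the result before returning it
    return value


def write_then_invalidate(cache: InMemoryCache, affected_keys: Iterable[str],
                          run_write: Callable[[], None]) -> None:
    """Write path: update the database first, then drop stale cache entries."""
    run_write()
    for key in affected_keys:
        cache.delete(key)
```

Invalidating the affected keys, rather than writing new values into the cache, keeps the sketch simple: the next read repopulates each entry from the database.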
memcached supports several operating systems, including Linux®, Windows®, Mac OS, and Solaris. In this article, I'll walk through the steps required to build and install memcached from source. I like building from source primarily because it enables me to look at the source if I have questions.
The only prerequisite to installing memcached is libevent, the asynchronous event-notification library that memcached depends upon. You will find libevent's source at monkey.org. Go ahead and pick the latest source distribution. For this article, I am using the stable version 1.4.11. Once you have the archive file, extract it to a convenient location and execute the commands in Listing 1:
Listing 1. Building and installing libevent
```shell
cd libevent-1.4.11-stable/
./configure
make
make install
```
Find the memcached source at Danga Interactive and, once again, choose the latest distribution. At the time of writing, the latest version is 1.4.0. Extract the tar.gz archive to a convenient location and execute the commands in Listing 2:
Listing 2. Building and installing memcached
```shell
cd memcached-1.4.0/
./configure
make
make install
```
With those steps complete, you should have a working copy of memcached installed and ready for use. Let's crank it up and start playing around.
To begin using memcached, you need to first start the memcached server and then connect to it using a telnet client.
To start memcached, execute the command in Listing 3:
Listing 3. Starting memcached
```shell
./memcached -d -m 2048 -l 10.0.0.40 -p 11211
```
This starts memcached as a daemon (-d) with 2GB of memory for cached items (-m 2048), listening on IP address 10.0.0.40 (-l 10.0.0.40) and port 11211 (-p 11211). Change these values to suit your environment; in particular, to follow along on a single machine with the telnet command below, bind memcached to localhost with -l 127.0.0.1. Your next step is to connect to memcached using a simple telnet client.
Most operating systems include a telnet client, but on Windows the client may not be installed by default, in which case you will need a third-party client. I recommend PuTTY.
Once you have a telnet client installed, execute the command in Listing 4:
Listing 4. Connecting to memcached
```shell
telnet localhost 11211
```
If everything went okay, telnet should report that you are Connected to localhost. If you don't get this response, check that memcached is running and listening on the address and port you are connecting to, and go back through the previous steps to make sure the source for both libevent and memcached was successfully built.
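If you would rather script these exchanges than type them into telnet, the same text protocol can be spoken over a plain TCP socket. The following Python sketch uses only the standard socket module; `send_command` is a hypothetical helper name, and its end-of-reply detection is a simplification: it treats a line that is exactly END as the end of a get, gets, or stats reply, so a stored value that itself ends in such a line could confuse it.

```python
import socket
from typing import Optional

ERROR_PREFIXES = (b"ERROR", b"CLIENT_ERROR", b"SERVER_ERROR")


def send_command(sock: socket.socket, line: str, data: Optional[bytes] = None,
                 multiline: bool = False) -> bytes:
    """Send one memcached text-protocol command and return the raw reply.

    line      -- the command line without its trailing CRLF, e.g. "get userId"
    data      -- optional data block for storage commands, sent on its own line
    multiline -- True for commands whose reply ends with END (get, gets, stats)
    """
    payload = line.encode("ascii") + b"\r\n"
    if data is not None:
        payload += data + b"\r\n"
    sock.sendall(payload)

    reply = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # the server closed the connection
            break
        reply += chunk
        if not reply.endswith(b"\r\n"):
            continue  # wait for the rest of the current line
        if not multiline or reply.startswith(ERROR_PREFIXES):
            break  # single-line reply such as STORED, NOT_FOUND, or an error
        if reply == b"END\r\n" or reply.endswith(b"\r\nEND\r\n"):
            break
    return reply
```

Against a live server you might open the connection with `socket.create_connection(("localhost", 11211))`, substituting the address your memcached instance listens on, then call `send_command(sock, "set userId 0 0 5", b"12345")` followed by `send_command(sock, "get userId", multiline=True)`.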
You are now connected to the memcached server. From this point, you can communicate with memcached through a series of simple commands. The nine memcached client-side commands can be grouped into three categories:
- Basic
- Advanced
- Management
Basic memcached client commands
You will use the five basic memcached commands for the simplest of operations. These commands and operations are:
- set
- add
- replace
- get
- delete
The first three commands are your standard modifying commands used to manipulate the key/value pairs stored in memcached. They are all pretty straightforward and share the syntax in Listing 5:
Listing 5. Modifying command syntax
```
command <key> <flags> <expiration time> <bytes>
<value>
```
Table 1 defines the parameters for and usage of the memcached modifying commands.
Table 1. memcached modifying command parameters
| Parameter | Usage |
|---|---|
| key | The key used to look up the cached value |
| flags | An integer that the client can store along with the key/value pair to hold additional information about it; memcached stores it but does not interpret it |
| expiration time | The length of time in seconds that the key/value pair should be kept in the cache (0 means the item never expires) |
| bytes | The length of the value, in bytes |
| value | The value being stored (always on the second line) |
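To see how the parameters in Table 1 fit together on the wire, here is a small Python sketch that assembles the two lines of a set, add, or replace command. The function name is hypothetical; the 250-byte key limit and the ban on spaces and control characters in keys come from the memcached text protocol, and the byte count is computed from the value so that it cannot drift out of sync.

```python
STORAGE_COMMANDS = ("set", "add", "replace")
MAX_KEY_LENGTH = 250  # key length limit in the memcached text protocol


def build_storage_command(command: str, key: str, flags: int,
                          exptime: int, value: bytes) -> bytes:
    """Return the command line and data block, each terminated by CRLF."""
    if command not in STORAGE_COMMANDS:
        raise ValueError(f"unsupported storage command: {command}")
    key_bytes = key.encode("utf-8")
    # Keys must be non-empty and contain no spaces or control characters.
    if (not key_bytes or len(key_bytes) > MAX_KEY_LENGTH
            or any(b <= 32 or b == 127 for b in key_bytes)):
        raise ValueError(f"invalid memcached key: {key!r}")
    header = f"{command} {key} {flags} {exptime} {len(value)}\r\n"
    return header.encode("utf-8") + value + b"\r\n"
```

For example, `build_storage_command("set", "userId", 0, 0, b"12345")` produces exactly the two lines typed in the set session that follows.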
Now let's see these commands in action.
set
The set command adds a new key/value pair to the cache. If the key already exists, then the previous value will be replaced.
Note the following interaction, using the set command:
```
set userId 0 0 5
12345
STORED
```
If the key/value pair was set correctly, the server responds with the word STORED. This example added a key/value pair to the cache with a key of userId and a value of 12345. The expiration time was set to 0, which tells memcached that the value should never expire; it stays in the cache until you delete it or memcached evicts it to make room for new items.
add
The add command adds a new key/value pair to the cache only if the key does not already exist in the cache. If the key already exists, then the previous value will remain the same and you will get the response NOT_STORED.
Here is a standard interaction using the add command:
```
set userId 0 0 5
12345
STORED
add userId 0 0 5
55555
NOT_STORED
add companyId 0 0 3
564
STORED
```
replace
The replace command replaces a key/value pair in the cache only if that key already exists. If the key is not already in the cache, then you will receive a NOT_STORED response from the memcached server.
Here's a standard interaction using the replace command:
```
replace accountId 0 0 5
67890
NOT_STORED
set accountId 0 0 5
67890
STORED
replace accountId 0 0 5
55555
STORED
```
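The rules for set, add, and replace differ only in whether the key must already exist. The following Python model is a sketch for illustration, not a memcached client: `StorageRules` is a hypothetical class that ignores flags, expiration, and eviction, and simply returns the reply strings shown in the interactions above.

```python
from typing import Dict


class StorageRules:
    """Toy in-memory model of memcached's set/add/replace semantics."""

    def __init__(self) -> None:
        self.items: Dict[str, str] = {}

    def set(self, key: str, value: str) -> str:
        # set always stores, overwriting any previous value.
        self.items[key] = value
        return "STORED"

    def add(self, key: str, value: str) -> str:
        # add stores only if the key is not already present.
        if key in self.items:
            return "NOT_STORED"
        self.items[key] = value
        return "STORED"

    def replace(self, key: str, value: str) -> str:
        # replace stores only if the key is already present.
        if key not in self.items:
            return "NOT_STORED"
        self.items[key] = value
        return "STORED"
```

Replaying the replace session against this model yields the same three replies: NOT_STORED, STORED, STORED.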
The last two basic commands are get and delete. These commands are pretty obvious and share a similar syntax, shown here:
```
command <key>
```
Let's see these commands at work.
get
The get command is used to retrieve the value associated with a previously added key/value pair. You will use get for most data retrieval operations.
Here is a typical interaction using the get command:
```
set userId 0 0 5
12345
STORED
get userId
VALUE userId 0 5
12345
END
get bob
END
```
As you can see, the get command is pretty simple. You invoke get with a key, and if that key exists in the cache, memcached returns the value, preceded by a VALUE line showing the key, flags, and byte count, and followed by END. If the key does not exist, memcached returns only END.
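A client library has to turn that reply back into values. Here is a minimal Python parser for the retrieval reply format shown above: zero or more VALUE lines, each followed by a data block of the stated length, then END. The names `Item` and `parse_retrieval_reply` are hypothetical, and the optional fifth field on the VALUE line anticipates the gets command described later in this article.

```python
from typing import Dict, NamedTuple, Optional


class Item(NamedTuple):
    flags: int
    value: bytes
    cas_unique: Optional[int]  # present only in replies to gets


def parse_retrieval_reply(raw: bytes) -> Dict[str, Item]:
    """Parse a complete get/gets reply into {key: Item}.

    Raises ValueError if the reply is malformed or incomplete.
    """
    items: Dict[str, Item] = {}
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)  # ValueError if the line is incomplete
        line = raw[pos:eol]
        if line == b"END":
            return items
        parts = line.split(b" ")
        if parts[0] != b"VALUE" or len(parts) not in (4, 5):
            raise ValueError(f"unexpected line: {line!r}")
        key = parts[1].decode("utf-8")
        flags, size = int(parts[2]), int(parts[3])
        cas_unique = int(parts[4]) if len(parts) == 5 else None
        start = eol + 2
        end = start + size
        if raw[end:end + 2] != b"\r\n":
            raise ValueError("data block is not terminated by CRLF")
        items[key] = Item(flags, raw[start:end], cas_unique)
        pos = end + 2
```

Because the parser reads the data block by its declared length rather than by searching for a line break, values that contain CRLF sequences are handled correctly.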
delete
The last basic command is delete, which removes an existing value from memcached. You invoke delete with a key; if that key exists in the cache, the value is deleted and memcached responds with DELETED. If it does not, memcached responds with NOT_FOUND.
Here's a client-server interaction using the delete command:
```
set userId 0 0 5
98765
STORED
delete bob
NOT_FOUND
delete userId
DELETED
get userId
END
```
Advanced memcached client commands
The two advanced commands you can use with memcached are gets and cas. They are intended to be used together: they let you set a name/value pair to a new value only if it has not been updated since you last read it. Let's take a look at each of these commands.
gets
The gets command functions much like the basic get command. The difference between the two commands is that gets returns an extra bit of information: a 64-bit integer that acts much like a "version" identifier of a name/value pair.
Here's a client-server interaction using the gets command:
```
set userId 0 0 5
12345
STORED
get userId
VALUE userId 0 5
12345
END
gets userId
VALUE userId 0 5 4
12345
END
```
Consider the differences between the get and gets commands. The gets command returned an extra value, in this case the integer 4, which identifies the current version of the name/value pair. If you perform another set on this name/value pair, the extra value returned by gets will change, signifying that the name/value pair has been updated. Listing 6 shows an example:
Listing 6. set updating the version specifier
```
set userId 0 0 5
33333
STORED
gets userId
VALUE userId 0 5 5
33333
END
```
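See the trailing value returned by gets? It has been updated to 5. This value changes each time the name/value pair is modified. It is not guaranteed to go up by exactly one, though, so treat it as an opaque token rather than a counter you can predict.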
cas
cas (check and set) is a handy memcached command that sets the value of a name/value pair only if that pair has not been updated since the last time you performed a gets. It uses the same syntax as the set command, plus one additional value at the end of the command line: the extra value returned by gets.
Note the following interaction using the cas command:
```
set userId 0 0 5
55555
STORED
gets userId
VALUE userId 0 5 6
55555
END
cas userId 0 0 5 6
33333
STORED
```
As you can see, I passed the integer 6 returned by gets to the cas command, and the operation worked perfectly. Now take a look at the series of commands in Listing 7:
Listing 7. The cas command with an old version specifier
```
set userId 0 0 5
55555
STORED
gets userId
VALUE userId 0 5 8
55555
END
cas userId 0 0 5 6
33333
EXISTS
```
Notice that I did not use the most recent integer returned from gets (8), so the cas failed and memcached returned EXISTS. In essence, using the gets and cas commands together prevents you from "stepping on" a name/value pair that has been updated since your last read.
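The gets/cas pattern is usually wrapped in a retry loop: read with gets, compute a new value, write with cas, and start over if cas answers EXISTS. The Python sketch below models this with a hypothetical in-memory class, `CasModel`, rather than a real server, and `update_with_cas` is likewise a hypothetical helper. Its version tokens come from one counter shared by all keys, which is consistent with the way the numbers in the listings above skip values; the NOT_FOUND reply for a missing key follows the memcached protocol.

```python
import itertools
from typing import Callable, Dict, Optional, Tuple


class CasModel:
    """Toy in-memory model of set/gets/cas; not a memcached client."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[str, int]] = {}
        self._versions = itertools.count(1)  # one counter shared by all keys

    def set(self, key: str, value: str) -> str:
        self._items[key] = (value, next(self._versions))
        return "STORED"

    def gets(self, key: str) -> Optional[Tuple[str, int]]:
        # Returns (value, cas_unique), or None on a miss.
        return self._items.get(key)

    def cas(self, key: str, value: str, cas_unique: int) -> str:
        current = self._items.get(key)
        if current is None:
            return "NOT_FOUND"
        if current[1] != cas_unique:
            return "EXISTS"  # someone else changed the item since our gets
        self._items[key] = (value, next(self._versions))
        return "STORED"


def update_with_cas(store: CasModel, key: str, change: Callable[[str], str],
                    max_attempts: int = 5) -> bool:
    """Apply change() to the current value, retrying when cas reports EXISTS."""
    for _ in range(max_attempts):
        current = store.gets(key)
        if current is None:
            return False  # nothing to update
        value, cas_unique = current
        reply = store.cas(key, change(value), cas_unique)
        if reply == "STORED":
            return True
        if reply != "EXISTS":
            return False
    return False
```

The `max_attempts` cap is an arbitrary choice for the sketch; under heavy contention for a single key you would want to back off or give up rather than spin.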
Management memcached client commands
The final two memcached commands are used to monitor and clean up an instance of memcached. These are the commands stats and flush_all.
stats
The stats command does exactly what it sounds like: it dumps the current statistics of the memcached instance you are connected to. Running stats displays information like the following about the current memcached instance:
```
stats
STAT pid 63
STAT uptime 101758
STAT time 1248643186
STAT version 1.4.11
STAT pointer_size 32
STAT rusage_user 1.177192
STAT rusage_system 2.365370
STAT curr_items 2
STAT total_items 8
STAT bytes 119
STAT curr_connections 6
STAT total_connections 7
STAT connection_structures 7
STAT cmd_get 12
STAT cmd_set 12
STAT get_hits 12
STAT get_misses 0
STAT evictions 0
STAT bytes_read 471
STAT bytes_written 535
STAT limit_maxbytes 67108864
STAT threads 4
END
```
Most of the output here is pretty self-explanatory, and I'll explain the most important values when I discuss cache performance later in the article. For now, take a quick look at the output, run a few set commands with new keys, and then run the stats command again, noticing the changes.
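Because every statistic arrives as a STAT name value line, the output is easy to consume from a script, for example to log it periodically. Here is a minimal Python sketch; the function name `parse_stats` is hypothetical, and values are left as strings because some of them, such as version, are not numbers.

```python
from typing import Dict


def parse_stats(raw: str) -> Dict[str, str]:
    """Turn the text of a stats reply into a {name: value} dictionary."""
    stats: Dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line == "END":
            continue
        parts = line.split(" ", 2)  # "STAT", name, value
        if len(parts) != 3 or parts[0] != "STAT":
            raise ValueError(f"unexpected stats line: {line!r}")
        stats[parts[1]] = parts[2]
    return stats
```

Callers convert the counters they care about, for example `int(stats["curr_items"])`.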
flush_all
flush_all is the last memcached command covered in this article. This simplest of commands invalidates all the name/value pairs in the cache, which can be very helpful if you need to reset the cache to a clean state. Here's an example of using flush_all:
```
set userId 0 0 5
55555
STORED
get userId
VALUE userId 0 5
55555
END
flush_all
OK
get userId
END
```
I'll conclude this article with a lesson in using the management command stats to find out how your cache is performing. The stats command is invaluable for tuning your cache usage. Two of the most important statistics to keep an eye on are get_hits and get_misses. These values tell you how many times a requested name/value pair was found (get_hits) and how many times it was not found (get_misses).
The combination of these values indicates how effectively you are using your cache. When you first start the cache, it is natural to see get_misses rise, but after a certain amount of usage the growth in get_misses should level off, indicating that the cache is primed with the most common reads. If you see get_misses continue to rise quickly while get_hits levels off, take a look at what you are caching; you may be caching the wrong things.
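One way to watch for this trend is to take stats snapshots at intervals and compare the counters, since both are cumulative from the time the server started. The Python sketch below reports how many hits and misses occurred between two snapshots; the names `Interval` and `compare_snapshots` are hypothetical, and the snapshot numbers in the usage note are made up for illustration.

```python
from typing import Dict, NamedTuple


class Interval(NamedTuple):
    hits: int
    misses: int
    hit_ratio: float  # hits / (hits + misses) within the interval


def compare_snapshots(before: Dict[str, int], after: Dict[str, int]) -> Interval:
    """Compute get activity between two stats snapshots of the same server."""
    hits = after["get_hits"] - before["get_hits"]
    misses = after["get_misses"] - before["get_misses"]
    if hits < 0 or misses < 0:
        raise ValueError("counters went backwards; was the server restarted?")
    total = hits + misses
    return Interval(hits, misses, hits / total if total else 0.0)
```

With hypothetical snapshots of 100 hits and 50 misses followed later by 400 hits and 60 misses, the interval shows 300 hits against only 10 misses, the kind of leveling-off in misses you want to see.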
Another way to determine your caching effectiveness is to look at your cache hit ratio. The cache hit ratio tells you the percentage of get requests that find their value in the cache. To determine this percentage, run the stats command again, as shown in Listing 8:
Listing 8. Calculating the cache hit ratio
```
stats
STAT pid 6825
STAT uptime 540692
STAT time 1249252262
STAT version 1.2.6
STAT pointer_size 32
STAT rusage_user 0.056003
STAT rusage_system 0.180011
STAT curr_items 595
STAT total_items 961
STAT bytes 4587415
STAT curr_connections 3
STAT total_connections 22
STAT connection_structures 4
STAT cmd_get 2688
STAT cmd_set 961
STAT get_hits 1908
STAT get_misses 780
STAT evictions 0
STAT bytes_read 5770762
STAT bytes_written 7421373
STAT limit_maxbytes 536870912
STAT threads 1
END
```
Now divide get_hits by cmd_get. In this example, 1908 / 2688 gives a ratio of roughly 71 percent. Ideally, you would like this percentage to be a lot higher; the higher the ratio, the better. Watching your stats and measuring them over time will give you a very good indication of the effectiveness of your caching strategies.
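The same arithmetic in code, as a minimal Python sketch using the values from Listing 8 (the function name `hit_ratio` is hypothetical):

```python
def hit_ratio(get_hits: int, cmd_get: int) -> float:
    """Fraction of get requests that were served from the cache."""
    if cmd_get == 0:
        return 0.0  # no gets yet, so there is nothing to measure
    return get_hits / cmd_get


ratio = hit_ratio(get_hits=1908, cmd_get=2688)
print(f"cache hit ratio: {ratio:.1%}")  # prints "cache hit ratio: 71.0%"
```

Guarding against a zero cmd_get matters for a freshly started or freshly restarted server, where no get requests have been counted yet.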
Caching is an essential part of any high-volume Web application, and memcached is a great caching option. I have personally had a ton of success using it. If you choose to leverage memcached as your caching solution, I am sure you will see just how effective it is.
In the second part of this article, you will learn how to integrate memcached into a Grails application. This will be an opportunity to explore an exciting, viable stack for scalable Web application development and also do some really cool stuff. Until then, what you've learned in this article is a great starting point for doing more with memcached. I encourage you to install your own instance of memcached and start playing around with it.
Learn
- "Distributed Caching with Memcached" (Brad Fitzpatrick, Linux Journal, August 2004): Danga Interactive's Brad Fitzpatrick introduces memcached.
- "Server load-balancing architectures: Transport-level architectures" (Gregor Roth, JavaWorld, October 2008): Introduces a memcached load-balancing solution based on caching HttpResponse messages across multiple machines.
- "Performance tuning considerations in your application server environment" (Sean Walberg, developerWorks, January 2009): An overview of how the various components of a Web application interact, including common performance bottlenecks and solutions such as caching.
- "Is Memcached a Good or Bad Sign for MySQL?" (Gary Orenstein, Gigaom.com, May 2009): An overview of where memcached fits into the application stack and how lightweight caching alternatives are challenging the RDBMS.
- developerWorks Java technology zone: Find hundreds of articles about every aspect of Java programming.
Get products and technologies
- Download memcached: A distributed memory caching system.
- Download libevent: An asynchronous event-notification library.
- Download PuTTY: A Windows telnet client.
Discuss
- Get involved in the My developerWorks community.

James Goodwill is a well-known author and technologist living in the Rocky Mountain region of the United States. His main focus is scaling Java and Grails Web applications. His published titles include Developing Java Servlets, Mastering Jakarta Struts, and Mastering JSP Custom Tags and Tag Libraries.




