Java development 2.0

Redis for the real world

How Redis beats memcached in heavily read applications

I've discussed the concept of NoSQL before in this series and introduced a variety of NoSQL data stores that are compatible with the Java platform, including Google's Bigtable and Amazon’s SimpleDB. I've also discussed more conventional server-based data stores like MongoDB and CouchDB. Every data store has strengths and weaknesses, especially as applied to a particular domain scenario.

This month's Java development 2.0 spotlight is on Redis, a lightweight key-value data store. Most NoSQL implementations are essentially key-value, but Redis supports an unusually rich set of values, including strings, lists, sets, and hashes. As such, Redis is often labeled a data structure server. Redis also has a reputation for being exceptionally fast, which makes it an optimum choice for a certain class of use cases.

When trying to understand something new, it can be helpful to compare it to something you are already familiar with, so we'll start our exploration of Redis by considering its similarity to memcached. I'll then demonstrate key features of Redis that could give it an edge over memcached in some application scenarios. And finally, I'll show you how to use Redis as a traditional datastore for model objects.

Redis and memcached

Memcached is a well-known, in-memory object caching system that works by putting a target key and value into a memory cache. Memcached thus sidesteps the I/O cost incurred when a read hits the disk. Placing memcached between a web application and a database can yield better read performance. Memcached is, therefore, a good choice for applications that require fast data lookups. One example would be a stock lookup service that would otherwise hit a database for fairly static data, such as ticker-to-name or even pricing information.

But memcached has some limitations, including the fact that all its values are simple strings. Redis, as an alternative to memcached, supports a richer feature set. Some benchmarks also indicate that Redis is much faster than memcached. Redis's rich data types make it possible to store far more sophisticated data in memory than you could with memcached. And unlike memcached, Redis can persist its data.

Redis makes a great caching solution, but its rich feature set leads to other uses. Because Redis is capable of storing data on disk and replicating data across nodes, it can be leveraged as a data repository for traditional data models (that is, you can use Redis much like you would an RDBMS). Redis is also often employed as a queuing system. In this use case, Redis is the basis of a backing, persistent store of work queues that leverage Redis's list type. GitHub is one example of a large-scale infrastructure that uses Redis this way.

Get Redis and go!

To get started with Redis, you'll need access to it, either through a local install or through a hosted provider. If you're on a Mac, the install process couldn't be easier. If you're using Windows®, you'll need to have Cygwin installed. Regardless of how you access Redis, you will be able to follow the examples later in the article. I should point out, though, that a hosted Redis provider might not be a great choice for caching, because network latency could undo any performance gains.

You interact with Redis via commands, meaning that there is no SQL-like query language. Working with Redis is very much like working with a traditional map data structure — everything has a key and a value, and each value has a rich set of data types associated with it. Every data type also has its own set of commands. For instance, if you planned on using simple data types, say in some sort of caching scheme, you could use the commands set and get.

You can interact with an instance of Redis via a command-line shell. There also are multiple client implementations for programmatically working with Redis. Listing 1 shows a simple command-line shell interaction using basic commands:

Listing 1. Using basic Redis commands
redis 127.0.0.1:6379> set page registration
OK
redis 127.0.0.1:6379> keys *
1) "foo"
2) "page"
redis 127.0.0.1:6379> get page
"registration"

Here, I've associated the key "page" with the value "registration" via the set command. Next, I've issued the keys command (the trailing * signifies that I want to see all instance keys available). The keys command shows that there is a page key as well as a foo one — I can retrieve the value associated with a key via the get command. Keep in mind that the value retrieved from a get can only be a string. If a key's value is a list, for example, you must use a list-specific command to retrieve the list's elements. (Note that there are commands to query a value's type.)

Java integration with Jedis

For programmers wanting to integrate Redis into Java applications, the Redis team recommends a project called Jedis. Jedis is a lightweight library that maps native Redis commands to simple Java methods. For instance, Jedis lets me get and set simple values, as in Listing 2:

Listing 2. Basic Redis commands in Java code
JedisPool pool = new JedisPool(new JedisPoolConfig(), "localhost");
Jedis jedis = pool.getResource();

jedis.set("foo", "bar");
String foobar = jedis.get("foo");
assert foobar.equals("bar");

pool.returnResource(jedis);
pool.destroy();

In Listing 2, I configure a connection pool and grab a connection (much like you would in a typical JDBC scenario), which I then return at the bottom of the listing. In between, I set the value "bar" with the key "foo", which I then retrieve via the get command.

Similar to memcached, Redis allows you to associate an expiration time with a value. So I can set a value (say a stock's temporary trading price) that eventually will be purged from the Redis cache. To set an expiration time in Jedis, I issue an expire call after my set call, as shown in Listing 3:

Listing 3. A Redis value can be set to expire
jedis.set("gone", "daddy, gone");
jedis.expire("gone", 10);
String there = jedis.get("gone");
assert there.equals("daddy, gone");

// Wait longer than the 10-second expiry so the key is gone on the next read
Thread.sleep(10500);

String notThere = jedis.get("gone");
assert notThere == null;

In Listing 3, I've used an expire call to set the value of "gone" to expire in 10 seconds. After Thread.sleep has been invoked, a get for "gone" will return null.
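
The timing matters in an example like this: a key stays readable until its full time to live has elapsed, so a get returns null only when the wait is longer than the expiry. The sketch below models that behavior in plain Java. It is an in-memory illustration, not Redis client code; the ExpiringStore class and its injected clock are hypothetical names I've made up so that the behavior can be checked without a running server or a real sleep.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Illustrative in-memory model of keys with a time to live; not client code. */
final class ExpiringStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> deadlines = new HashMap<>();
    private final LongSupplier clockMillis; // injected so a test can control time

    ExpiringStore(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Store a value; in this model a plain set leaves the key without a deadline. */
    void set(String key, String value) {
        values.put(key, value);
        deadlines.remove(key);
    }

    /** Schedule the key to expire after the given number of seconds; false if the key is absent. */
    boolean expire(String key, long seconds) {
        if (get(key) == null) {
            return false;
        }
        deadlines.put(key, clockMillis.getAsLong() + seconds * 1000);
        return true;
    }

    /** Return the value, or null once the deadline has passed. */
    String get(String key) {
        Long deadline = deadlines.get(key);
        if (deadline != null && clockMillis.getAsLong() >= deadline) {
            values.remove(key);
            deadlines.remove(key);
        }
        return values.get(key);
    }
}
```

With a 10-second expiry, a read at 4.5 seconds on the model's clock still returns the value, and a read at 10.5 seconds returns null.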

Data types in Redis

Working with Redis data types such as lists and hashes requires specialized command usage. For instance, I can create lists by appending values to a key. In the code in Listing 4, I issue an rpush command, which appends a value to the right or tail of a list. (A corresponding lpush command prepends a value to the front of a list.)

Listing 4. Redis lists
jedis.rpush("people", "Mary");
assert jedis.lindex("people", 0).equals("Mary");

jedis.rpush("people", "Mark");

assert jedis.llen("people") == 2;
assert jedis.lindex("people", 1).equals("Mark");

Redis supports a wide variety of commands for working with data types; moreover, each data type has its own set of commands. Rather than going over them individually, I'll show you some of them at work in a realistic application development scenario.
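
To make the list semantics concrete, here is a small in-memory model in plain Java. It is a sketch under stated assumptions rather than Redis client code: the ListModel class is hypothetical, and its methods only mimic the behavior of the list commands they are named after (including lpop, which removes the head of a list, and the negative indexes that lindex accepts for counting from the tail).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative in-memory model of a single Redis list; not client code. */
final class ListModel {
    private final List<String> items = new ArrayList<>();

    /** Like rpush: append to the tail and return the new length. */
    long rpush(String value) {
        items.add(value);
        return items.size();
    }

    /** Like lpush: insert at the head and return the new length. */
    long lpush(String value) {
        items.add(0, value);
        return items.size();
    }

    /** Like lpop: remove and return the head, or null when the list is empty. */
    String lpop() {
        return items.isEmpty() ? null : items.remove(0);
    }

    /** Like lindex: zero-based from the head; negative indexes count from the tail. */
    String lindex(long index) {
        long i = index < 0 ? items.size() + index : index;
        return (i < 0 || i >= items.size()) ? null : items.get((int) i);
    }

    /** Like llen: the number of elements in the list. */
    long llen() {
        return items.size();
    }
}
```

Pushing on the right and popping from the left yields first-in, first-out order, which is the property that makes a list usable as the kind of work queue I mentioned earlier.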

Redis as a caching solution

I've mentioned that Redis is easily employed as a caching solution, and it just happens that I have need of one of those! In this application example, I'm going to integrate Redis with my location-based mobile web service, called Magnus.

If you haven't been following this series, I first implemented Magnus using the Play framework, and I've developed or refactored it in various implementations since then. Magnus is a simple service that takes JSON documents via HTTP PUT requests. These documents describe the location of a particular account, which means a person holding a mobile device.

Now I want to integrate caching into Magnus — that is, I want to reduce I/O traffic in the form of a look-up by storing, in memory, data that doesn't often change.

Magnus caches!

My first step in Listing 5 will be to find out if an incoming account name (which is the key) is in Redis via a get call. A call to get will either return the account ID as a value or it will return null. If a value is returned, I'll use that as my acctId variable. If null is returned (indicating that the account's name isn't in Redis as a key), then I'll look up the account value in MongoDB and add it to Redis via a set command.

The advantage here is speed: The next time a requested account submits a location, I will be able to obtain its ID from Redis (acting as an in-memory cache) rather than having to go to MongoDB and incur a read I/O cost.

Listing 5. Using Redis as an in-memory cache
"/location/:account" {
  put {
    def jacksonMapper = new ObjectMapper()
    def json = jacksonMapper.readValue(request.contentText, Map.class)
    def formatter = new SimpleDateFormat("dd-MM-yyyy HH:mm")
    def dt = formatter.parse(json['timestamp'])
    def res = [:]
    
    def jedis = null
    try{
      jedis = pool.getResource()
      def acctId = jedis.get(request.parameters['account'])

      if(!acctId){
        // Cache miss: look up the account in MongoDB and cache its ID in Redis
        def acct = Account.findByName(request.parameters['account'])
        jedis.set(request.parameters['account'], acct.id.toString())
        acctId = acct.id
      }

      new Location(acctId.toString(), dt, json['latitude'].doubleValue(), 
      json['longitude'].doubleValue() ).save()
      res['status'] = 'success'
    }catch(exp){
      res['status'] = "error ${exp.message}"
    }finally{
      // Always hand the connection back to the pool, even if the lookup or save fails
      if(jedis != null){
        pool.returnResource(jedis)
      }
    }
   response.json = jacksonMapper.writeValueAsString(res)
  }
}

Note that the Magnus implementation (written in Groovy) in Listing 5 still uses a NoSQL implementation for data model storage; it just uses Redis as a cache implementation for lookup data. Because my primary account data lives in MongoDB (in fact, it resides at MongoHQ.com) and my Redis data store runs locally, Magnus will get a significant speed boost when looking up subsequent account IDs.
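
The lookup logic here is commonly called the cache-aside pattern, and it can be isolated from the web framework. The following is a minimal sketch in plain Java, not the Magnus code: the AccountIdCache class is hypothetical, a HashMap stands in for Redis, and a Function stands in for the slower database lookup. The miss counter exists only to show how often the slow path runs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative cache-aside lookup; the Map stands in for Redis, the Function for the database. */
final class AccountIdCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> slowLookup;
    private int misses = 0;

    AccountIdCache(Function<String, String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    /** Return the cached ID, loading and caching it on a miss; null if the account is unknown. */
    String idFor(String accountName) {
        String id = cache.get(accountName);
        if (id == null) {
            misses++;
            id = slowLookup.apply(accountName);
            if (id != null) {
                cache.put(accountName, id); // only cache accounts that actually exist
            }
        }
        return id;
    }

    /** Number of times the slow lookup was consulted. */
    int misses() {
        return misses;
    }
}
```

In this sketch an unknown account is never cached, so a missing account neither throws nor leaves a bad entry behind; it simply goes back to the slow lookup on the next request.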

But wait! Why do I need both MongoDB and Redis? Can't I get away with using just one?

Node.js for ORM

A number of projects provide an ORM-like mapping for Redis, including a highly influential Ruby-based alternative called Ohm. I checked out a Java-based derivative of that project (called JOhm) but eventually settled on a variation written for Node, called Nohm. The beauty of Ohm and its derivative projects is that they allow you to map an object model into a Redis-based data structure. Thus, your model objects are both persistent and (in most cases) extremely fast in read situations.

Using Nohm, I was able to quickly rewrite my Magnus app in JavaScript and persist Location objects in a snap. In Listing 6, I've defined a Location model that includes three properties. (Note that I've kept my example simple by making timestamp a string rather than a true timestamp.)

Listing 6. Redis ORM in Node.js
var Location = nohm.model('Location', {
  properties: {
    latitude: {
      type: 'float',
      unique: false,
      validations: [
        ['notEmpty']
      ]
    },
    longitude: {
      type: 'float',
      unique: false,
      validations: [
        ['notEmpty']
      ]
    },
    timestamp: {
      type: 'string',
      unique: false,
      validations: [
        ['notEmpty']
      ]
    }
  }
});

Node's Express framework makes using my new Nohm Location object really easy. In my application's PUT implementation, I grab the incoming JSON values and put them into an instance of Location, via Nohm's p call. I then check to see whether the instance is valid. If it is, I persist it.

Listing 7. Using Nohm in Node's Express.js
app.put('/', function(req, res) {
  res.contentType('json');

  var location = new Location;
  location.p("timestamp", req.body.timestamp);
  location.p("latitude", req.body.latitude);
  location.p("longitude", req.body.longitude);

  if (location.valid()) {
    location.save(function (err) {
      if (!err) {
        res.send(JSON.stringify({ status: "success" }));
      } else {
        res.send(JSON.stringify({ status: location.errors }));
      }
    });
  } else {
    res.send(JSON.stringify({ status: location.errors }));
  }
});

As Listing 7 shows, Redis pretty easily steps up to being an in-memory, blazingly fast datastore. And in some cases, it might even be a better cache than memcached!
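
For Java readers, the idea behind this kind of mapping can be sketched without any library. One common approach is to flatten a model object into a hash of field names to string values, validate it, and rebuild it on the way back out. The LocationRecord class below is hypothetical and says nothing about how Ohm, JOhm, or Nohm lay out their data internally; it only illustrates the round trip.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative model object that can be flattened to, and rebuilt from, a string hash. */
final class LocationRecord {
    final double latitude;
    final double longitude;
    final String timestamp; // kept as a string, as in the Nohm model

    LocationRecord(double latitude, double longitude, String timestamp) {
        this.latitude = latitude;
        this.longitude = longitude;
        this.timestamp = timestamp;
    }

    /** A stand-in for a notEmpty validation on the timestamp field. */
    boolean isValid() {
        return timestamp != null && !timestamp.isEmpty();
    }

    /** Flatten the object into field-name/value pairs suitable for a hash. */
    Map<String, String> toHash() {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("latitude", Double.toString(latitude));
        fields.put("longitude", Double.toString(longitude));
        fields.put("timestamp", timestamp);
        return fields;
    }

    /** Rebuild the object from its hash form. */
    static LocationRecord fromHash(Map<String, String> fields) {
        return new LocationRecord(
                Double.parseDouble(fields.get("latitude")),
                Double.parseDouble(fields.get("longitude")),
                fields.get("timestamp"));
    }
}
```

A caller would check isValid() before persisting the hash, mirroring the valid-then-save flow in Listing 7.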

In conclusion

Redis is useful for a wide variety of data storage scenarios, and because it can persist data to disk (and because it supports a rich set of data types), it's sometimes a worthy competitor to memcached. In cases where it makes sense for your domain, you can use Redis as a backing store for data models and queues. Redis client implementations have been ported to just about every programming language there is.

Redis isn't a total replacement for an RDBMS, nor is it a heavyweight store, rich with query features like MongoDB. In many cases, however, it can live side by side with these technologies. As I've shown in this article, Redis can be a good stand-alone data storage solution for applications that run heavy on data lookups, or where real-time statistics could be computed via Redis's speedy atomic operations.


Related topics

  • Java development 2.0: This dW series explores technologies that are redefining the Java development landscape. Topics include NoSQL (May 2010), MongoDB (September 2010), and Gretty (August 2011).
  • Download Redis and Jedis: Redis is an open source key-value store and data structure server; Jedis is the current recommended client for Java-based development.
  • Get Nohm: A Node.js implementation of the Redis object-relational mapper, Ohm.
  • "Is memcached a dinosaur in comparison to Redis?" (Stackoverflow.com, May 2010): More tips for comparing and evaluating Redis and memcached for specific application scenarios.
  • "Applying memcached to increase site performance" (Martin Brown, developerWorks, August 2010): Learn more about the mechanics and performance benefits of in-memory data storage with memcached.
  • "James Phillips discusses the post relational world" (Java technology zone technical podcast series, developerWorks, May 2011): Couchbase is a NoSQL solution that combines the elasticity and high performance of CouchDB with traditional features like queries and indexing. Join Andrew Glover and Couchbase cofounder James Phillips as they discuss the brave new world of fast and safe web applications.

Zone=Java development, Cloud computing
ArticleID=780065
ArticleTitle=Java development 2.0: Redis for the real world
publish-date=12132011