Introducing Riak, Part 2: Integrating Riak as a heavy-duty caching server for web applications

Using Riak as a caching server to help alleviate the load on application and database servers

This article is Part 2 of a two-part series about Riak, a highly scalable, distributed data store written in Erlang and based on Dynamo, Amazon's highly available key-value store. For websites with heavy loads, a scalable caching solution can lighten the load on the application and database servers, particularly for data that is read often but updated only occasionally. Explore an in-depth example of an online betting site and how you can use Riak to implement a caching solution. You will also learn how to integrate Riak with an existing website, how to use Riak to serve user requests directly, and how to use other Riak features such as search. You will need a working Riak cluster if you want to follow along with the examples; Part 1 of this series describes how to set up a cluster locally.

Simon Buckle, Independent Consultant, Freelance

Simon Buckle is an independent consultant. His interests include distributed systems, algorithms, and concurrency. He has a Master's degree in Computing from Imperial College, London. Check out his website at simonbuckle.com.



15 May 2012


Introduction

Certain types of data exhibit access patterns that lend themselves to caching. For example, online betting sites have an interesting load characteristic: odds and bet slips are requested often but updated relatively infrequently.

These situations need a highly scalable system with the following characteristics to cope with the demands of high loads:

  • The system acts as a reliable cache to reduce demand on the application servers and database
  • Cached items are searchable so you can update or invalidate them
  • The solution is easy to integrate into an existing site

Riak is a good choice for such a solution.

Riak is not the only candidate for implementing such a caching solution; many different caches are available. A popular one is memcached; however, unlike Riak, memcached doesn't provide any kind of data replication, so if the server holding a particular item goes down, that item becomes unavailable. Redis, another popular key/value store that could be used as a cache, supports replication through a master-slave configuration. Riak, by contrast, has no master node; every node plays the same role, so there is no single point of failure, which makes the system resilient to failure.


Website integration

Any solution needs to be easy to integrate into an existing website. This is important because it might not be possible, or even desirable, to migrate all of your existing data into Riak. As mentioned previously, certain types of data lend themselves to caching; with a key/value store, this is particularly true of data that you access by primary key. That is the kind of data that is most suitable for migrating to Riak.

As mentioned in Part 1 of this series on Riak, a number of client libraries are available in languages such as PHP, Ruby, and Java™; the libraries provide an API that makes integrating with Riak very simple. In this example, I demonstrate the use of the PHP library to show how to integrate Riak with an existing website.

Figure 1 shows the set-up for this example. I have left out details such as load balancing and firewalls. The servers themselves are simple front-end boxes with a LAMP stack installed.

I will assume that Riak is used only internally (it is not accessible from outside) and that it runs in a non-hostile environment, so security issues such as authentication do not arise. This assumption is not as limiting as it might seem: Riak has no built-in authorization anyway, so you should delegate authentication and access control to the application.

Figure 1. A simple website integration: the web servers interact with both the relational database and the Riak cluster

What follows is a basic example of how you might integrate Riak into your existing website. You will create a simple form that, when submitted, uses the PHP client to store an object in Riak based on the values entered in the form.

Figure 2 shows an example of a simple form that an administrator might use to create a bet entry in the system. Create this form in HTML and have it POST to the PHP script in Listing 1; you can use a similar form in the source code that accompanies this article as a starting point. The value entered in the Key field is used as the key under which the object is stored in the bucket.

Figure 2. Example form for creating a bet, with entry fields for Key, Odds, and Description and a Create button

Listing 1 has example PHP code that shows how to use the PHP client library to integrate with Riak. Change the path to the PHP client library—specified in require_once—to wherever you have installed it. In this case, I just put it in the same directory as the PHP script. By default, all the client libraries expect Riak to be available on port 8098.

Listing 1. Example PHP code for integrating with Riak
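<?php

require_once('./riak.php');

# Could do a check here to see if the current user has the
# appropriate credentials (delegated to the application).

# Reject the request if any form field is missing or empty.
foreach (array('key', 'odds', 'description') as $field) {
    if (!isset($_POST[$field]) || $_POST[$field] === '') {
        header('HTTP/1.1 400 Bad Request');
        exit("Missing field: $field");
    }
}

$client = new RiakClient('192.168.1.1', 8098);
$bucket = $client->bucket('odds');

# Create a new object in the 'odds' bucket, keyed by the form's key field.
$bet = $bucket->newObject($_POST['key']);
$data = array(
    'odds' => $_POST['odds'],
    'description' => $_POST['description']
);
$bet->setData($data);

# Save the object to Riak
$bet->store();

echo "Thanks!";
?>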

Save the code to a PHP file (call it whatever you like) and upload it, along with the form, to your website, for example, http://www.yoursite.com/riak-test.php. Fill out the form and submit it. To confirm that it worked, retrieve the item directly from Riak using the key you entered in the form (see Listing 2).

Listing 2. Retrieving the item from Riak
$ curl -i http://localhost:8098/riak/odds/<key>
...
{ "odds":"", "description":"" }

Although this integration example used the PHP client, the approach is similar for other languages or application frameworks such as Java or Ruby on Rails.
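For comparison, the following JavaScript sketch builds the HTTP request that corresponds to the store operation in Listing 1, using the /riak/<bucket>/<key> URL form from Listing 2 and a JSON body. The helper name buildStoreRequest and the example values are hypothetical, and the helper only constructs the request; you could send it with fetch(req.url, req.init) in a browser or in Node.js 18 and later.

```javascript
// Hypothetical helper: builds the HTTP request that stores a bet in Riak,
// mirroring what Listing 1 does through the PHP client.
// Assumes the /riak/<bucket>/<key> URL scheme used in this article.
function buildStoreRequest(host, port, bucket, key, data) {
  // Encode the key: it comes straight from a form field and may contain
  // characters such as spaces or slashes that are not valid in a URL path.
  const url = `http://${host}:${port}/riak/${encodeURIComponent(bucket)}/${encodeURIComponent(key)}`;
  return {
    url,
    init: {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(data),
    },
  };
}

// Example values only.
const req = buildStoreRequest("192.168.1.1", 8098, "odds", "grand national", {
  odds: "5/1",
  description: "horse racing",
});
// To send it: fetch(req.url, req.init)
```

Encoding the key matters because a key such as "grand national" would otherwise produce an invalid URL.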


Serving requests directly

In addition to using the client libraries to integrate Riak into your current set-up, you can serve user requests directly from Riak, using it as a simple web server. To demonstrate this, the following demo shows how you can request pages directly from Riak.

Download the source code for this article. Make sure Riak is running, then run the script load.sh, which copies all the HTML and JavaScript files into a bucket called demo. This example uses the JavaScript client.

To view the demo, open this URL in your browser: http://localhost:8098/riak/demo/demo.html
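load.sh itself is not reproduced here. As a rough sketch of what storing a static file in Riak involves, the following JavaScript builds an HTTP PUT for a file, tagged with a Content-Type derived from its extension. Riak keeps the Content-Type with the object and returns it when the object is fetched, which is what lets the browser render demo.html as a page. The extension table and the helper name buildUploadRequest are assumptions, not taken from load.sh.

```javascript
// Hypothetical extension-to-content-type table; extend it as needed.
const CONTENT_TYPES = {
  ".html": "text/html",
  ".js": "application/javascript",
  ".css": "text/css",
};

// Builds a PUT request that stores one file's contents in Riak under
// /riak/<bucket>/<filename>. Riak sends the stored Content-Type back on GET,
// so the browser knows how to handle the file.
function buildUploadRequest(host, port, bucket, filename, contents) {
  const dot = filename.lastIndexOf(".");
  const extension = dot >= 0 ? filename.slice(dot).toLowerCase() : "";
  const contentType = CONTENT_TYPES[extension] || "application/octet-stream";
  return {
    url: `http://${host}:${port}/riak/${encodeURIComponent(bucket)}/${encodeURIComponent(filename)}`,
    init: { method: "PUT", headers: { "Content-Type": contentType }, body: contents },
  };
}

const upload = buildUploadRequest("localhost", 8098, "demo", "demo.html", "<html></html>");
```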

If you enter values in the form and submit it, a JSON object whose properties correspond to the form fields is stored in Riak. You are then redirected to a page that displays the object you just created.

Listing 3 shows the code that creates the object. The values of key, odds, and description are read from the corresponding form fields.

Listing 3. Example use of the JavaScript client library in Riak
client.bucket("odds", function(bucket) {
    var key = $('#key').val();
    // Fetch the object if it already exists, or create a new one.
    bucket.get_or_new(key, function(status, object) {
        object.contentType = 'application/json';
        object.body = { 'odds': $('#odds').val(), 'description': $('#desc').val() };
        object.store(function(status, object, request) {
            if (status == 'ok') {
                // Redirect to the stored object so the browser displays it.
                window.location = "http://localhost:8098/riak/odds/" + encodeURIComponent(key);
            } else {
                alert("Failed to create object.");
            }
        });
    });
});

As mentioned previously, I assume that Riak is running in a trusted environment. In this case there's no security issue from adding pages that store and retrieve items in Riak; however, you don't want to expose this kind of functionality to the Internet at large without having some form of authentication in place.

Although it's a simple example, it gives you an idea of how Riak can serve page requests directly. You could also include data stored in Riak directly in your existing web pages. Because the browser's same-origin policy restricts AJAX requests to the server the page came from, you would need either a cross-domain technique such as JSONP or cross-origin resource sharing (CORS), or to proxy requests through your own servers to Riak to fetch the required data.
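If you take the proxy route, the proxy should forward only the requests you intend to expose, since Riak itself accepts any request that reaches it. The following JavaScript sketch assumes a hypothetical public path /bets/<key> mapped onto the odds bucket, with read-only access; you would plug the function into whatever HTTP server or framework you already use.

```javascript
// Base URL of the internal Riak HTTP interface (assumed; adjust to your cluster).
const RIAK_BASE = "http://localhost:8098/riak";

// Maps a public request onto an internal Riak URL, or returns null to reject it.
// Only GET requests for a single key in the odds bucket are forwarded, so
// clients cannot reach other buckets or write through the proxy.
function proxyTarget(method, path) {
  if (method !== "GET") return null;
  const match = /^\/bets\/([^\/]+)$/.exec(path);
  if (!match) return null;
  let decoded;
  try {
    decoded = decodeURIComponent(match[1]);
  } catch (e) {
    return null; // malformed percent-encoding
  }
  // Refuse "." and ".." so a URL library cannot normalize the target path
  // up into another part of the Riak API.
  if (decoded === "." || decoded === "..") return null;
  // Forward the segment still percent-encoded, exactly as received.
  return `${RIAK_BASE}/odds/${match[1]}`;
}
```

A whitelist like this is easier to reason about than trying to block dangerous requests one by one.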


Using Riak as a cache

Caches provide fast access to data. If the requested data is in the cache (a cache hit), the application can serve the request quickly by reading the value from the cache, which is typically much faster than retrieving it from a database. If the data is not in the cache (a cache miss), the application typically has to query the database instead. Generally, the more requests you can serve from the cache, the faster the system will be. Riak has a number of features that make it a good choice for implementing a caching solution.
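The read path described above is usually called the cache-aside pattern. The following JavaScript sketch uses Map objects as hypothetical stand-ins for the Riak bucket and the relational database, so it runs synchronously; in a real application both lookups would be asynchronous network calls.

```javascript
// Cache-aside read path: try the cache first; on a miss, load the value from
// the database and store it in the cache so the next request is a hit.
// `cache` and `db` are hypothetical stand-ins for a Riak bucket and your RDBMS.
function getBet(key, cache, db, stats) {
  if (cache.has(key)) {
    stats.hits += 1;
    return cache.get(key);
  }
  stats.misses += 1;
  const value = db.get(key); // the expensive lookup the cache is meant to avoid
  if (value !== undefined) {
    cache.set(key, value);
  }
  return value;
}

const betsDb = new Map([["race42", { odds: "5/1", description: "horse racing" }]]);
const betCache = new Map();
const stats = { hits: 0, misses: 0 };
getBet("race42", betCache, betsDb, stats); // miss: loaded from the database
getBet("race42", betCache, betsDb, stats); // hit: served from the cache
```

Missing keys are not cached in this sketch, so repeated requests for a nonexistent bet always reach the database; caching a "not found" marker is a common refinement.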

One such feature is Riak's pluggable storage back-end, which determines how data is stored. Several back-ends are available, but I won't cover them all here (see the Riak documentation for details). The default storage back-end is Bitcask, an Erlang application that persists data to disk and keeps an in-memory hash table of keys, which provides fast access to the data.

One back-end is perhaps more relevant for this article: the Memory back-end. The Memory back-end stores all of its data in memory (internally it uses Erlang's ets tables) and, when enabled, makes Riak behave like an LRU cache with timed expiry. The advantage of an in-memory store is that it is significantly faster than going to disk to retrieve the data. The drawback is that the data is not persisted, so if a node goes down, the data stored on that node is lost. When you use Riak as a cache, this matters much less than it would if Riak were your primary data store, because the application can always retrieve the data from the database again. In addition, Riak replicates the data across several nodes in the cluster, so copies held on the surviving nodes remain available.
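To make "LRU cache with timed expiry" concrete, here is a toy in-process JavaScript version of that behavior. It is not how the Memory back-end is implemented; it only illustrates the two eviction rules: an entry older than the TTL since it was written is treated as missing, and when the cache is full, the least recently used entry is dropped. Capacity is counted in entries rather than bytes, the TTL is in milliseconds, and the clock is injectable so the behavior can be tested.

```javascript
// Toy LRU cache with timed expiry. A Map iterates in insertion order, so
// deleting and re-inserting a key on each access keeps the least recently
// used entry at the front.
class LruTtlCache {
  constructor(maxEntries, ttlMs, now = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // key -> { value, storedAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key); // expired: behave as a miss
      return undefined;
    }
    this.entries.delete(key); // refresh recency
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, { value, storedAt: this.now() });
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value; // least recently used
      this.entries.delete(oldest);
    }
  }
}

let t = 0;
const lru = new LruTtlCache(2, 1000, () => t);
lru.set("a", 1);
lru.set("b", 2);
lru.get("a");      // "a" becomes the most recently used entry
lru.set("c", 3);   // capacity exceeded: evicts "b"
const afterEviction = [lru.get("a"), lru.get("b"), lru.get("c")]; // [1, undefined, 3]
t = 1500;          // 1.5 seconds later, every entry has expired
const afterExpiry = lru.get("a"); // undefined
```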

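Riak ships with the Memory back-end. To use it, open app.config on each node in the cluster, locate the storage_backend property, and change its value from riak_kv_bitcask_backend to riak_kv_memory_backend. Then add the memory_backend section in Listing 4 to the same file. Because app.config contains a single Erlang list terminated by "].", add the section as a new element of that list, separated from the preceding element by a comma.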

Listing 4. Using the Memory back-end
{memory_backend, [
    {max_memory, 4096},  %% Megabytes per vnode (partition), not per node
    {ttl, 86400}         %% Expiry time in seconds (86400 s = 1 day)
]}

Change the values to whatever is appropriate for your set-up. Restart the nodes in the cluster.
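When choosing max_memory, note that the Riak documentation describes it as a limit per vnode (partition), in megabytes, rather than per node. If that holds for your version, the memory a node may use is roughly the limit multiplied by the number of vnodes it hosts. A quick back-of-the-envelope sketch, assuming a ring of 64 partitions (the default ring_creation_size) spread evenly over four nodes:

```javascript
// Rough upper bound on Memory back-end usage for one physical node, assuming
// max_memory is a per-vnode limit in MB and partitions are spread evenly
// (rounding up, since some nodes may hold one extra partition).
function maxMemoryPerNodeMB(ringSize, nodeCount, maxMemoryPerVnodeMB) {
  const vnodesPerNode = Math.ceil(ringSize / nodeCount);
  return vnodesPerNode * maxMemoryPerVnodeMB;
}

// 64 partitions / 4 nodes = 16 vnodes per node; 16 x 4096 MB = 65536 MB (64 GB).
const worstCaseMB = maxMemoryPerNodeMB(64, 4, 4096);
```

With the value from Listing 4, each node could therefore need far more than 4GB, so size max_memory with the vnode count in mind.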

It's also possible to run multiple storage back-ends within a Riak cluster. This is useful because it lets you use different back-ends for different buckets. For example, you could configure a bucket called cache to use the Memory back-end, while the other buckets, whose data should be persisted, use Bitcask.
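Assuming the back-ends have already been declared in app.config using Riak's multi-backend (riak_kv_multi_backend; see the Riak documentation for the exact configuration syntax), a bucket is assigned to one of them by setting its backend property through the HTTP interface. The back-end name memory_mult is hypothetical and must match a name declared in your configuration; the helper only builds the request, which you could send with fetch.

```javascript
// Builds the request that points a bucket at one of the back-ends declared
// in a multi-backend configuration. "memory_mult" below is a hypothetical
// name: it must match a back-end name defined in your app.config.
function buildBackendPropsRequest(host, port, bucket, backendName) {
  return {
    url: `http://${host}:${port}/riak/${encodeURIComponent(bucket)}`,
    init: {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ props: { backend: backendName } }),
    },
  };
}

const propsReq = buildBackendPropsRequest("localhost", 8098, "cache", "memory_mult");
// To send it: fetch(propsReq.url, propsReq.init)
```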

Now that Riak is set up to behave like a cache, you need some way to access the data in the cluster, either to update it or to invalidate it before its expiry time.
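One common way to handle updates (a general caching pattern, not something Riak prescribes) is to write the change to the database and then delete the cached copy, so that the next read repopulates it from the source of truth. Over Riak's HTTP interface, the delete is an HTTP DELETE on /riak/<bucket>/<key>. Deleting rather than overwriting the cached value reduces the risk of concurrent writers leaving an outdated version in the cache. A sketch with hypothetical Map stand-ins:

```javascript
// Write path for a cache-aside setup: update the source of truth first,
// then invalidate the cached copy instead of patching it in place.
// `db` and `cache` are hypothetical Map stand-ins; with Riak as the cache,
// the invalidation would be an HTTP DELETE on /riak/<bucket>/<key>.
function updateBet(key, value, db, cache) {
  db.set(key, value); // 1. persist the change
  cache.delete(key);  // 2. drop the stale entry; the next read reloads it
}

const primaryDb = new Map([["race42", { odds: "5/1" }]]);
const cachedBets = new Map([["race42", { odds: "5/1" }]]);
updateBet("race42", { odds: "3/1" }, primaryDb, cachedBets);
```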


Looking for something?

As you have already seen, to retrieve data stored in Riak through the HTTP interface, you construct a URL from the bucket name and the key of the object you want to retrieve, then do an HTTP GET on that URL. This is perfectly adequate when you know the key! However, sometimes you don't know the key of the object you want, or you want to retrieve a set of objects satisfying certain criteria. In those cases, you need a way to search for objects held in the cluster.

You have already seen how to query data by running a Map/Reduce job over the documents stored in the cluster. In general, the time taken to execute such a query is proportional to the number of documents in the cluster: the more documents, the longer the query takes. This is not a problem for queries that are not time sensitive, that is, queries where the user does not expect an immediate reply. For something like search, however, it is not feasible to scan all of the documents on every request; it could take minutes or even hours to get the results back.

Fortunately, Riak already has a solution to this problem: Riak Search, which provides the functionality you need to search documents stored across your cluster. Search is too big a subject to cover in depth in this article, but at a high level it works like this: documents are tokenized (Riak Search uses standard Lucene analysers) and added to an inverted index, which is then queried using the search terms a user enters. New documents are likewise tokenized and added to the index as they are stored.
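A toy JavaScript sketch of the inverted-index idea: each token maps to the set of keys of the documents that contain it, so answering a single-term query is a lookup rather than a scan over every document. It splits text on whitespace only, in the spirit of the whitespace analyser described later in this section, and is not how Riak Search is implemented; the function names and document keys are hypothetical.

```javascript
// Splits text into tokens at runs of whitespace, dropping empty tokens.
function tokenize(text) {
  return text.split(/\s+/).filter((token) => token.length > 0);
}

// Records `key` under every token that appears in `text`.
function addToIndex(index, key, text) {
  for (const token of tokenize(text)) {
    if (!index.has(token)) index.set(token, new Set());
    index.get(token).add(key);
  }
}

// Returns the sorted keys of documents containing `token`: a single lookup,
// independent of how many documents are indexed in total.
function lookup(index, token) {
  return [...(index.get(token) || [])].sort();
}

const index = new Map();
addToIndex(index, "race42", "horse racing handicap");
addToIndex(index, "match7", "football cup final");
addToIndex(index, "race43", "horse racing steeplechase");
```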

Riak Search is disabled by default, so you need to enable it before you can use it. On each node in your cluster, open rel/riakN/etc/app.config, locate the riak_search section, and set its enabled property to true. Then restart the nodes in the cluster.

Riak lets you specify functions to run before and after a document is added to a bucket, through pre-commit and post-commit hooks. For example, you might want to check that a document has particular required fields before adding it to the cluster. A document must be indexed before it can be found by a search. To index documents as they are stored, install a pre-commit hook on the bucket that holds them by running the following command:

$ rel/riak/bin/search-cmd install <bucket name>
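You can also write your own pre-commit hooks, for example to enforce the required-fields check mentioned above. Riak accepts pre-commit hooks written in Erlang or JavaScript. The following JavaScript sketch assumes the contract described in Riak's commit-hook documentation: the hook receives the Riak object, the stored value is a string in object.values[0].data, and the hook returns the object to accept the write or an object with a fail property to reject it. The function name is hypothetical, and how you register it (typically by placing it in the directory configured as js_source_dir and adding it to the bucket's precommit property, alongside the search hook) should be checked against the documentation for your Riak version.

```javascript
// Sketch of a pre-commit hook that rejects bets missing required fields.
// Assumes the Riak JavaScript hook contract: return the object to accept,
// or {"fail": "<reason>"} to reject the write.
function precommitRequireBetFields(object) {
  var metadata = object.values[0].metadata || {};
  // Let deletes through: a tombstone carries no bet data to validate.
  if (String(metadata["X-Riak-Deleted"]) === "true") {
    return object;
  }
  var bet;
  try {
    bet = JSON.parse(object.values[0].data);
  } catch (e) {
    return { fail: "Bet is not valid JSON" };
  }
  var required = ["odds", "description"];
  for (var i = 0; i < required.length; i++) {
    // Check null first: typeof null is "object", and `in` throws on null.
    if (bet === null || typeof bet !== "object" || !(required[i] in bet)) {
      return { fail: "Missing required field: " + required[i] };
    }
  }
  return object;
}
```

Plain var declarations and function syntax are used because hooks run inside Riak's embedded JavaScript engine rather than a modern browser or Node.js.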

This installs the pre-commit hook riak_search_kv_hook on the bucket. From then on, whenever a document is added to that bucket, it is analyzed and added to the index. The default analyser is the whitespace analyser, which splits text into tokens at whitespace; those tokens are then indexed. A number of other analysers are available, and you can also define your own.

In many cases, Riak Search knows how to index your data out of the box. For example, if a JSON object is added to a bucket, the value of each property is indexed and can be queried using the property name in the query string (see the search example in Listing 5). For more complicated structures, you can define your own schema that tells Riak Search how to index your data.
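To illustrate what indexing a JSON object by property name means, the following toy sketch records each token together with the property it came from, so a query such as description:horse matches only tokens from the description property. Nested objects and non-string values are ignored for brevity; this is an illustration under those assumptions, not Riak Search's actual schema handling, and the function names are hypothetical.

```javascript
// Indexes each top-level string property of a JSON document under
// "<field>:<token>", so queries can be scoped to a single property.
function indexJsonDocument(index, key, doc) {
  for (const [field, value] of Object.entries(doc)) {
    if (typeof value !== "string") continue; // skipped for brevity
    for (const token of value.split(/\s+/)) {
      if (token === "") continue;
      const term = `${field}:${token}`;
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(key);
    }
  }
}

// Answers a single "field:term" query such as "description:horse".
function queryField(index, query) {
  return [...(index.get(query) || [])].sort();
}

const fieldIndex = new Map();
indexJsonDocument(fieldIndex, "race42", { odds: "5/1", description: "horse racing" });
indexJsonDocument(fieldIndex, "match7", { odds: "2/1", description: "football final" });
```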

When you have some documents indexed, you need to be able to issue queries against them. One way is to run a query from the Erlang shell. For example, the query in Listing 5 searches the odds bucket for all bets related to horse racing by querying the description property of the stored items.

Listing 5. Searching the odds bucket for bets related to horse racing
$ rel/riak/bin/riak attach

search:search(<<"odds">>, <<"description:horse">>).

Riak Search also provides a Solr-compatible HTTP API for searching documents. Apache Solr is a popular enterprise search server with a REST-like API. Because the Riak Search API is compatible with Solr's, it should be possible to swap out Solr, if you use it, and use Riak Search to power your searches instead. For example, to search for the odds for a particular event through the Solr interface, you would run something like this:

$ curl "http://localhost:8098/solr/odds/select?start=0&q=description:horse"
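If you build the query URL in code, let a URL encoder handle characters such as the colon in description:horse and any spaces in multi-word queries. A minimal JavaScript sketch; the function name buildSolrSelectUrl is hypothetical, and only the start and q parameters from the example above are used:

```javascript
// Builds a query URL for Riak Search's Solr-compatible interface.
// URLSearchParams percent-encodes ':' and turns spaces into '+'.
function buildSolrSelectUrl(host, port, index, query, start = 0) {
  const params = new URLSearchParams({ start: String(start), q: query });
  return `http://${host}:${port}/solr/${encodeURIComponent(index)}/select?${params}`;
}

const solrUrl = buildSolrSelectUrl("localhost", 8098, "odds", "description:horse");
// http://localhost:8098/solr/odds/select?start=0&q=description%3Ahorse
```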

With search set up, you can now locate items in the data store without knowing the primary keys of the items you are looking for.


Conclusion

Riak's ability to scale and reliably replicate data, together with other features such as search, makes it an ideal choice for implementing a caching solution for heavily loaded sites. You can easily integrate it into an existing site, and because it can serve requests directly, you can use Riak to reduce, or even eliminate, the load on the application and database servers.


Download

Description          | Name                  | Size
Article source code  | riakpt2sourcecode.zip | 85KB

