How to combine two IBM services to optimise for speed and cost by implementing a caching system for database queries.
Each database technology has its own strengths and weaknesses. Some are built for high availability and data durability (at the expense of more hardware and extra cost); others favour speed and can churn out blazingly fast queries (but may lose data in a sudden power failure).
As a user, you don't care what the technology is; you normally just want things to be accurate and fast. As a service provider, you want your users to be happy and your costs to be manageable.
So when it comes to choosing database technology that optimises for user experience and cost, a combination of several technologies is often a better fit than putting all your eggs in one basket.
In this tutorial, we are going to show you how to combine two IBM services to optimise for speed and cost by implementing a caching system for database queries.
IBM Cloudant is a fully managed, distributed database optimized for heavy workloads and fast-growing web and mobile apps.
IBM Databases for Redis is a fully managed in-memory data structure store, used as a database, cache and message broker.
The fully-managed nature of these services allows you to focus on developing your applications instead of having to worry about server infrastructure and what to do if you hit the jackpot and your product goes viral.
This tutorial should take you less than an hour to complete. It will not be entirely cost-free because the IBM Redis service does not come with a free tier, but if you deprovision the services after completing it, you should not have to pay more than a few dollars.
The problem: Wasteful repetition
In Cloudant Classic, querying is good but computationally and economically expensive, so there are limits to how many queries per second you are allowed to do. The free tier, for example, limits you to five queries per second. Any more than that and queries will start to be rejected, leaving customers frustrated.
The Cloudant client libraries have easy-to-use retry logic that will happily capture these rejections for you and retry. In applications with high concurrency or high throughput, however, developers might not want retries to become a bottleneck. In this case, Cloudant can easily be configured to allow as many queries per second as you need, but at additional expense. So you want to run these queries only when necessary; otherwise, you might find yourself with a higher bill than desired.
Imagine that you run a popular blog that is visited by thousands of people every hour. Typically, every visitor will come to your homepage first and you will have to get a list of the latest stories to show them. You can do this by querying your database ("select all stories in the database and return the 10 most recent ones") every time a visitor arrives at your homepage. But you only write a few stories every day, so that expensive query will return the same results over and over again. If only there were a way of storing the results of that query somewhere.
Enter Redis, an in-memory key-value database that is ideal for storing and retrieving data at lightning speed.
The first time you do the homepage query, you store (cache) the results in Redis, and all the subsequent visitors get the data from Redis instead. Because Cloudant gets fewer queries, you are able to set a lower provisioned throughput capacity on Cloudant, which leads to lower costs. Because Redis is so fast, users will see the homepage load faster. There is one catch though — the trade-off is that your users may not see the latest content (more on that later).
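The flow described above is often called cache-aside. What follows is a minimal sketch of it, not the code from this tutorial's repository. It assumes a cache object with async get and set methods (node-redis v4 clients have this shape, including the EX expiry option), and it uses an in-memory stand-in so that it runs without a Redis server. The names getWithCache, createFakeCache and latestStories are hypothetical.

```javascript
// Cache-aside sketch. `cache` is any object with async get(key) and
// set(key, value, options) methods, which is the shape of a node-redis v4 client.
async function getWithCache(cache, key, ttlSeconds, loadFromDatabase) {
  const cached = await cache.get(key);
  if (cached !== null && cached !== undefined) {
    // Cache hit: skip the database entirely.
    return { fromCache: true, data: JSON.parse(cached) };
  }
  // Cache miss: run the expensive query, then store the result with an expiry.
  const data = await loadFromDatabase();
  await cache.set(key, JSON.stringify(data), { EX: ttlSeconds });
  return { fromCache: false, data };
}

// In-memory stand-in for Redis (assumption for this sketch only).
// It ignores the expiry option.
function createFakeCache() {
  const store = new Map();
  return {
    async get(key) { return store.has(key) ? store.get(key) : null; },
    async set(key, value) { store.set(key, value); },
  };
}

// Hypothetical homepage query that counts how often the database is hit.
let databaseCalls = 0;
async function latestStories() {
  databaseCalls += 1;
  return ['Story A', 'Story B'];
}

async function demo() {
  const cache = createFakeCache();
  const first = await getWithCache(cache, 'homepage', 60, latestStories);
  const second = await getWithCache(cache, 'homepage', 60, latestStories);
  // The first call misses the cache, the second one hits it.
  console.log(first.fromCache, second.fromCache, databaseCalls);
}
demo();
```

Passing the database query in as a function keeps the caching logic independent of Cloudant, so the same helper could wrap any slow lookup.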
The project: Team Directory
Team Directory is a web app that contains the details of all divisional employees. These employees are assigned to six different colour-coded teams (Red, Orange, Green, Blue, Yellow and Purple). When you visit the app, you can select one team and see the names of all the members of that team. The app will also show you whether this list of team members was obtained from the Cloudant database or from the Redis cache and how long the query took. You should be able to see just how much quicker cached data is to retrieve.
You will need the following:
Step 1: Provision a Cloudant service (CloudantDirectory)
You will need to create a Cloudant service instance and some credentials to access it from your application. To do that, follow the steps described in this document.
In the Instance Name box, call your instance "CloudantDirectory".
Make a note of the apikey and url values of your service credentials. We will need those later.
Step 2: Create the Cloudant database (directory)
Now you need to create a Cloudant database to store your directory data. Follow Steps 1 and 2 in this document.
Call your database "directory". You are now ready to store your directory data.
The test data (the name and details of all employees) is part of the code and has been generated using this handy tool. It will be uploaded to the database when your application runs for the first time.
Step 3: Create Redis
Go here to create your Redis instance. Give it a memorable name. You can accept all the other defaults, including the smallest instance size.
Once the instance has been created, go to its settings (navigate to your Resource List and find it under Services and Software).
You will need to change the admin password. You can do it following the instructions here.
Make a note of the password because you'll need it later.
TLS certificate and deployment URL
You will also need the TLS certificate. Go to the Overview section and scroll down to the Endpoints section.
Click on Download Certificate.
Rename the downloaded file to something more compact (e.g., "redis.cert").
Also, copy the connections URL that you find below the certificate. It looks something like this: rediss://$USERNAME:$PASSWORD@0ee92fc5-2233-3344-5566-5678ee34567.68ea2cbd8c8d4c30b5b8450be6b8593a.databases.appdomain.cloud:32340/0
You will want to replace $USERNAME with "admin" and $PASSWORD with the password you created above. So you will end up with something like this: rediss://admin:YOUR_PASSWORD@0ee92fc5-2233-3344-5566-5678ee34567.68ea2cbd8c8d4c30b5b8450be6b8593a.databases.appdomain.cloud:32340/0
Keep the finished URL somewhere handy, such as your notepad, because you will need it later.
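If you would rather build the URL in code than by hand, the substitution can be sketched as below. This is only an illustration: buildRedisUrl is a hypothetical helper, the host in the example is made up, and percent-encoding the password is a precaution we are assuming your Redis client will decode (check your client's documentation if your password contains characters such as "@" or "/").

```javascript
// Fill the $USERNAME and $PASSWORD placeholders in the connection URL
// copied from the Endpoints section.
function buildRedisUrl(template, username, password) {
  // Replacer functions are used so that "$" sequences in the password are
  // never interpreted as special replacement patterns.
  return template
    .replace('$USERNAME', () => encodeURIComponent(username))
    .replace('$PASSWORD', () => encodeURIComponent(password));
}

// Hypothetical host and password, for illustration only.
const template = 'rediss://$USERNAME:$PASSWORD@example.databases.appdomain.cloud:32340/0';
const redisUrl = buildRedisUrl(template, 'admin', 'p@ss/word');
console.log(redisUrl);

// Parsing the result is a quick way to check that it is still a valid URL.
const parsed = new URL(redisUrl);
console.log(parsed.protocol, parsed.username, parsed.port);
```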
Step 4: Clone the code repository
In your terminal, navigate to a directory of your choice and then type the following:
You will need the TLS certificate from the previous step in the cloudant-redis-cache folder, so move it over:
Step 5: The environment variables
Your programme will read a number of environment variables to access the Cloudant and Redis services (using the values obtained in Steps 1 and 3 above).
In your terminal window, type the following:
(In the case of Cloudant, if you use the above URL and key variable names, the Cloudant SDK will find them in your environment without the need for any additional code.)
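As a rough illustration of how an application might read these values, here is a minimal sketch that checks the environment at start-up and fails early if anything is missing. CLOUDANT_URL and CLOUDANT_APIKEY follow the naming convention that, to our knowledge, the Cloudant Node SDK looks up by default. REDIS_URL and REDIS_CERT are hypothetical names chosen for this sketch and may differ from the ones the repository uses.

```javascript
// Names of the settings the application needs.
// CLOUDANT_* follows the Cloudant SDK's default convention (assumption);
// REDIS_URL and REDIS_CERT are hypothetical names for this sketch.
const REQUIRED_VARIABLES = ['CLOUDANT_URL', 'CLOUDANT_APIKEY', 'REDIS_URL', 'REDIS_CERT'];

// Read the settings from an environment-like object and fail early,
// with a readable message, if any of them are missing or empty.
function loadConfig(env) {
  const missing = REQUIRED_VARIABLES.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    cloudantUrl: env.CLOUDANT_URL,
    cloudantApiKey: env.CLOUDANT_APIKEY,
    redisUrl: env.REDIS_URL,
    redisCertPath: env.REDIS_CERT,
  };
}

// Example with made-up values; a real application would pass process.env.
const exampleConfig = loadConfig({
  CLOUDANT_URL: 'https://example.cloudantnosqldb.appdomain.cloud',
  CLOUDANT_APIKEY: 'example-api-key',
  REDIS_URL: 'rediss://admin:secret@example.databases.appdomain.cloud:32340/0',
  REDIS_CERT: './redis.cert',
});
console.log(Object.keys(exampleConfig));
```

Failing at start-up is a deliberate choice here: a missing variable is easier to diagnose from one clear error than from a connection failure on the first request.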
Step 6: Run the service
Install all the dependencies and start the server:
Now open a browser and visit http://localhost:8080. You should see a button for each team.
You can now click on any of the coloured buttons and obtain a list of team members. Look at where the data came from, Cloudant (cache = false) or Redis (cache = true) and how long it took.
Click the same button again. You will see the data coming from the Redis cache and taking a lot less time to execute.
Note that in a real cloud-based application, the application server and the Redis instance would be close to each other (in the same data centre) so latency between the two would be only a few milliseconds. In this example, there are extra network hops between your locally-hosted application server and the cloud-hosted Redis cache, so latency gains won't be as good as in production.
The cached data is set to expire within 60 seconds. So if you return to one of the teams after 60 seconds, the data will again be retrieved from the database and not the cache.
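To make the expiry behaviour concrete, here is a small in-memory imitation of a key with a 60-second time to live. It is a sketch of the idea only: in the real application Redis itself does the expiring, and createTtlCache, the injectable clock and the key name team:red are all hypothetical.

```javascript
// Tiny in-memory cache with per-key expiry, imitating what a Redis TTL does.
// `now` is injectable so that expiry can be demonstrated without waiting.
function createTtlCache(now = () => Date.now()) {
  const store = new Map();
  return {
    set(key, value, ttlSeconds) {
      store.set(key, { value, expiresAt: now() + ttlSeconds * 1000 });
    },
    get(key) {
      const entry = store.get(key);
      if (!entry) return null;
      if (now() >= entry.expiresAt) {
        store.delete(key); // expired: behave as if the key had never been set
        return null;
      }
      return entry.value;
    },
  };
}

// Simulated clock, in milliseconds.
let fakeTime = 0;
const ttlCache = createTtlCache(() => fakeTime);
ttlCache.set('team:red', 'cached team list', 60);

fakeTime = 59000;
console.log(ttlCache.get('team:red')); // still cached after 59 seconds

fakeTime = 60000;
console.log(ttlCache.get('team:red')); // null: the next request goes to the database
```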
You can also use the Clear Cache button to remove all cached data from Redis.
A bit more info
The application is a very simple Node.js application. It uses three main packages:
- @ibm-cloud/cloudant to connect to IBM Cloudant and read/write data.
- Redis to connect to the Redis instance and read/write data.
- Express to enable a simple web server that allows users to interact with the data.
There are two main files:
The first is the server script. It runs the web server and communicates with Cloudant and Redis. When the front end submits a team selection to the /team route (see below), the app.route function will first check the cache to see if it has the data already. If it does, then it will return that. Otherwise, it will make a query to Cloudant to retrieve team data, store it in the cache and return it to the front end.
The read operation uses a Cloudant design document and a MapReduce view to select documents. This is beyond the scope of this tutorial, but you can read more about views and design documents here.
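For readers who have not seen one before, a design document with a MapReduce view might look roughly like the sketch below. It is not the design document from the repository: the names teams and byTeam and the document fields team and name are assumptions made for illustration. The runMap helper simulates locally what the view index would contain, so the example runs without a database.

```javascript
// Hypothetical design document. The names "teams" and "byTeam" and the
// `team` and `name` fields are assumptions, not taken from the repository.
const designDocument = {
  _id: '_design/teams',
  views: {
    byTeam: {
      // The map function is stored as a string and run once per document.
      map: 'function (doc) { if (doc.team) { emit(doc.team, doc.name); } }',
    },
  },
};

// Local simulation of the view: run the map function over some documents
// and collect every emitted key/value pair as a row.
function runMap(mapSource, docs) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  const map = new Function('emit', `return (${mapSource});`)(emit);
  docs.forEach((doc) => map(doc));
  return rows;
}

const sampleDocs = [
  { name: 'Ada', team: 'Red' },
  { name: 'Bo', team: 'Blue' },
  { name: 'Cy' }, // no team, so the map function emits nothing for it
];
const viewRows = runMap(designDocument.views.byTeam.map, sampleDocs);

// Querying the view with a key then amounts to selecting the matching rows.
console.log(viewRows.filter((row) => row.key === 'Red'));
```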
This script also contains some code that uploads some test data (contained in the directorydata.json file) to the database the first time it is run.
The second is the front-end page. It is the one and only page of the application and uses the Vue.js framework. When it loads, it shows you the available teams.
When you select a team, it will make an HTTP POST request with your choice to the /team route of the application (see above). A successful return from the application will contain all the data for the team members. We display only their names and towns for simplicity.
Although cost (and, therefore, cost savings) is very much specific to your application's usage patterns, savings could be substantial. On Cloudant, every allocation of capacity for five additional queries per second adds $75 per month to your bill (it also provides an additional 100 reads per second and 50 writes per second). Meanwhile, 5GB of RAM (and 10GB of disk) on Redis is about $61 per month. So if you can answer a lot of (expensive) database queries from a (cheap) RAM cache instead, your savings can mount quickly.
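As a rough worked example of that arithmetic, the sketch below compares two made-up scenarios: an application that sends 50 queries per second to Cloudant, and the same application sending only 5 queries per second because a Redis cache answers the rest. The traffic figures are hypothetical, and the sketch assumes that every block of five queries per second is billed at $75 with no included capacity, which is a simplification.

```javascript
// Prices quoted in the text above (US dollars per month).
const CLOUDANT_BLOCK_QPS = 5;   // queries per second per capacity block
const CLOUDANT_BLOCK_COST = 75; // cost of each capacity block
const REDIS_COST = 61;          // 5GB RAM / 10GB disk Redis instance

// Number of capacity blocks needed to serve a given query rate.
function capacityBlocks(queriesPerSecond) {
  return Math.ceil(queriesPerSecond / CLOUDANT_BLOCK_QPS);
}

// Monthly bill for a query rate, optionally including the Redis instance.
function monthlyCost(queriesPerSecond, withRedis) {
  const cloudant = capacityBlocks(queriesPerSecond) * CLOUDANT_BLOCK_COST;
  return cloudant + (withRedis ? REDIS_COST : 0);
}

const withoutCache = monthlyCost(50, false); // 10 blocks * $75 = $750
const withCache = monthlyCost(5, true);      // 1 block * $75 + $61 = $136
console.log(withoutCache, withCache, withoutCache - withCache); // 750 136 614
```

With these assumed numbers the cache pays for itself several times over; with a low cache hit rate, the extra $61 could just as easily outweigh the saving.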
Fast and stale vs. slow and current — the trade-off
Caching strategies involve a number of trade-offs that are very application-specific. There are many resources about this on the web, and this is beyond the scope of this tutorial, but the following are some things to think about:
- How stale can your data be? In the example of the blogging site, it may be acceptable for users not to see a new article until several minutes after it was published (in effect, they are seeing an old copy of your homepage that was cached before your new article was published). But if your website is publishing, say, live football results, it will not do for people to wait minutes to find out that a goal has been scored. You have to decide how long you can afford to show your users potentially old data without impacting the user experience and clear out old caches so that your application is forced to query the main database again.
- How do you choose your cache keys? It is convenient, for example, to use a search query as a key (think "restaurants in Leeds" as a typical search query in a business listing site). You could use the whole string and even some additional parameters like number of results to return as your key. It is also common practice to create a hash of the query (like an md5 hash) and use that as a key.
- How big a cache do you need? If you have lots of people accessing a limited number of assets (like the homepage of your site and a few topic sections), then your cache will be small. But if you have lots of people making different requests (like a search engine), then you might need a cache with more space. If Redis reaches its memory limit, writes will fail with an error unless you have configured an eviction policy. With a policy such as allkeys-lru, for example, Redis evicts the least recently used keys to make way for new data.
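The cache-key idea from the list above can be sketched as follows. This is one possible approach under stated assumptions, not a recommendation: cacheKey is a hypothetical helper, the search: prefix is arbitrary, and normalising the query to lower case is a choice that only suits case-insensitive searches. MD5 is used here purely to derive a short, fixed-length key, not for security.

```javascript
const { createHash } = require('node:crypto');

// Derive a fixed-length cache key from a search query and its parameters.
function cacheKey(query, params = {}) {
  // Sort parameter names so that { limit, page } and { page, limit }
  // produce the same key.
  const sortedParams = Object.keys(params)
    .sort()
    .map((name) => [name, params[name]]);
  // Trim and lower-case the query so trivially different spellings share a key.
  const canonical = JSON.stringify([query.trim().toLowerCase(), sortedParams]);
  const digest = createHash('md5').update(canonical).digest('hex');
  return `search:${digest}`;
}

const keyA = cacheKey('restaurants in Leeds', { limit: 10, page: 1 });
const keyB = cacheKey('Restaurants in Leeds ', { page: 1, limit: 10 });
console.log(keyA === keyB); // true: same search, same cache entry
console.log(keyA === cacheKey('restaurants in York', { limit: 10, page: 1 })); // false
```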
In this tutorial, we have combined two IBM Cloud services to optimise cost and user experience: IBM Cloudant as a document store and query engine and IBM Cloud Databases for Redis as a content cache. Cached documents can be retrieved more quickly and more cheaply, but the trade-off is that your application may be showing old data to your users for a period of time.
If you followed this tutorial, remember to de-provision your Redis instance to stop incurring charges.
If you want to take the next step in your developer journey, check out our guidance on cloud-native designs for handling Redis retry logic.