Using CouchDB with Clojure

Supporting libraries make accessing CouchDB from Clojure a compelling choice

This article shows how to access the CouchDB APIs using Clojure, a dynamic language for the JVM. Examples use the Clutch API and clj-http library in parallel to illustrate a higher-level CouchDB API and lower-level REST-based calls, respectively. The article will help the novice Clojure developer who wants to use CouchDB, and anyone interested in CouchDB's underlying REST APIs.

Ryan Senior (senior.ryan@gmail.com), Senior Engineer, Revelytix

Ryan Senior is a senior engineer at Revelytix, developing Semantic Web software using Clojure. Previously, he worked as a Java developer in several industries, including manufacturing, finance, and health care. He earned a Master's degree in Computer Science from the University of Illinois at Urbana-Champaign and a Bachelor's degree in Computer Science from Western Illinois University. Ryan is also a member of the Strange Loop core team. He can be found on Twitter as @objcmdo and in the blogosphere as Object Commando.



22 February 2011

Apache CouchDB is an open source, Erlang-based, document-oriented database. CouchDB is schemaless in that each document stands on its own and requires no specific fields (other than an identifier and a revision). All actions, from querying the database to creating or changing data in it, are performed via a REST-based API. CouchDB can be a great alternative to relational databases for many applications, especially ones involving less-structured data. This article covers the use of Clojure for performing basic CouchDB operations, querying CouchDB using views, and database replication. The code examples show how to access the CouchDB REST APIs from Clojure at a higher level using the Clutch API and a lower level using a more fundamental HTTP library: clj-http.

Environment setup

The article's example code was written for CouchDB 1.0.1, Clojure 1.2.0, Clutch 0.2.4, and clj-http 0.1.2. The Leiningen build tool was used to download and set up the dependencies for the example code. The examples are written from the perspective of coding at the Clojure REPL.

Before beginning, make sure you have CouchDB installed (see Resources for installation information); prepackaged binaries are available for many operating systems, and yours may include them by default. To set your environment up to run the code, install Leiningen first (see Resources for a download link). Then create a new Leiningen project with lein new couchdb-with-clojure. Add Clutch and clj-http to the project.clj file so that it looks like Listing 1:

Listing 1. CouchDB with Clojure project.clj
(defproject couchdb-with-clojure "1.0.0-SNAPSHOT"
  :description "CouchDB from Clojure Examples"
  :dependencies [[org.clojure/clojure "1.2.0"]
                 [org.clojure/clojure-contrib "1.2.0"]
                 [com.ashafa/clutch "0.2.4"]
                 [clj-http "0.1.2"]])

Next, run lein deps to download the needed JAR files. You can run the code from a REPL in whatever environment you would like. You can launch one from Leiningen via lein repl or from your IDE of choice. From the REPL prompt, type the statements shown in the REPL session in Listing 2 to include the namespaces that the article uses:

Listing 2. Requiring clj-http, contrib.json, and Clutch
user> (require ['com.ashafa.clutch :as 'clutch])
nil
user> (require ['clj-http.client :as 'client])
nil
user> (require ['clojure.contrib.json :as 'json])
nil
user> (def movies-db "http://localhost:5984/movies")
#'user/movies-db

The last statement in Listing 2 defines the URL that you'll use to access the CouchDB database used for the article's examples. The default URL for locally installed CouchDB is http://localhost:5984. Change the port number as needed in your REPL session if your copy of CouchDB is configured differently.


Working with JSON

Data in CouchDB is structured in a self-standing JavaScript Object Notation (JSON) document form — a significant difference from relational databases. Consider a movies database. To model movies in a relational database, you'd likely have a table for movie-specific information like title and release date. Movies have actors, directors, producers, and so on, but you'd probably not put that information in the movie table. Rather, you'd have an actors table or something more general (such as a table of movie participants) and then have a reference from the movie table to a row in the actors table. Even this structure is probably too simple. You would probably need to set up a many-to-many relationship with a join table, requiring several joins to determine the actors for a particular movie. This process, called normalization, restructures the data to limit redundancy. Relational databases are tuned for handling data in this way.

In a CouchDB movies database, all information for a particular movie would be contained in a single document. This can cause some duplication, compared to a more-normalized structure. For example, an actor's name would appear in a document for each movie in which that actor had a role. Listing 3 shows how a movie document that's stored in or emitted from CouchDB might look:

Listing 3. Example JSON document for movie data
{"movie-title":"Psycho",
 "director":"Alfred Hitchcock",
 "runtime":109,
 "year-released":1960,
 "studio":"Shamley Productions",
 "actors":["Anthony Perkins", "Vera Miles", "John Gavin", "Janet Leigh"]}

JSON format distinguishes among objects, arrays, and literals. Curly braces {} indicate an object, and brackets [] indicate an array. The literals are strings like "Psycho" and integers like 1960. This formatting nicely coincides with the Clojure persistent data structures, in which curly braces are used for maps and square brackets for vectors; literals have the same syntax too. Table 1 shows examples of JSON data types and their Clojure equivalents:

Table 1. Equivalent JSON and Clojure data types
Data type | JSON example | Clojure example | Description
Number | 1 | 1 | Type for representing integers and real numbers
String | "Example String" | "Example String" | Type for representing strings
Boolean | true/false | true/false | Boolean type
Array | [1, 2, 3, 4] | [1 2 3 4] | JSON array; Clojure vector
Object | {"key1" : "value1", "key2" : "value2"} | {:key1 "value1" :key2 "value2"} | JSON object; Clojure map

Listing 4 is a JSON document (from CouchDB) followed by an equivalent Clojure map representation:

Listing 4. Comparing JSON objects and Clojure maps
;;JSON object for Psycho
{"_id":"Psycho",
 "director":"Alfred Hitchcock",
 "runtime":109,
 "year-released":1960,
 "studio":"Shamley Productions"}

;;Clojure map for Psycho
{:_id "Psycho"
 :director "Alfred Hitchcock"
 :runtime 109
 :year-released 1960
 :studio "Shamley Productions"}

The clojure.contrib.json library makes it easy to work with JSON in Clojure. This library takes a Clojure data structure and converts it to a JSON string, and the reverse. It converts keys in the map entries of JSON objects to Clojure keywords, which are more idiomatic to work with in Clojure. The HTTP-based examples I'll show use clojure.contrib.json. The Clutch API uses this library behind the scenes, shielding you from the JSON specifics.
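To make the key conversion concrete, here is a small sketch that needs neither CouchDB nor a JSON library. It starts from a string-keyed map, which is an assumption about what a generic JSON parser would hand back, and uses clojure.walk/keywordize-keys as a stand-in for the conversion. It illustrates the idea only and makes no claim about how clojure.contrib.json works internally.

```clojure
(require '[clojure.walk :as walk])

;; A parsed JSON object as a generic parser might return it: string keys.
;; (Assumed sample data, modeled on Listing 3.)
(def parsed-movie
  {"_id" "Psycho"
   "director" "Alfred Hitchcock"
   "runtime" 109
   "actors" ["Anthony Perkins" "Vera Miles"]})

;; Convert every string key to a keyword, including keys in nested maps.
(def movie (walk/keywordize-keys parsed-movie))

;; Keyword keys can be used as lookup functions.
(:director movie)

;; String keys need an explicit get.
(get parsed-movie "director")
```

Both lookups return the same value; the keyword form is simply the more idiomatic one in Clojure code.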


Creating a CouchDB document

Suppose you want to be able to ask CouchDB for all the information it has about the movie Psycho. To do this, you need to store a document like the one shown in Listing 3. CouchDB can hold many databases, so the first step is to create a database, then add the document. Listing 5 creates a new database at the movies-db URL and then creates a document:

Listing 5. Creating a CouchDB document with Clutch
user> (clutch/create-database movies-db)
{:ok true...}

user> (clutch/with-db movies-db
         (clutch/create-document {:director "Alfred Hitchcock"
                                  :runtime 109
                                  :year-released 1960
                                  :studio "Shamley Productions"}
                                 "Psycho"))
{:_id "Psycho" ... }

Note that the last argument passed into create-document in Listing 5 is the document's ID.

Listing 6 shows the equivalent code in clj-http, which exposes some of what Clutch does behind its API to create the CouchDB document:

Listing 6. Creating a CouchDB document with clj-http
user> (client/put movies-db) ;; Create Database
{:status 201 ... :body "{\"ok\":true}\n"}

user> (->> {:director "Alfred Hitchcock"
             :runtime 109
             :year-released 1960
             :studio "Shamley Productions"}
           json/json-str
           (hash-map :body)
           (client/put (str movies-db "/Psycho")))

{:status 201... :body "{\"ok\":true,
\"id\":\"Psycho\",\"rev\":\"1-a6b110617a1a8920903b648f208a8fac\"}\n"}

CouchDB does not allow you to create the same database twice. If you want to run both the Listing 5 and Listing 6 examples, delete the database using (client/delete movies-db) before recreating it.

Listing 6 creates a hashmap with the movie information, then converts it from the Clojure hashmap to a JSON document (using json/json-str). The document is then put inside another hashmap that clj-http recognizes as the body of the request. The code ultimately issues a PUT request to CouchDB to store the document. Note that the URL used to PUT the document is the movies database URL, followed by the ID of the CouchDB document (Psycho in this case).
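If the ->> threading macro is unfamiliar, the following sketch shows what the pipeline builds before the HTTP call is made. The flat-json function is a hypothetical stand-in for json/json-str that handles only flat maps of strings and numbers, so the example can run without any libraries. It is not a general JSON encoder.

```clojure
(require '[clojure.string :as string])

;; Hypothetical stand-in for json/json-str: handles only flat maps whose
;; values are strings or numbers, which is enough for this illustration.
(defn flat-json [m]
  (str "{"
       (string/join "," (for [[k v] m]
                          (str (pr-str (name k)) ":" (pr-str v))))
       "}"))

;; The pipeline from Listing 6, stopping before the PUT request.
(def request
  (->> {:director "Alfred Hitchcock" :runtime 109}
       flat-json
       (hash-map :body)))

;; The same steps written as nested calls.
(def request-nested
  (hash-map :body (flat-json {:director "Alfred Hitchcock" :runtime 109})))
```

Both forms produce a map with a single :body entry holding the JSON text, which is the shape of request map that Listing 6 passes to client/put.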

Double-checking your work

To check that the document has been successfully persisted, you can retrieve it programmatically from CouchDB and examine it. Listing 7 shows how to retrieve a document using Clutch, which converts from the JSON response to a Clojure map:

Listing 7. Retrieving a document from CouchDB using Clutch
user> (clutch/with-db movies-db
         (clutch/get-document "Psycho"))

{:_id "Psycho",
 :_rev "1-a6b110617a1a8920903b648f208a8fac",
 :director "Alfred Hitchcock",
 :runtime 109,
 :year-released 1960,
 :studio "Shamley Productions"}

Listing 8 shows a similar example that uses clj-http:

Listing 8. Retrieving a document from CouchDB using clj-http
user> (-> (str movies-db "/Psycho")
           client/get
           :body
           json/read-json)

{:_id "Psycho",
 :_rev "1-a6b110617a1a8920903b648f208a8fac",
 :director "Alfred Hitchcock",
 :runtime 109,
 :year-released 1960,
 :studio "Shamley Productions"}

The clj-http code in Listing 8 converts from the JSON response by taking the body of the response and calling json/read-json on it.

You can also easily check your work outside of your code, in several ways. One way is to enter the same REST URLs you use in your code into a browser, or to request them with cURL or a similar tool. For example, enter the GET URL you used earlier: http://localhost:5984/movies/Psycho.

The easiest way is to use CouchDB's Futon application (see Resources). It ships with CouchDB and can be viewed through the URL http://localhost:5984/_utils. (In the URL, substitute the correct port number if your CouchDB installation is not configured to use the default port.) You can also use Futon for document and view creation, replication, and much more.

Next I'll dive a little deeper into adding documents to CouchDB.


Creating a document — in depth

In the Creating a CouchDB document section, you created a record for the movie Psycho. To illustrate the importance of choosing a good document ID, consider adding a new movie to the database. Psycho was remade in 1998, so try to add the new version to the database, as shown in Listing 9:

Listing 9. Adding a document with conflicting ID
user> (clutch/with-db movies-db
         (clutch/create-document {:director "Gus Van Sant"
                                  :runtime 105
                                  :year-released 1998
                                  :studio "Universal Pictures"}
                                 "Psycho"))
;;409 Conflict

The code in Listing 9 causes an error because it tries to use an ID that already exists in the database. All CouchDB documents are stored by ID, and each document ID must be unique. The movie title, which might have seemed unique, was not. Basing the ID on the movie title was a setup for collisions. (In this case, you hit a collision with just a single database. In a distributed environment, collisions could be even more common. I'll discuss these replication issues in more depth shortly.) So, you need to change how you specify the document ID. The recommended way is to use something that has a guarantee of uniqueness, such as a Universally Unique Identifier (UUID). With Clutch, if you don't specify an ID, one will be generated for you.

Listing 10 is a new document, refactored to account for the need for a unique ID:

Listing 10. Picking a better movie ID (Clutch example)
user> (clutch/with-db movies-db
         (clutch/create-document {:movie-title "Psycho"
                                  :director "Gus Van Sant"
                                  :runtime 105
                                  :year-released 1998
                                  :studio "Universal Pictures"}))
{:_id "d6993381eb5ede34fded2f018b9f10b0",
 :_rev "1-29ff788958134c2023d9be94a9231528",
 :movie-title "Psycho",
 :director "Gus Van Sant",
 :runtime 105,
 :year-released 1998,
 :studio "Universal Pictures"}

The document in Listing 10 moves the original ID, Psycho, to movie-title, leaving no ID keypair. Notice the two additional fields in the document. One is _rev, which I'll discuss later. The other is _id, an autogenerated unique value that avoids the problem in Listing 9. Note that your own _id and _rev values will be different from those in Listing 10, because they are generated.

Listing 11 is the similar clj-http code:

Listing 11. Picking a better movie ID (clj-http example)
user> (->> {:movie-title "Psycho"
             :director "Gus Van Sant"
             :runtime 105
             :year-released 1998
             :studio "Universal Pictures"}
            json/json-str
            (hash-map :body)
            (client/put (str movies-db "/" (java.util.UUID/randomUUID)))
            :body
            json/read-json)
{:ok true,
 :id "f043a641-045b-4316-83f5-67c8f9bb99c3",
 :rev "1-29ff788958134c2023d9be94a9231528"}

Most of Listing 11 is the same as the previous clj-http code; the main difference is where the ID comes from. Listing 11 uses a JVM-created UUID. If you instead POST the document to the database URL without specifying an ID, CouchDB generates a UUID for the document using its own UUID-generation strategy. You can also get CouchDB-generated UUIDs from the URL http://localhost:5984/_uuids.
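The JVM-created ID in Listing 11 contains hyphens, while the CouchDB-generated ID in Listing 10 is a plain run of 32 hexadecimal characters. If you want client-generated IDs in that same shape, one possible approach is sketched here. The new-doc-id and doc-url helpers are hypothetical names introduced for this example and are not part of Clutch or clj-http.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: a random UUID with the hyphens removed, giving
;; 32 lowercase hexadecimal characters.
(defn new-doc-id []
  (string/replace (str (java.util.UUID/randomUUID)) "-" ""))

;; Hypothetical helper: the URL a document with the given ID would be PUT to.
(defn doc-url [db-url id]
  (str db-url "/" id))

;; Build a target URL for a new document in the movies database.
(def example-url (doc-url "http://localhost:5984/movies" (new-doc-id)))
```

Either ID style works as a document ID; the only goal here is a consistent look across documents created in different ways.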

To validate that you've created the document correctly, you can retrieve a list of all the document IDs from the database. Listing 12 does this with Clutch:

Listing 12. Getting database document IDs with Clutch
user> (clutch/with-db movies-db
         (->> (clutch/get-all-documents-meta)
              :rows
              (map :id)))
("d6993381eb5ede34fded2f018b9f10b0" "Psycho")

Listing 13 uses clj-http to retrieve an IDs list:

Listing 13. Getting database document IDs with clj-http
user> (->> (str movies-db "/_all_docs")
            client/get
            :body
            json/read-json
            :rows
            (map :id))
("d6993381eb5ede34fded2f018b9f10b0" "Psycho")

The calls in Listing 12 and Listing 13 ask CouchDB for all of the document metadata in the movies database, returning one autogenerated ID and another named Psycho. For consistency, Listing 14 deletes the Psycho document and adds it back in with a generated ID:

Listing 14. Deleting document with old key and re-adding
(clutch/with-db movies-db
    (let [original-psycho (clutch/get-document "Psycho")]
        (clutch/delete-document original-psycho)
        (-> original-psycho
            (assoc :movie-title (:_id original-psycho))
            (dissoc :_id)
            clutch/create-document)))
{:_id "84bbfce1b0e4cf6c9aa2f4196909f39d", :movie-title "Psycho"...}

Listing 14 retrieves the current Psycho document, deletes it in CouchDB, and then recreates it by adding a new movie-title key with the current ID value and removing the ID from the map. The ID must be removed because Clutch will create a document with that ID if it's included.
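The map manipulation in Listing 14 can be tried on its own, without a database. The sketch below uses a literal copy of the Psycho document (assumed sample data taken from Listing 7) and applies the same assoc and dissoc steps. Because Clojure maps are immutable, the original map is unchanged, which is why Listing 14 can still pass original-psycho to delete-document.

```clojure
;; Literal copy of the stored document, as shown in Listing 7.
(def original-psycho
  {:_id "Psycho"
   :_rev "1-a6b110617a1a8920903b648f208a8fac"
   :director "Alfred Hitchcock"
   :runtime 109
   :year-released 1960
   :studio "Shamley Productions"})

;; Same steps as Listing 14: copy the ID into :movie-title, then drop :_id.
(def retitled
  (-> original-psycho
      (assoc :movie-title (:_id original-psycho))
      (dissoc :_id)))

;; The new map has a title and no ID; the original map is untouched.
(:movie-title retitled)
(contains? retitled :_id)
(:_id original-psycho)
```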


Updating a CouchDB document

Updating a document is just like inserting a document, with one slight difference. When you create a document, it automatically is given a revision. Listing 15 shows the output of a newly created movie:

Listing 15. Example document with revision
user> (clutch/with-db movies-db
         (clutch/create-document {:movie-title "Rear Window"
                                  :director "Alfred Hitchcock",
                                  :runtime 112,
                                  :year-released 1954,
                                  :studio "Paramount Pictures"}))

{:_id "1f91c6a2e1af23fa89ca640e889bbdb6",
 :_rev "1-43386b891e9ad538de0d16fcb66aff5e",
 :movie-title "Rear Window"...}

The revision is the _rev map entry in Listing 15. CouchDB adds it automatically: the number before the hyphen counts the document's revisions, and the rest is an MD5 hash of the document. Every time the document changes, this hash changes. The revision is always needed when you update a CouchDB document so that CouchDB knows which version of the document your change is updating. Listing 16 gets the Rear Window movie document, makes a change, and updates that document to add an alternate title to the movie:

Listing 16. Updating a document
user> (clutch/with-db movies-db
         (-> (clutch/get-document "1f91c6a2e1af23fa89ca640e889bbdb6")
             (clutch/update-document {:alternate-titles ["La ventana indiscreta"]})))

=> {:alternate-titles ["La ventana indiscreta"]
    :_id "1f91c6a2e1af23fa89ca640e889bbdb6",
    :_rev "2-6601a377a55d733c0bd111539801edc8",
    :movie-title "Rear Window"...}

Note that Listing 16 queries for the document by UUID, so check Listing 12 (or Listing 13) for your own UUID and use it in Listing 16.

The update-document call in Listing 16 passes in two arguments. The first is the original document, and the second is a hash map that is merged in with the original document before the updated document is stored in CouchDB. The update-document function is actually a multimethod and mirrors many of the standard Clojure approaches to manipulating maps, such as merging in key/value pairs and updating nested structures like update-in. It also accepts as an argument a single map that has already made the necessary changes (but leaving the revision and ID intact).
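The standard map operations that update-document mirrors can be seen in isolation. This sketch applies plain merge and update-in to a local map (sample data based on Listing 15). It only shows the local analogues and does not talk to CouchDB, so the :_rev value stays the same.

```clojure
;; Local copy of the document from Listing 15 (abbreviated).
(def rear-window
  {:_id "1f91c6a2e1af23fa89ca640e889bbdb6"
   :_rev "1-43386b891e9ad538de0d16fcb66aff5e"
   :movie-title "Rear Window"})

;; Analogue of passing a map to update-document: merge in new key/value pairs.
(def merged
  (merge rear-window {:alternate-titles ["La ventana indiscreta"]}))

;; Analogue of passing a function and a key path: update a nested value.
(def appended
  (update-in merged [:alternate-titles] conj "Fenêtre sur cour"))
```

In both cases the :_id and :_rev entries pass through unchanged, which is what lets CouchDB match the update to the stored revision.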

Listing 16's approach is a little optimistic from a concurrency perspective. Now consider the code in Listing 17:

Listing 17. Updating with a conflict
user> (clutch/with-db movies-db
        (let [client1-rw (clutch/get-document
                         "1f91c6a2e1af23fa89ca640e889bbdb6")
              client2-rw (clutch/get-document
                          "1f91c6a2e1af23fa89ca640e889bbdb6")]
         (clutch/update-document client1-rw
                                 #(conj % "Fenêtre sur cour")
                                 [:alternate-titles])
         (clutch/update-document client2-rw
                                 #(conj % "Arka pencere")
                                 [:alternate-titles])))
;; 409 Conflict Error

Here the document is retrieved twice, and the first update proceeds without issues. The second update fails with a 409 error, the HTTP Conflict status code, which tells the caller that the operation could not be completed because it conflicts with the current state of the resource. Both retrieved copies carried the revision that was current at the time, so the first update succeeds. The second fails because the first update put a new revision in place that the second updater never saw. CouchDB will not let you update a document if your revision is out of date. What can the second updater do? That depends on your goals. One way to reduce the likelihood of this error is always to retrieve the document immediately before the update. If you modify the code in Listing 17 as shown in Listing 18, the update works:

Listing 18. Nonconflicting two-client update
(clutch/with-db movies-db
  (-> (clutch/get-document "1f91c6a2e1af23fa89ca640e889bbdb6")
      (clutch/update-document  #(conj % "Fenêtre sur cour")
                               [:alternate-titles]))
  (-> (clutch/get-document "1f91c6a2e1af23fa89ca640e889bbdb6")
      (clutch/update-document #(conj % "Arka pencere")
                              [:alternate-titles])))
{:movie-title "Rear Window",
 :alternate-titles ["La ventana indiscreta"
                    "Fenêtre sur cour"
                    "Arka pencere"]
 ...}

Even though that solves the immediate problem, the conflict can still occur if another client updates the document between your retrieval and your update. You need to write code to handle this scenario. Depending on your requirements, the solution might be as easy as retrieving the document again and merging your changes into the new version. Another possibility is to return an error to the user (for example, if the user is trying to buy an item that's no longer available).
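One possible shape for the retrieve-and-merge-again approach is a small retry loop. The sketch below runs entirely in memory: an atom stands in for the database, revisions are plain integers rather than CouchDB's revision strings, and fetch, save!, and update-with-retry are hypothetical names invented for this example. With Clutch you would replace fetch and save! with real calls and catch whatever exception your client raises for a 409.

```clojure
;; In-memory stand-in for one CouchDB database: document ID -> document.
;; Revisions are plain integers here, not CouchDB's revision strings.
(def store
  (atom {"rear-window" {:_id "rear-window"
                        :_rev 1
                        :alternate-titles ["La ventana indiscreta"]}}))

(defn fetch [id]
  (get @store id))

(defn save!
  "Stores doc only when its :_rev matches the stored revision; otherwise
  throws an exception carrying a 409 status. The check and the write are
  not atomic, which is acceptable for a single-threaded illustration."
  [doc]
  (let [current (fetch (:_id doc))]
    (if (= (:_rev current) (:_rev doc))
      (let [saved (update-in doc [:_rev] inc)]
        (swap! store assoc (:_id doc) saved)
        saved)
      (throw (ex-info "409 Conflict" {:status 409})))))

(defn update-with-retry
  "Fetches the latest document, applies f, and saves it. On a 409 the
  whole fetch/apply/save cycle is repeated, up to max-attempts times."
  [id f max-attempts]
  (loop [attempt 1]
    (let [result (try
                   (save! (f (fetch id)))
                   (catch clojure.lang.ExceptionInfo e
                     (if (and (= 409 (:status (ex-data e)))
                              (< attempt max-attempts))
                       ::retry
                       (throw e))))]
      (if (= ::retry result)
        (recur (inc attempt))
        result))))

;; Two clients read the same revision, as in Listing 17.
(def client1-rw (fetch "rear-window"))
(def client2-rw (fetch "rear-window"))

;; The first save succeeds and moves the stored revision from 1 to 2.
(save! (update-in client1-rw [:alternate-titles] conj "Fenêtre sur cour"))

;; Saving client2-rw now would throw, because it still carries revision 1.
;; Going through update-with-retry works, because it fetches again first.
(update-with-retry "rear-window"
                   #(update-in % [:alternate-titles] conj "Arka pencere")
                   3)
```

The important design point is that the change is expressed as a function of the document, so it can be reapplied to whatever revision is current when the retry happens. Capping the number of attempts keeps a busy document from causing an endless loop.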

A final point about updating documents is that CouchDB has no concept of a portion of a document changing, only that the document did change. This is because any change to the document will result in a new hash of the document (which is what CouchDB uses to create the revision ID). Purely additive changes, deleting portions of documents, and modifying documents are treated the same and will result in a new revision. This is also important in replicating databases.


CouchDB views

CouchDB isn't queried via SQL like a relational database. The primary way data is retrieved is through MapReduce-style code called views. You can choose among many languages to write views. (The default language is JavaScript. The following examples will also work for JavaScript, but the specific MapReduce code differs.)

For this article, you'll use Clojure as the view language via the Clojure view server, which is included with Clutch. To use the view server, you need to have it installed in your copy of CouchDB (see Resources for a link to installation information on the Clutch website). Note that the view server is an addition to the server, not to your client code.

I'll start by creating and running a view, and then I'll go into more depth. First, to have more items in the database to query, add a few more documents, as shown in Listing 19:

Listing 19. Bulk-adding documents with Clutch
user> (clutch/with-db movies-db
        (clutch/bulk-update
          [{:movie-title "The Godfather"
            :director "Francis Ford Coppola"
            :runtime 175
            :year-released 1972
            :studio "Paramount"}
           {:movie-title "The Godfather II"
            :director "Francis Ford Coppola"
            :runtime 200
            :year-released 1974
            :studio "Paramount"}
           {:movie-title "The Godfather III"
            :director "Francis Ford Coppola"
            :runtime 162
            :year-released 1990
            :studio "Paramount"}]))

Listing 19 uses CouchDB's bulk-update feature. Bulk updates work for newly created documents and for updating multiple existing documents.

The code in Listing 20 queries all documents to create a temporary view that shows the runtimes of all movies in the database:

Listing 20. Temporary view example
user> (clutch/with-db movies-db
        (clutch/ad-hoc-view
          (clutch/with-clj-view-server
            {:map (fn [doc] (when (and (:movie-title doc)
                                      (:runtime doc))
                             [[(:movie-title doc)
                               (:runtime doc)]]))})))
{:total_rows 6,
 :rows [{:id "d6993381eb5ede34fded2f018b9f10b0",
         :key "Psycho",
         :value 105}
        {:id "84bbfce1b0e4cf6c9aa2f4196909f39d",
         :key "Psycho",
         :value 109}
        ...]}

For each movie, the movie title is returned with its associated runtime. Notice that I am also checking for the existence of movie-title and runtime. I do this because the code will run on each document, including newly created ones. CouchDB uses no defined schemas, so not all documents need to contain the same fields. It's a good idea to guard your views against the possibility that the fields you're querying against don't exist in all the documents.

The function in Listing 20 returns a vector of vectors. This is because each document could generate zero to many map entries, and each map entry is represented as a vector. The first item in the inner vector is the key (in this case, movie-title), and the second is the value (in this case, an integer representing the running time).

This view is outputting something similar to the documents that were created. In this case, it is outputting movie name and a value of the runtime, rather than a map, but conceptually this is the same. A movie's value could be a map if necessary. Also note that even though the output is similar to a document, the same uniqueness requirement doesn't apply. With views, everything that you emit as a key (the first item in the inner vector) is internally paired with the ID of the document that the key came from. You can see that ID in Listing 20's view output. The output is what you expected, showing all of the runtimes of the movies in the database.
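To see how a map function's vector of vectors turns into view rows, you can imitate the process in plain Clojure. The sketch below is a rough model, not CouchDB's implementation: run-view is a hypothetical helper, the documents and their short IDs are made-up sample data, and the sorting by key is a simplification of how CouchDB orders view rows.

```clojure
;; Made-up sample documents; the third one is not a movie.
(def docs
  [{:_id "84bb" :movie-title "Psycho" :runtime 109}
   {:_id "1f91" :movie-title "Rear Window" :runtime 112}
   {:_id "note-1" :comment "not a movie, so it has no title or runtime"}])

;; The map function from Listing 20: zero or more [key value] pairs per doc.
(defn runtime-map [doc]
  (when (and (:movie-title doc) (:runtime doc))
    [[(:movie-title doc) (:runtime doc)]]))

;; Hypothetical helper: apply map-fn to every document, pair each emitted
;; [key value] with the ID of the document it came from, and sort by key.
(defn run-view [map-fn docs]
  (sort-by :key
           (for [doc docs
                 [k v] (map-fn doc)]
             {:id (:_id doc) :key k :value v})))

(def rows (run-view runtime-map docs))
```

The document without a title or runtime contributes no rows, because the guard makes the map function return nil for it.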

There are a few problems with the approach I've just shown to running the view. First, it's temporary, intended for development. It will reexamine each document in the database every time it is executed, even if those documents haven't changed since the last time it ran. By "reexamine each document," I mean that each document is passed into the function and the result added to the output map. Second, you're confined to running the query only via your Clojure code.

To fix both of these issues, you can persist the view, as shown in Listing 21:

Listing 21. Storing a CouchDB view via Clutch
user> (clutch/with-db movies-db
        (clutch/save-view "movies" "runtimes"
          (clutch/with-clj-view-server
            {:map (fn [doc] (when (and (:movie-title doc)
                                      (:runtime doc))
                             [[(:movie-title doc)
                               (:runtime doc)]]))})))
{:_id "_design/movies",
 :language "clojure",
 :views {"runtimes" ...}}

user> (clutch/with-db movies-db
        (clutch/get-view "movies" "runtimes"))
{:total_rows 6,
 :rows [{:id "d6993381eb5ede34fded2f018b9f10b0",
         :key "Psycho",
         :value 105}
        {:id "84bbfce1b0e4cf6c9aa2f4196909f39d",
         :key "Psycho",
         :value 109}
         ...]}

By persisting the view, you save the results of the query the first time you run it (updating only when documents change). And you make it available for all users to execute the query (from Clojure, a web browser, another language, and so on). The code in Listing 21 returns the same results as Listing 20 but is smarter about caching and can be reused via other languages, Futon, or the browser.


CouchDB views — in depth

Using views gives you clear performance gains over querying the data manually. Earlier in this article, you only queried the database by document key, having found the keys either by getting a document list or (when the database was keyed by movie title) knowing the key ahead of time. If you know the ID of the specific document you're looking for, the query performs quickly. If you don't (the more likely scenario), it is slow. The movies database contains only a few documents, so even temporary views return quickly. In databases with thousands or hundreds of thousands of documents, running the map function over every document can be time-consuming. Storing the results is essential for them to be useful.

To save a view using Clutch, Listing 21 used the save-view function, passing it two strings and a Clojure map with a single entry whose key is :map and whose value is a function. Clutch does a good job of abstracting away the mundane details of saving the view document. These views are actually stored as regular CouchDB documents but with special names.

Listing 22 is an example of creating a view document using clj-http:

Listing 22. Storing a view with clj-http
user> (->> {:language "clojure"
            :views {:runtimes
                     {:map "(fn [doc]
                        (when (and (:movie-title doc)
                                   (:runtime doc))
                          [[(:movie-title doc)
                            (:runtime doc)]]))"}}}
            json/json-str
            (hash-map :body)
            (client/put (str movies-db "/_design/movies/")))
{:status 201
 ...
 :body "{\"ok\":true...}\n"}

user> (-> (str movies-db "/_design/movies/_view/runtimes")
          client/get
          :body
          json/read-json
          :rows)
[{:id "d6993381eb5ede34fded2f018b9f10b0",
  :key "Psycho",
  :value 105}
  ...]

Notice a few interesting things about the code in Listing 22. First, it's a CouchDB document just like everything else you've stored in CouchDB thus far. What makes it different is that it is stored in a specially named document, _design/movies in this case. CouchDB design documents are CouchDB documents that contain views. Their names begin with _design. Design documents have a language property (in this case Clojure) and a views property that contains a map of specific views found in the design document. When Listing 21 calls save-view via Clutch, the first two parameters define the design document and the name of the view. This view section of the map is meant to house many related queries. The runtimes view has a map associated with it that looks similar to the original function you defined via the Clutch API. You have now created a view, by creating a specially named CouchDB document. The second part of Listing 22 gets the results of the view using the special URLs.
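Because a design document is ordinary data, you can build one as a Clojure map before converting it to JSON. In Listing 22 the map function is written by hand as a string. The sketch below shows one way to produce that string from a quoted form with pr-str instead. The design-doc helper is a hypothetical name for this example, and the approach assumes the view server accepts the printed source text as it accepts the handwritten string.

```clojure
;; The view's map function as a quoted form, so it stays data.
(def runtimes-map
  '(fn [doc]
     (when (and (:movie-title doc) (:runtime doc))
       [[(:movie-title doc) (:runtime doc)]])))

;; Hypothetical helper: build the design document body from a map of
;; view name -> {:map form, :reduce form}, printing each form to a string.
(defn design-doc [views]
  {:language "clojure"
   :views (into {}
                (for [[view-name fns] views]
                  [view-name
                   (into {} (for [[k form] fns]
                              [k (pr-str form)]))]))})

(def movies-design
  (design-doc {:runtimes {:map runtimes-map}}))

;; The function source is now a string, ready to be encoded as JSON.
(get-in movies-design [:views :runtimes :map])
```

The resulting map has the same shape as the one at the start of Listing 22, so it could be passed through json/json-str in the same way.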

Querying items in a view

The preceding examples show how to retrieve all of the results returned by a view. This is useful, but it might not be what you want. Take the original reasoning behind selecting movie title as the key for the database. It seems reasonable to want to query the movies database by movie title, even if movie title is not unique. With that goal in mind, you can create a view that returns the full document about the movie, keyed by movie title. This is similar to the code you've already seen, except this time rather than return a single number, you'll return the full document. Listing 23 shows the code for creating and querying a view by movie title:

Listing 23. Querying by movie title with Clutch
user> (clutch/with-db movies-db
        (clutch/save-view "movies" "by_title"
          (clutch/with-clj-view-server
            {:map (fn [doc] (when (and (:movie-title doc)
                                      (:runtime doc))
                             [[(:movie-title doc)
                               doc]]))})))

user> (clutch/with-db movies-db
        (:rows (clutch/get-view "movies" "by_title" {:key "Psycho"})))
[{:id "d6993381eb5ede34fded2f018b9f10b0",
  :key "Psycho",
  :value {:_id "d6993381eb5ede34fded2f018b9f10b0",
          :movie-title "Psycho",
          :director "Gus Van Sant",
          ...}
  ...}]

The only difference between querying a specific movie and querying all movies is a query parameter. The Clutch code in Listing 23 uses a query parameter map, which just results in a query string in the URL, as in the clj-http query in Listing 24:

Listing 24. Querying by movie title with clj-http
user> (->> "\"Psycho\""
            java.net.URLEncoder/encode
            (str movies-db "/_design/movies/_view/by_title?key=")
            client/get
            :body
            json/read-json
            :rows)
[{:id "d6993381eb5ede34fded2f018b9f10b0",
  :key "Psycho",
  :value {...}
  ...}]

CouchDB has a lot of query options, including sorting in ascending or descending order, key ranges, and limits. Keys can also be other JSON structures such as a list or a map. For more information on CouchDB views, see Resources.
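The key parameter in Listing 24 is a JSON value, which is why the title is wrapped in escaped double quotes before being URL-encoded. A small helper can make that step explicit. The view-url function below is a hypothetical helper for this example. It quotes the key by simple string concatenation, which is only adequate for plain strings that contain no quotes or backslashes.

```clojure
;; Hypothetical helper: build the URL for querying a view by a string key.
;; The key is wrapped in double quotes to make it a JSON string, then
;; URL-encoded for use in the query string.
(defn view-url [db-url design view key]
  (str db-url "/_design/" design "/_view/" view
       "?key=" (java.net.URLEncoder/encode (str "\"" key "\"") "UTF-8")))

(view-url "http://localhost:5984/movies" "movies" "by_title" "Psycho")
;; => "http://localhost:5984/movies/_design/movies/_view/by_title?key=%22Psycho%22"
```

For keys that are vectors, maps, or strings with special characters, encode the key with a real JSON library such as the json/json-str function used elsewhere in this article before URL-encoding it.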

Reduce functions

Querying data through views using only map functions probably covers most developer needs. But sometimes you might want to get aggregate information. Averages, summations, and other summary types of data are not possible with just a map function. CouchDB provides a reduce function for this purpose. Listing 25 shows an example of creating a view that shows the total number of movies in the database for a given studio:

Listing 25. View with a reduce function
user> (clutch/with-db movies-db
        (clutch/save-view "movies" "studio"
          (clutch/with-clj-view-server
            {:map (fn [doc] (when (:studio doc)
                             [[(:studio doc) 1]]))
             :reduce (fn [keys vals rereduce]
                       (if rereduce
                         (apply + vals)
                         (count vals)))})))

user> (clutch/with-db movies-db
        (clutch/get-view "movies" "studio"))
{:rows [{:key nil, :value 6}]}


user> (clutch/with-db movies-db
        (clutch/get-view "movies" "studio" {:key "Paramount"}))
{:rows [{:key nil, :value 3}]}

The reduce function shown in Listing 25 takes three arguments:

  • The first, keys, is a list of keys that were created in the map function. Note that the list of keys consists not just of the studio emitted in Listing 25, but of the studio and the ID.
  • The second argument, vals, is a list of values for the keys that were passed into the function. In this case, it's the series of ones that are emitted.
  • The third argument, rereduce, indicates whether this reduce function is operating on the raw results of the map function (the ones emitted in Listing 25) or on values that earlier reduce calls have already aggregated.

Some knowledge of how CouchDB stores these results is necessary to understand this reduce function. The results of the reduce call are stored in a B-tree; the closer a node is to the root, the higher-level its summary. The first call to the studio view returns 6, which is the summary at the root of the tree. If you emitted the keys at this point, you'd also see a [studio doc-id] pair for each document in the database. As you travel from the root down the tree, other summaries (each covering fewer documents than the root's) can be emitted. The second call in Listing 25 asks for "Paramount" studio movies. CouchDB traverses the tree (in logarithmic time) and uses the summaries closest to the root that cover only that studio's movies.

This structure is aimed at performance. When data changes or values need to be computed, the stored summaries can be reused without necessarily rerunning all of the computations. The structure is also what's behind the rereduce parameter, which is true when the values passed in are summaries already computed for the children of the node (and, in this example, are not just the individual 1 values).
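
To make the two modes of the reduce function concrete, here is a small simulation that runs entirely at the REPL, without CouchDB. It is only a sketch: simulate-reduce is a hypothetical stand-in for the B-tree that simply splits the rows into chunks, and the rows themselves are made-up sample data:

```clojure
(defn count-reduce
  "Same logic as the :reduce function in Listing 25."
  [keys vals rereduce]
  (if rereduce
    (apply + vals)
    (count vals)))

(defn simulate-reduce
  "Hypothetical stand-in for CouchDB's B-tree: reduces the map rows in
  chunks (rereduce false), then combines the partial results
  (rereduce true, with no keys). Each row is [[key doc-id] value]."
  [reduce-fn rows chunk-size]
  (let [partials (for [chunk (partition-all chunk-size rows)]
                   (reduce-fn (map first chunk) (map second chunk) false))]
    (reduce-fn nil partials true)))

;; Made-up rows, shaped like the output of the map function in Listing 25.
(def rows
  [[["MGM" "doc-1"] 1]
   [["Paramount" "doc-2"] 1]
   [["Paramount" "doc-3"] 1]
   [["Paramount" "doc-4"] 1]
   [["Universal" "doc-5"] 1]
   [["Universal" "doc-6"] 1]])

;; Two chunks (4 rows and 2 rows) are counted, then the counts are summed.
(simulate-reduce count-reduce rows 4)
;; => 6

;; Restricting the rows to one studio mirrors the {:key "Paramount"} query.
(simulate-reduce count-reduce
                 (filter #(= "Paramount" (ffirst %)) rows)
                 2)
;; => 3
```

The chunk size does not change the answer, which is the property a reduce function needs for the stored summaries to be reusable.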


CouchDB replication

CouchDB can scale to many instances and incrementally replicate between the various nodes in the cluster. This is all built on top of CouchDB's replication abilities, which I have found useful even outside of scaling distributed CouchDB databases. Replicating data in CouchDB is a one-step process from an API perspective. Replication can take place between local databases, remote databases, or any combination thereof. CouchDB makes this easy by letting you replicate at the database level using the same REST interface you've been using thus far. I'll focus here on CouchDB's replication support and using it programmatically, rather than on how it can be applied to scaling CouchDB. (For more information on scaling CouchDB, see Resources.)

CouchDB replication occurs between two existing (local or remote) databases. There are no lineage requirements between the two databases prior to replication. Replication can be done via the Futon application or programmatically. Let's say you're suffering some performance problems and need to add a second instance of your CouchDB database to keep up with demand. To make for easy testing, you can just replicate to another local database, as shown in Listing 26:

Listing 26. Clutch replication example
user> (def moviesb-db "http://localhost:5984/movies-b")
#'user/moviesb-db

user> (clutch/create-database moviesb-db)
{:ok true,...}

user> (clutch/replicate-database "movies" "movies-b")
{:ok true...}

user> (let [movie-ids (clutch/with-db movies-db
                        (->> (clutch/get-all-documents-meta)
                             :rows
                             (map :id)))
            movie-b-ids (clutch/with-db moviesb-db
                          (->> (clutch/get-all-documents-meta)
                               :rows
                               (map :id)))]
        (= movie-ids movie-b-ids))
true

The code in Listing 26 creates a new (empty) database called movies-b (because both databases can't have the same name) and then replicates the movies database to it. Then it gets all of the document IDs from both databases and checks that they are equal. Because you've just replicated, they should be. Listing 27 is the same example using clj-http:

Listing 27. clj-http replication example
user> (def moviesb-db "http://localhost:5984/movies-b")
#'user/moviesb-db

user> (client/put moviesb-db)
{:ok true,...}

user> (->> {:source "movies" :target "movies-b"}
           json/json-str
           (hash-map :body)
           (client/post "http://localhost:5984/_replicate"))
{:ok true...}

user> (let [movie-ids (->> (str movies-db "/_all_docs")
                           client/get
                           :body
                           json/read-json
                           :rows
                           (map :id))
            movie-b-ids (->> (str moviesb-db "/_all_docs")
                             client/get
                             :body
                             json/read-json
                             :rows
                             (map :id))]
        (= movie-ids movie-b-ids))
true

In Listing 27, you're posting a JSON document that contains a source and a target. CouchDB takes it from there. You now have the advantages and drawbacks of two copies of the data. With the added muscle of a second database, CouchDB can service requests faster, but what happens when the databases change? As an example, add a new movie to each database and replicate between them, as shown in Listing 28:

Listing 28. Adding new documents to both databases
user> (clutch/with-db movies-db
        (clutch/create-document
          {:movie-title "Vertigo"
           :director "Alfred Hitchcock",
           :runtime 128,
           :year-released 1958,
           :studio "Paramount Pictures"}))
{:_id "728b2293180e0be566cea3f3127b6cf3"...}

user> (clutch/with-db moviesb-db
        (clutch/create-document
          {:movie-title "North by Northwest"
           :director "Alfred Hitchcock",
           :runtime 131,
           :year-released 1959,
           :studio "MGM"}))
{:_id "386d0400e336e54933a47aec656289c4"...}

user> (clutch/replicate-database "movies" "movies-b")
user> (clutch/replicate-database "movies-b" "movies")

After the first two statements in Listing 28, the two copies of the database differ. The movies database has a Vertigo document that movies-b doesn't have, and movies-b has a North by Northwest document that movies doesn't have. These two documents have different (generated) IDs and represent different movies. Because the documents are different, replication completes without issues. Note that replication is one-way only. When you replicate from movies to movies-b, none of the documents from movies-b is copied to movies. So you need to replicate from movies-b back to movies as well (to get a copy of the North by Northwest document). To verify that the new documents exist in both databases, you can reuse the code from Listing 27 that compares the document lists of two databases.
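
When the two ID lists are not equal, it helps to see which side is missing what. The following sketch assumes you have already fetched the ID lists as in Listing 26 or Listing 27; the missing-ids helper is hypothetical, and the sample data reuses the document IDs shown in this article to represent the state after replicating only from movies to movies-b:

```clojure
(require '[clojure.set :as set])

(defn missing-ids
  "Hypothetical helper: given the document IDs of two databases,
  reports which IDs each side still lacks."
  [ids-a ids-b]
  (let [a (set ids-a)
        b (set ids-b)]
    {:missing-from-a (set/difference b a)
     :missing-from-b (set/difference a b)}))

;; State after replicating movies -> movies-b only.
(def movie-ids
  ["728b2293180e0be566cea3f3127b6cf3"    ; Vertigo
   "d6993381eb5ede34fded2f018b9f10b0"])  ; Psycho

(def movie-b-ids
  ["386d0400e336e54933a47aec656289c4"    ; North by Northwest
   "728b2293180e0be566cea3f3127b6cf3"
   "d6993381eb5ede34fded2f018b9f10b0"])

(missing-ids movie-ids movie-b-ids)
;; => {:missing-from-a #{"386d0400e336e54933a47aec656289c4"},
;;     :missing-from-b #{}}
```

Once the second replication (from movies-b to movies) has run, both sets should be empty.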

Resolving replication conflicts

The scenario I just described is the happy path. It deals only with new documents, without any conflicts. Another happy path is replication of documents that have been updated on the source but not on the destination. What happens when a document is modified in two different databases and then replicated? This scenario can cause wrinkles in replication. The easiest way to handle the problem is to design the database to avoid conflicting updates. In document-oriented database design, it's best to make the documents as self-contained as possible. Can the database also be designed so that new information is put into new documents, so that document updates can be avoided? This is not always possible, but when it is, it makes replication much easier. The example in Listing 29 adds a new key to the same document in each database: one with a re-release year and the other with information on the movie's sound mixing:

Listing 29. Conflicting replication
user> (clutch/with-db movies-db
        (-> (clutch/get-document "386d0400e336e54933a47aec656289c4")
            (clutch/update-document {:re-released 1996})))
{:re-released 1996
 :movie-title "North by Northwest"...}

user> (clutch/with-db moviesb-db
        (-> (clutch/get-document "386d0400e336e54933a47aec656289c4")
            (clutch/update-document {:sound-mix "Mono"})))
{:sound-mix "Mono"
 :movie-title "North by Northwest"...}

user> (clutch/replicate-database "movies" "movies-b")
{:ok true...}

Everything appears to have replicated properly, until I check the results in Listing 30:

Listing 30. Checking conflicted replication
user> (clutch/with-db moviesb-db
        (keys (clutch/get-document "386d0400e336e54933a47aec656289c4")))
(:movie-title :director :_conflicts :_rev :language
 :runtime :studio :_id :sound-mix :year-released)

The update from movies looks like it disappeared, because there is no re-released key. What happened is that a conflict occurred when movies was replicated to movies-b, because the replication tried to bring together differing revisions of the same document. When a conflict like this occurs in CouchDB, no information is lost. Rather, CouchDB creates a new conflict record associated with the document, which can be retrieved with the Clutch query shown in Listing 31:

Listing 31. Examining a conflict (Clutch example)
user> (clutch/with-db moviesb-db
        (clutch/get-document "386d0400e336e54933a47aec656289c4" {:conflicts true}))
{:movie-title "North by Northwest",
 ...
 :_conflicts ["2-ac7e4d143dff32f7be437de99a659ba1"]
 ...}

Listing 32 shows the equivalent clj-http code:

Listing 32. Examining a conflict (clj-http example)
user> (-> (str moviesb-db "/386d0400e336e54933a47aec656289c4?conflicts=true")
          client/get
          :body
          json/read-json)
{:movie-title "North by Northwest"
 ...
 :_conflicts ["2-ac7e4d143dff32f7be437de99a659ba1"]
 ...}

Listing 31 and Listing 32 show that a conflict occurred and that the conflict is with a particular revision (in this case, 2-ac7e4d143dff32f7be437de99a659ba1) of the document. You can then pull that specific conflicting revision and update it yourself. Listing 33 does this with Clutch:

Listing 33. Getting conflicting document (Clutch)
user> (clutch/with-db moviesb-db
        (keys (clutch/get-document "386d0400e336e54933a47aec656289c4"
                                   {:rev "2-ac7e4d143dff32f7be437de99a659ba1"}
                                   #{:rev})))
(:_id :_rev :re-released :movie-title :director
 :runtime :year-released :studio)

Listing 34 shows the equivalent clj-http code:

Listing 34. Getting conflicting document (clj-http)
user> (-> (str moviesb-db
               "/386d0400e336e54933a47aec656289c4"
               "?rev=2-ac7e4d143dff32f7be437de99a659ba1")
          client/get
          :body
          json/read-json
          keys)
(:re-released :movie-title :director :_rev :language
 :runtime :studio :_id :year-released)

Note that the keys list does not contain the sound-mix key/value pair but does contain the re-released key/value pair. The burden of resolving these conflicts is on the developer. Also, the same rule as in the Updating a CouchDB document section applies here: any change to the document produces a new revision ID and can create a conflict. Even changes to different areas of a document are not resolved automatically. The steps to resolve this issue are:

  1. Read the current document.
  2. Read the older (conflicting) version.
  3. Apply domain-specific merge logic.
  4. Update the document to the new (merged) version.
  5. Remove the conflicting document version.

Step 5 is the same as deleting any other document, with the additional revision parameter mirroring how the document was retrieved.
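
Steps 3 through 5 can be sketched as a pure function that takes the two revisions you have already read and returns what to write and what to delete. This is only an outline under stated assumptions: resolution-plan is a hypothetical helper, the "2-current" revision ID is made up for illustration, and the actual update and delete calls (with Clutch or clj-http) are left to the caller:

```clojure
(defn resolution-plan
  "Hypothetical helper covering steps 3 to 5: merges the conflicting
  revision into the current document and builds the URL of the
  revision to delete. merge-fn holds the domain-specific logic."
  [db-url current conflicting merge-fn]
  (let [merged (-> (merge-fn current conflicting)
                   ;; Keep the ID and revision of the current document so
                   ;; the update is saved as its next revision.
                   (assoc :_id (:_id current) :_rev (:_rev current))
                   ;; :_conflicts is metadata reported by CouchDB,
                   ;; not part of the movie data itself.
                   (dissoc :_conflicts))]
    {:update merged
     :delete-url (str db-url "/" (:_id current)
                      "?rev=" (:_rev conflicting))}))

;; The two revisions from Listings 31 through 34, abbreviated.
;; "2-current" is a made-up revision ID.
(def current
  {:_id "386d0400e336e54933a47aec656289c4"
   :_rev "2-current"
   :movie-title "North by Northwest"
   :sound-mix "Mono"
   :_conflicts ["2-ac7e4d143dff32f7be437de99a659ba1"]})

(def conflicting
  {:_id "386d0400e336e54933a47aec656289c4"
   :_rev "2-ac7e4d143dff32f7be437de99a659ba1"
   :movie-title "North by Northwest"
   :re-released 1996})

;; Plain merge is enough here because the two updates touch different keys.
(resolution-plan "http://localhost:5984/movies-b" current conflicting merge)
```

The :update document contains both the :sound-mix and :re-released keys, and :delete-url names the conflicting revision to remove with an HTTP DELETE.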

The key point about replication conflicts is that handling them is domain-specific. In this example, you want to merge the updates, combining the re-released and sound-mix fields. There are many alternatives to this logic. Another option is always to take the most recent update. This would make sense for documents such as movie revenue, because you'd likely want only the most recent revenue information for a movie. In other cases, you might always want the first update to win.
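
These alternatives can each be written as a small function of the two revisions. The sketch below is illustrative only: the function names are hypothetical, and :updated-at is an assumed timestamp that the application would have to maintain itself, because CouchDB does not record one in the document:

```clojure
(defn merge-fields
  "Keeps every field from both revisions. When both revisions set the
  same key, the value from the current revision wins."
  [current conflicting]
  (merge conflicting current))

(defn latest-wins
  "Keeps the revision with the larger :updated-at value
  (an assumed, application-maintained timestamp)."
  [current conflicting]
  (if (>= (:updated-at current) (:updated-at conflicting))
    current
    conflicting))

(defn first-wins
  "Keeps the revision with the smaller :updated-at value."
  [current conflicting]
  (if (<= (:updated-at current) (:updated-at conflicting))
    current
    conflicting))

;; Merging works when the updates touch different keys.
(merge-fields {:movie-title "North by Northwest" :sound-mix "Mono"}
              {:movie-title "North by Northwest" :re-released 1996})

;; For revenue-style documents (made-up numbers), only the newest matters.
(latest-wins {:revenue 10 :updated-at 100}
             {:revenue 12 :updated-at 200})
;; => {:revenue 12, :updated-at 200}
```

Whichever function you choose, the result still has to be saved as a new revision of the current document, and the losing revision still has to be deleted.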


Conclusion

The simple JSON document format, the REST APIs, and Clutch's nice Clojure support add up to a compelling case for using Clojure to access CouchDB. The ability to write CouchDB views in Clojure means you have one less language to support in an application and more continuity in the code. The abstraction that Clutch provides, coupled with a solid understanding of the REST fundamentals of CouchDB, can lead to a rapidly developed and maintainable CouchDB-based application.

Resources

Learn

Get products and technologies

Discuss

  • Get involved in the developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.
