Explore MongoDB

Learn why this database management system is so popular

In this article, you will learn about MongoDB, the open source, document-oriented database management system written in C++ that provides features for scaling your databases in a production environment. Discover what benefits document-oriented databases have over traditional relational database management systems (RDBMS). Install MongoDB and start creating databases, collections, and documents. Examine Mongo's dynamic querying features, which provide key/value store efficiency in a way familiar to RDBMS database administrators and developers.

Joe Lennon, Lead Mobile Developer, Core International

Joe Lennon is a software developer from Cork, Ireland. He works as a Web application and Oracle PL/SQL developer for Core International, having graduated from University College Cork in 2007 with a degree in business information systems.



21 June 2011


What is MongoDB?

In recent years, we have seen a growing interest in database management systems that differ from the traditional relational model. At the heart of this is the concept of NoSQL, a term used collectively to denote database software that does not use the Structured Query Language (SQL) to interact with the database. One of the more notable NoSQL projects is MongoDB, an open source document-oriented database that stores data in collections of JSON-like documents. What sets MongoDB apart from other NoSQL databases is its powerful document-based query language, whose queries map closely onto their SQL equivalents.

MongoDB is written in C++. It stores data inside JSON-like documents (using BSON, a binary version of JSON), which hold data as key/value pairs. Because SQL statements translate so directly into MongoDB query function calls, organizations currently using relational databases can migrate easily. MongoDB is also very straightforward to install and use, with binaries and drivers available for major operating systems and programming languages.

MongoDB is an open-source project, with the database itself licensed under the GNU AGPL (Affero General Public License) version 3.0. This license is a modified version of the GNU GPL that closes a loophole: the GPL's copyleft obligations are triggered by distributing the software, not by using it. This matters for software that runs on servers or in the cloud rather than being installed on client devices. Under the regular GPL, offering such software over a network could be argued not to count as distribution, potentially circumventing the license terms.

The AGPL only applies to the database application itself, and not to other elements of MongoDB. The official drivers that allow developers to connect to MongoDB from various programming languages are distributed under the Apache License Version 2.0. The MongoDB documentation is available under a Creative Commons license.

Document-oriented databases

Document-oriented databases are quite different from traditional relational databases. Rather than store data in rigid structures like tables, they store data in loosely defined documents. With relational database management system (RDBMS) tables, if you need to add a new column, you need to change the definition of the table itself, which will add that column to every existing record (albeit potentially with a null value). This is a consequence of the strict, schema-based design of relational databases. With documents, however, you can add new attributes to individual documents without changing any other documents, because document-oriented databases are generally schema-less by design.

Another fundamental difference is that document-oriented databases don't provide strict relationships between documents, which helps maintain their schema-less design. This differs greatly from relational databases, which rely heavily on relationships to normalize data storage. Instead of being kept in a separate storage area, related data in a document database is embedded in the document itself. This is typically much faster than storing a reference to another document that holds the related data, because following each reference requires an additional query.

This works extremely well for many applications where it makes sense for the data to be self-contained inside a parent document. A good example (which is also given in the MongoDB documentation) is blog posts and comments. The comments only apply to a single post, so it does not make sense to separate them from that post. In MongoDB, your blog post document would have a comments attribute that stores the comments for that post. In a relational database you would probably have a comments table with an ID primary key, a posts table with an ID primary key and an intermediate mapping table post_comments that defines which comments belong to which post. This is a lot of unnecessary complexity for something that should be very straightforward.

However, if you must store related data separately, you can do so easily in MongoDB using a separate collection. Another good example is customer order information, which typically comprises information about a customer, the order itself, the line items in the order, and product information. Using MongoDB, you would probably store customers, products, and orders in individual collections, but you would embed line item data inside the relevant order document. You would then reference the products and customers collections using foreign key-style IDs, much like you would in a relational database. The simplicity of this hybrid approach makes MongoDB an excellent choice for those accustomed to working with SQL. With that said, take time to decide on the right approach for each individual use case, as embedding data inside the document rather than referencing it in other collections can bring significant performance gains.
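To make the cost of referencing concrete, here is a plain JavaScript sketch with no database involved: arrays stand in for the customers, products, and orders collections, and every reference that has to be followed counts as one extra lookup (one extra query in a real database). The data and the helper names findById and loadOrder are hypothetical; a real application would issue driver queries instead of Array.prototype.find calls.

```javascript
// In-memory stand-ins for three collections (hypothetical data).
const customers = [{ _id: 1, name: "Joe Bloggs" }];
const products = [
  { _id: "ABC1200", description: "A sample product" },
  { _id: "XYZ3400", description: "An expensive product" },
];
const orders = [
  {
    _id: 109384,
    customer_id: 1, // reference into customers
    items: [ // embedded line items: read together with the order
      { product_id: "ABC1200", quantity: 1 },
      { product_id: "XYZ3400", quantity: 2 },
    ],
  },
];

// Stand-in for a query by _id against one collection.
function findById(collection, id) {
  return collection.find((doc) => doc._id === id) || null;
}

// Load an order and resolve its references, counting the lookups needed.
function loadOrder(orderId) {
  let lookups = 1; // the order itself, including its embedded items
  const order = findById(orders, orderId);
  const customer = findById(customers, order.customer_id);
  lookups += 1; // one more for the referenced customer
  const items = order.items.map((item) => {
    lookups += 1; // and one per referenced product
    return { ...item, product: findById(products, item.product_id) };
  });
  return { order, customer, items, lookups };
}

const result = loadOrder(109384);
console.log(result.customer.name, result.items.length, result.lookups); // Joe Bloggs 2 4
```

The embedded line items arrive with the order for free, while each referenced customer or product costs another round trip, which is the trade-off to weigh for each use case.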

Features at a glance

MongoDB is a lot more than just a basic key/value store. Let's take a brief look at some of its other features:

  • Official binaries available for Windows®, Mac OS X, Linux® and Solaris, source distribution available for self-build
  • Official drivers available for C, C#, C++, Haskell, Java™, JavaScript, Perl, PHP, Python, Ruby and Scala, with a large range of community-supported drivers available for other languages
  • Ad-hoc JavaScript queries that allow you to find data using any criteria on any document attribute. These queries mirror the functionality of SQL queries, making it very straightforward for SQL developers to write MongoDB queries.
  • Support for regular expressions in queries
  • MongoDB query results are returned as cursors that provide functions for filtering and sorting, including limit(), skip(), sort(), and count(); collections also provide distinct() and group() for aggregation.
  • map/reduce implementation for advanced aggregation
  • Large file storage using GridFS
  • RDBMS-like attribute indexing support, where you can create indexes directly on selected attributes of a document
  • Query optimization features using hints, explain plans, and profiling
  • Master/slave replication similar to MySQL
  • Collection-based object storage, allowing for referential querying where normalized data is required
  • Horizontal scaling with auto-sharding
  • In-place updates for high-performance contention-free concurrency
  • An online shell that lets you try out MongoDB without installing it
  • In-depth documentation, with several books published and more being written

Installing MongoDB

Fortunately, MongoDB is very straightforward to install on a wide variety of platforms. Binary distributions are available for Windows, Mac OS X, Linux, and Solaris, while various package managers provide easy installation and setup options for other systems. If you're brave enough, you can compile the source code for yourself. In this section, you will learn how to install MongoDB on Windows and Mac OS X, setting the process up as a service on Windows or as a daemon on OS X.

Installing on Windows

Installation of MongoDB on Windows is very straightforward. In your favorite web browser, navigate to http://www.mongodb.org/downloads and download the latest stable production release for Windows. The 64-bit version is recommended, but can only be used if you are using the 64-bit version of the Windows operating system. If you're unsure, just use the 32-bit version.

Extract the zip file to the C:\ drive, which will create a new folder with a name like mongodb-win32-i386-1.6.4. To make your life easier, rename this folder to mongo. Next, you need to create a data directory. In Windows Explorer, go to the root of the C:\ drive and create a new folder named data. Inside this folder, create a new folder named db.

You can now start the MongoDB server. Use Windows Explorer to navigate to C:\mongo\bin and double-click mongod.exe. Closing the command prompt window that opens will stop the MongoDB server, so it is more convenient to set up the MongoDB server as a Windows service. Let's do that now.

Open a command prompt window (Start > Run, enter cmd, and click OK) and issue the commands in Listing 1.

Listing 1. Setting up the MongoDB server as a service
> cd \mongo\bin
> mongod --install --logpath c:\mongo\logs --logappend --bind_ip 127.0.0.1 --directoryperdb

You should see the output in Listing 2.

Listing 2. Service created successfully
all output going to c:\mongo\logs
Creating service MongoDB.
Service creation successful.
Service can be started from the command line via 'net start "MongoDB"'.

With Mongo installed as a service, you can now start it with the following command: > net start "MongoDB"

You should see the output in Listing 3.

Listing 3. Mongo started successfully
The Mongo DB service is starting.
The Mongo DB service was started successfully.

You can now run the MongoDB shell client. If you have a command prompt window open, make sure you are in the c:\mongo\bin folder and enter the following command: > mongo.

Alternatively, in Windows Explorer navigate to C:\mongo\bin and double-click on mongo.exe. Whichever way you choose to start the shell, you should see a prompt as in Listing 4.

Listing 4. Starting the shell
MongoDB shell version: 1.8.1
connecting to: test
>

Unless you also want to set up MongoDB on a Mac OS X machine, you can now skip the next part of this section and move on to "Getting started", where you will learn how to interact with the MongoDB server using the shell client.

Installing on Mac OS X

Assuming you are using a 64-bit version of Mac OS X, the following steps detail how to download the 64-bit OS X binary of MongoDB, extract it, and configure it. They will also show you how to run MongoDB as a daemon.

First, launch Terminal (Applications>Utilities>Terminal). In the Terminal window, run the commands in Listing 5.

Listing 5. Setting up MongoDB on Mac OS X
$ cd ~
$ curl http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-1.8.1.tgz > mongo.tgz
$ tar xzf mongo.tgz
$ mv mongodb-osx-x86_64-1.8.1/ mongo
$ mkdir -p /data/db

MongoDB is now set up and ready to use. Before going any further, it might be good to add MongoDB to your path. Execute the following command: $ nano ~/.bash_profile.

This file may not exist yet. In any case, add the following line: export PATH=$PATH:~/mongo/bin.

Save the file by pressing ctrl + O and then Enter at the prompt. Then press ctrl + X to exit nano. Now, reload your bash profile with the following command: $ source ~/.bash_profile.

You are now ready to start up MongoDB. To start it, simply issue the following command: $ mongod.

This will start the MongoDB database server as a foreground process. If you'd prefer to start MongoDB as a daemon process in the background, issue the following command instead: $ sudo mongod --fork --logpath /var/log/mongodb.log --logappend.

You will be asked to enter a password; enter your Mac OS X administrator password at this prompt.

Regardless of which method you chose to start MongoDB, the server should now be running. If you started it in the foreground, you will need a separate Terminal tab or window to start the client. To start the client, you simply use the command: $ mongo

You should see the prompt in Listing 6.

Listing 6. Starting the client
MongoDB shell version: 1.8.1
connecting to: test
>

In the next section, you will learn how to use the MongoDB shell to create databases, collections, documents, and so on.


Getting started using MongoDB

Included with the MongoDB distribution is a shell application that gives you complete control over your databases. Using the shell, you can create and manage databases, collections, documents, and indexes with JavaScript functions. This makes it easy to get up and running with MongoDB quickly. In this section, you will learn how to start the shell and see examples of some commands for basic data storage and retrieval.

The MongoDB shell

The MongoDB shell application is included with the MongoDB distribution in the bin folder. On Windows, this is in the form of the application mongo.exe. Double-clicking this program in Windows Explorer will start the shell. In UNIX®-based operating systems (including Mac OS X) you can start the MongoDB shell by executing the mongo command in a terminal window (assuming you followed the instructions above to add the MongoDB directory to your path).

When you first launch the shell, you should see the message in Listing 7.

Listing 7. Message after launching the shell
MongoDB shell version: 1.8.1
connecting to: test
>

You are now connected to your local MongoDB server, and in particular, the "test" database. In the next section, you will learn how to create databases, documents, and collections. If at any stage you are looking for some help, you can simply issue the command "help" to the Mongo shell prompt. Figure 1 shows the typical output of a help command.

Figure 1. Output from Mongo shell help command

If you ever want to see the source code behind a MongoDB function, simply type the name of that function in the shell, and it will print the JavaScript source. For example, type connect and hit the return key, and you will see the source code used to connect to a MongoDB database.

Creating databases, collections, and documents

By default, the Mongo shell connects to the "test" database. To switch to a different database, you use the "use dbname" command. If the database does not exist, MongoDB will create it as soon as you add any data to it. Let's switch to the "mymongo" database with the following command: > use mymongo.

The shell should return the message: switched to db mymongo.

At this point, the database still doesn't really exist, as it doesn't contain any data. In MongoDB, data is stored in collections, allowing you to separate documents if required. Let's create a document and store it in a new collection named "colors": > db.colors.save({name:"red",value:"FF0000"});.

Let's verify that the document has been stored by querying the database: > db.colors.find();.

You should see a response similar to the following (the _id attribute is a unique identifier and will more than likely be different in your result): { "_id" : ObjectId("4cfa43ff528bad4e29beec57"), "name" : "red", "value" : "FF0000" }.

Documents in MongoDB are stored as BSON (binary JSON). Using the Mongo shell, we can insert data using a JSON-like syntax where each document is an object of key-value pairs. In this example, we created a document with two attributes: name and value, which have values of red and FF0000 (the hexadecimal representation of the standard red color), respectively.

As you may have noticed, you did not need to predefine the colors collection; it is created automatically when you insert a document using the save function.
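The _id value MongoDB generated is not just a random string: an ObjectId's leading four bytes (its first eight hex digits) encode the time the id was created, as a big-endian count of seconds since the Unix epoch. The plain JavaScript sketch below decodes that timestamp from the _id shown above; objectIdTimestamp is a hypothetical helper, and because the layout of the remaining bytes has varied between MongoDB versions, it does not try to decode them.

```javascript
// The first 4 bytes (8 hex digits) of an ObjectId are a big-endian
// Unix timestamp in seconds; the remaining bytes keep the id unique.
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-f]{24}$/i.test(hexId)) {
    throw new Error("expected a 24-character hex ObjectId string");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// The _id value printed for the "red" document above.
const created = objectIdTimestamp("4cfa43ff528bad4e29beec57");
console.log(created.toISOString()); // 2010-12-04T13:37:03.000Z
```

One practical consequence is that ObjectId values generated later generally sort after earlier ones, although ids created within the same second are not strictly ordered by time.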

In this example, you created a very simple document. However, the same JSON-like syntax can describe much more complex documents. Consider the document in Listing 8, which represents a purchase order or invoice.

Listing 8. A more complex document: a purchase order
{
    order_id: 109384,
    order_date: new Date("12/04/2010"),
    customer: {
        name: "Joe Bloggs",
        company: "XYZ Inc.",
        phone: "(555) 123-4567"
    },
    payment: {
        type: "Cash",
        amount: 4075.99,
        paid_in_full: true
    },
    items: [
        {
            sku: "ABC1200",
            description: "A sample product",
            quantity: 1,
            price_per_unit: 75.99
        }, {
            sku: "XYZ3400",
            description: "An expensive product",
            quantity: 2,
            price_per_unit: 2000
        }
    ],
    cashier_id: 340582242
}

As you can see, these documents can store various data types, including strings, integers, floats, dates, objects, arrays, and more. In Listing 8, the order items have been embedded directly into the order document, making it faster to retrieve this information when querying on the document later.
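Because the line items travel inside the order, an application can check or recompute the order total from that single document without any further lookups. Here is a minimal plain JavaScript sketch using the values from Listing 8 (no database involved; toCents and orderTotalCents are hypothetical helpers). It works in integer cents to avoid binary floating-point rounding surprises.

```javascript
// The relevant parts of the order from Listing 8, with its embedded line items.
const order = {
  order_id: 109384,
  payment: { type: "Cash", amount: 4075.99, paid_in_full: true },
  items: [
    { sku: "ABC1200", quantity: 1, price_per_unit: 75.99 },
    { sku: "XYZ3400", quantity: 2, price_per_unit: 2000 },
  ],
};

// Convert a decimal amount to whole cents.
const toCents = (amount) => Math.round(amount * 100);

// Sum quantity * unit price over the embedded items, in cents.
function orderTotalCents(doc) {
  return doc.items.reduce(
    (sum, item) => sum + item.quantity * toCents(item.price_per_unit),
    0
  );
}

const totalCents = orderTotalCents(order);
console.log(totalCents / 100); // 4075.99
console.log(totalCents === toCents(order.payment.amount)); // true
```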

Because the MongoDB shell uses JavaScript, you can write regular JavaScript constructs when interacting with your database. Take Listing 9, which creates a collection of character documents, each containing the string representation of the character and its associated ASCII code.

Listing 9. Creating a collection of character documents
> var chars = "abcdefghijklmnopqrstuvwxyz";
> for (var i = 0; i < chars.length; i++) {
... var c = chars.charAt(i);
... var doc = {char: c, code: c.charCodeAt(0)};
... db.alphabet.save(doc);
... }

This loop will create 26 documents, one for each lowercase letter of the alphabet, each document containing the character itself and its ASCII character code. In the next section, you will see how to retrieve this data in various ways.

Retrieving data

In the last section, you not only learned how to insert data into a MongoDB database, but also how to use the most basic data retrieval function, find. Let's start by using the find command on the alphabet collection we created at the end of the previous section: db.alphabet.find();.

This should generate a response like Listing 10.

Listing 10. Generated response
> db.alphabet.find()
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8c"), "char" : "a", "code" : 97 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8d"), "char" : "b", "code" : 98 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8e"), "char" : "c", "code" : 99 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8f"), "char" : "d", "code" : 100 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec90"), "char" : "e", "code" : 101 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec91"), "char" : "f", "code" : 102 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec92"), "char" : "g", "code" : 103 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec93"), "char" : "h", "code" : 104 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec94"), "char" : "i", "code" : 105 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec95"), "char" : "j", "code" : 106 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec96"), "char" : "k", "code" : 107 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec97"), "char" : "l", "code" : 108 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec98"), "char" : "m", "code" : 109 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec99"), "char" : "n", "code" : 110 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec9a"), "char" : "o", "code" : 111 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec9b"), "char" : "p", "code" : 112 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec9c"), "char" : "q", "code" : 113 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec9d"), "char" : "r", "code" : 114 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec9e"), "char" : "s", "code" : 115 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec9f"), "char" : "t", "code" : 116 }
has more
>

By default, the shell displayed only the first 20 documents from the cursor returned by find(). Typing the command it displays the next batch of results, in this case the remaining 6 documents (see Listing 11).

Listing 11. Retrieving the remaining 6 documents
> it
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca0"), "char" : "u", "code" : 117 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca1"), "char" : "v", "code" : 118 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca2"), "char" : "w", "code" : 119 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca3"), "char" : "x", "code" : 120 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca4"), "char" : "y", "code" : 121 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca5"), "char" : "z", "code" : 122 }
>

The find() function actually returns a cursor to the result set of the query, in this case all documents in the collection. When the cursor is not assigned to a variable and no further functions are called on it, the shell iterates it automatically and prints the first 20 results. To display the entire result set at once, we could have used the following command: > db.alphabet.find().forEach(printjson);.

This would print every record in the result set, rather than displaying a subset. We will see more about using cursors and queries to filter data next.
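To see how the shell's paging relates to the cursor, here is a plain JavaScript sketch of a cursor that hands out results in fixed-size batches of 20, the way the shell prints 20 documents for find() and then 20 more for each it. The names makeCursor and nextBatch are hypothetical and not part of the MongoDB shell API; real shell cursors iterate one document at a time with hasNext() and next().

```javascript
// Build the same 26 documents that Listing 9 stores (without _id values).
const alphabet = [];
for (let code = 97; code <= 122; code++) {
  alphabet.push({ char: String.fromCharCode(code), code: code });
}

// A minimal cursor that remembers its position and returns batches.
function makeCursor(docs, batchSize = 20) {
  let position = 0;
  return {
    hasNext: () => position < docs.length,
    nextBatch() {
      const batch = docs.slice(position, position + batchSize);
      position += batch.length;
      return batch;
    },
  };
}

const cursor = makeCursor(alphabet);
const firstBatch = cursor.nextBatch();  // what find() shows: a..t
const secondBatch = cursor.nextBatch(); // what "it" shows: u..z
console.log(firstBatch.length, secondBatch.length, cursor.hasNext()); // 20 6 false
```

In the shell itself, an explicit loop over a cursor looks like var c = db.alphabet.find(); while (c.hasNext()) printjson(c.next());.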


Querying data

One of MongoDB's greatest strengths is its powerful support for ad-hoc querying that works in much the same manner as in traditional relational databases, albeit filtering and returning BSON documents rather than table rows. This approach sets it apart from other document stores, which SQL developers can often find difficult to get to grips with. With MongoDB, relatively complex SQL queries can be easily translated to JavaScript function calls. In this section, you will learn about the various functions available that allow you to query the data in MongoDB, and how to set up indexes to help optimize your queries, just as you would in the likes of DB2, MySQL or Oracle.

Basic queries

In the previous section, you learned how to use the find function to retrieve all documents. The find function also accepts a query document that filters the results that are returned. For example, in the alphabet collection we created previously, you could find any records where the "char" attribute has a value of "o" with the following command: > db.alphabet.find({char: "o"});.

This returns the following response: { "_id" : ObjectId("4cfa4adf528bad4e29beec9a"), "char" : "o", "code" : 111 }.

If you want to return all characters with a code less than or equal to 100, you could use the following command: > db.alphabet.find({code:{$lte:100}});.

This returns the result in Listing 12, as you might expect.

Listing 12. Result
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8c"), "char" : "a", "code" : 97 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8d"), "char" : "b", "code" : 98 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8e"), "char" : "c", "code" : 99 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8f"), "char" : "d", "code" : 100 }

MongoDB supports a variety of conditional operators, including:

  • $lt (less than)
  • $lte (less than or equal to)
  • $gt (greater than)
  • $gte (greater than or equal to)
  • $all (match arrays that contain all of the specified values)
  • $exists (check if a field exists or does not exist)
  • $mod (match values that leave a specified remainder when divided by a specified divisor)
  • $ne (not equals)
  • $in (match any of the values in a specified array)
  • $nin (match none of the values in a specified array)
  • $or (match one query or another)
  • $nor (match neither one query nor another)
  • $size (match any array with a defined number of elements)
  • $type (match values with a specified BSON data type)
  • $not (negate another operator expression, for example {$not: {$gt: 5}})
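The core of these operators can be sketched as a plain JavaScript predicate over documents. This is an illustration only, under simplifying assumptions: it handles top-level fields and the operators $lt, $lte, $gt, $gte, $ne, $in, $nin and $exists, uses ordinary JavaScript comparisons rather than MongoDB's BSON type ordering, and ignores MongoDB's special handling of array fields. The names matches, matchesOperators and findDocs are hypothetical.

```javascript
// Evaluate an operator expression such as {$gte: 102, $lte: 105} against one value.
function matchesOperators(value, expr) {
  return Object.keys(expr).every((op) => {
    const arg = expr[op];
    switch (op) {
      case "$lt": return value < arg;
      case "$lte": return value <= arg;
      case "$gt": return value > arg;
      case "$gte": return value >= arg;
      case "$ne": return value !== arg;
      case "$in": return arg.includes(value);
      case "$nin": return !arg.includes(value);
      case "$exists": return (value !== undefined) === Boolean(arg);
      default: throw new Error("operator not supported in this sketch: " + op);
    }
  });
}

// True when every field condition in the query holds for the document.
function matches(doc, query) {
  return Object.keys(query).every((field) => {
    const cond = query[field];
    const isOperatorExpr =
      cond !== null &&
      typeof cond === "object" &&
      Object.keys(cond).length > 0 &&
      Object.keys(cond).every((key) => key.startsWith("$"));
    return isOperatorExpr ? matchesOperators(doc[field], cond) : doc[field] === cond;
  });
}

// Hypothetical helper standing in for db.collection.find(query).
const findDocs = (docs, query) => docs.filter((doc) => matches(doc, query));

// The 26 alphabet documents from Listing 9 (without _id values).
const alphabet = [];
for (let code = 97; code <= 122; code++) {
  alphabet.push({ char: String.fromCharCode(code), code: code });
}

const chars = (docs) => docs.map((doc) => doc.char).join("");
console.log(chars(findDocs(alphabet, { char: "o" })));                           // o
console.log(chars(findDocs(alphabet, { code: { $lte: 100 } })));                 // abcd
console.log(chars(findDocs(alphabet, { code: { $in: [102, 103, 104, 105] } }))); // fghi
console.log(findDocs(alphabet, { code: { $gte: 105 } }).length);                 // 18
```

Several operators on the same field combine with AND semantics, so {code: {$gt: 97, $lt: 100}} selects only b and c.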

For more details on all of these operators, see the MongoDB documentation (see Resources for a link).

You can restrict the fields that are returned by your queries using a second argument in the find function. For example, the following query will return only the char attribute (plus _id, which is included unless you explicitly exclude it) for any documents with a code value in the range 102 to 105: > db.alphabet.find({code:{$in:[102,103,104,105]}}, {char: 1});.

This should produce the result in Listing 13.

Listing 13. Result
{ "_id" : ObjectId("4cfa4adf528bad4e29beec91"), "char" : "f" }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec92"), "char" : "g" }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec93"), "char" : "h" }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec94"), "char" : "i" }
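Field selection is easiest to see on a single document. The plain JavaScript sketch below handles inclusion projections only (fields set to 1); the project function and the shortened _id value are hypothetical. It mirrors the behavior visible in Listing 13, where _id comes back even though only char was requested, and shows that passing _id: 0 suppresses it.

```javascript
// Apply an inclusion projection such as {char: 1}. As in MongoDB,
// _id is returned unless it is explicitly excluded with {_id: 0}.
function project(doc, projection) {
  const result = {};
  const excludeId = projection._id === 0 || projection._id === false;
  if (!excludeId && "_id" in doc) {
    result._id = doc._id;
  }
  for (const field of Object.keys(projection)) {
    if (field !== "_id" && projection[field] && field in doc) {
      result[field] = doc[field];
    }
  }
  return result;
}

// Hypothetical document with a shortened _id value for readability.
const doc = { _id: "id-102", char: "f", code: 102 };
console.log(JSON.stringify(project(doc, { char: 1 })));         // {"_id":"id-102","char":"f"}
console.log(JSON.stringify(project(doc, { _id: 0, char: 1 }))); // {"char":"f"}
```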

In the next section, you will learn how to create indexes to speed up your queries.

Indexing

MongoDB indexes are quite similar to relational database indexes. You can place an index on any attribute. In addition, indexed fields may be of any data type, including an object or an array. Like RDBMS indexes, you can create compound indexes using multiple attributes, and unique indexes, which ensure that duplicate values are not allowed.

To create a basic index, you use the ensureIndex function. Let's create indexes on the code and char attributes in the alphabet collection now (see Listing 14).

Listing 14. Creating an index
> db.alphabet.ensureIndex({code: 1});
> db.alphabet.ensureIndex({char: 1});

You can drop indexes using the dropIndex and dropIndexes functions. See the MongoDB documentation for further information.
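Why does an index help? Once the values of an attribute are kept in sorted order, a range query such as {code: {$gte: 118}} can jump straight to the first qualifying entry instead of examining every document. The plain JavaScript sketch below uses a sorted array searched with binary search as a simpler stand-in for the B-tree structure MongoDB actually uses; buildIndex, lowerBound and findGte are hypothetical names.

```javascript
// A single-field ascending "index": entries sorted by key, each pointing
// back to a document's position in the collection array.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => ({ key: doc[field], position }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search for the first entry whose key is >= target: O(log n) steps.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Answer {field: {$gte: min}} from the index instead of a full scan.
function findGte(docs, index, min) {
  return index.slice(lowerBound(index, min)).map((entry) => docs[entry.position]);
}

// Insert the letters in reverse order on purpose: the index still sorts them.
const alphabet = [];
for (let code = 122; code >= 97; code--) {
  alphabet.push({ char: String.fromCharCode(code), code: code });
}

const codeIndex = buildIndex(alphabet, "code");
console.log(findGte(alphabet, codeIndex, 118).map((d) => d.char).join("")); // vwxyz
```

The price of an index is paid on writes: every insert, update, or delete must also keep the sorted structure up to date, which is why it pays to index only the attributes you actually query or sort on.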

Sorting

To sort your result set, you can apply the sort function to your cursor. Our alphabet documents were inserted in ascending order of both the code and char attributes, so to see sorting in action, let's get a subset back in descending order, sorted by the code attribute: > db.alphabet.find({code: {$gte: 118}}).sort({code: -1});.

This returns the result in Listing 15.

Listing 15. Result
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca5"), "char" : "z", "code" : 122 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca4"), "char" : "y", "code" : 121 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca3"), "char" : "x", "code" : 120 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca2"), "char" : "w", "code" : 119 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca1"), "char" : "v", "code" : 118 }

If you supplied the argument {code: 1} to the sort function in the previous command, it would sort the results in ascending order. To keep such queries fast, be sure to add an index on any attribute you sort on.
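A sort specification can be read as a comparator: 1 means ascending, -1 means descending, and later keys are consulted only to break ties. The plain JavaScript sketch below reproduces Listing 15 under that reading; comparatorFor is a hypothetical helper, it compares values with JavaScript's < and > operators rather than MongoDB's BSON ordering, and it rejects any direction other than 1 or -1.

```javascript
// Turn a sort specification such as {code: -1} into an Array.sort comparator.
function comparatorFor(spec) {
  const keys = Object.keys(spec);
  for (const key of keys) {
    if (spec[key] !== 1 && spec[key] !== -1) {
      throw new Error("sort direction must be 1 or -1 in this sketch: " + key);
    }
  }
  return (a, b) => {
    for (const key of keys) {
      if (a[key] < b[key]) return -spec[key]; // ascending puts a first
      if (a[key] > b[key]) return spec[key];
    }
    return 0; // equal on every key
  };
}

const alphabet = [];
for (let code = 97; code <= 122; code++) {
  alphabet.push({ char: String.fromCharCode(code), code: code });
}

const result = alphabet
  .filter((d) => d.code >= 118)        // {code: {$gte: 118}}
  .sort(comparatorFor({ code: -1 }));  // .sort({code: -1})
console.log(result.map((d) => d.char).join("")); // zyxwv
```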


Paging results using skip and limit

Often when dealing with data result sets, you only want to retrieve a subset at a time, perhaps to provide paged results on a web page. In MySQL, you would typically do this using the LIMIT keyword. You can easily replicate this functionality in MongoDB using the skip and limit functions. To return the first 5 documents in the alphabet collection, you could perform the following operation: > db.alphabet.find().limit(5);.

This returns the result in Listing 16.

Listing 16. Result
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8c"), "char" : "a", "code" : 97 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8d"), "char" : "b", "code" : 98 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8e"), "char" : "c", "code" : 99 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8f"), "char" : "d", "code" : 100 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec90"), "char" : "e", "code" : 101 }

To get the next page, you would use the following command: > db.alphabet.find().skip(5).limit(5);.

As you can see in Listing 17, this fetches the next 5 records.

Listing 17. Fetching the next five records
{ "_id" : ObjectId("4cfa4adf528bad4e29beec91"), "char" : "f", "code" : 102 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec92"), "char" : "g", "code" : 103 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec93"), "char" : "h", "code" : 104 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec94"), "char" : "i", "code" : 105 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec95"), "char" : "j", "code" : 106 }
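The arithmetic behind paging is simply skip = (page - 1) * pageSize and limit = pageSize. Here is a plain JavaScript sketch of that calculation, with an array slice standing in for the cursor; the page helper is hypothetical. One caveat: without an explicit sort(), MongoDB does not guarantee the order in which documents are returned, so for stable pages you would normally combine skip() and limit() with a sort on an indexed attribute.

```javascript
const alphabet = [];
for (let code = 97; code <= 122; code++) {
  alphabet.push({ char: String.fromCharCode(code), code: code });
}

// Page numbers start at 1; equivalent to .skip(skip).limit(pageSize).
function page(docs, pageNumber, pageSize) {
  const skip = (pageNumber - 1) * pageSize;
  return docs.slice(skip, skip + pageSize);
}

const chars = (docs) => docs.map((d) => d.char).join("");
console.log(chars(page(alphabet, 1, 5))); // abcde (like find().limit(5))
console.log(chars(page(alphabet, 2, 5))); // fghij (like find().skip(5).limit(5))
console.log(chars(page(alphabet, 6, 5))); // z     (a short last page)
```

With 26 documents and 5 per page there are Math.ceil(26 / 5) = 6 pages, the last holding a single document.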

Group functions and aggregation

MongoDB's query engine also makes it very simple to apply aggregation and group functions on your data. These are analogous to their SQL counterparts. Arguably, the most widely used function is the count() function: > db.alphabet.find().count();.

This should return 26. You can count filtered queries just as easily: > db.alphabet.find({code: {$gte: 105}}).count();.

The above statement should return 18.

Another useful aggregate function is distinct. This is used to return a set of distinct values for an attribute. Our alphabet collection is a bad example as all the data is unique, so let's add a couple of records to the colors collection we created earlier in this article (see Listing 18).

Listing 18. Adding records to the colors collection
> db.colors.save({name:"white",value:"FFFFFF"});
> db.colors.save({name:"red",value:"FF0000"});  
> db.colors.find();

Assuming you did not delete the colors collection, you should see the response in Listing 19.

Listing 19. Response
{ "_id" : ObjectId("4cfa43ff528bad4e29beec57"), "name" : "red", "value" : "FF0000" }
{ "_id" : ObjectId("4cfa5830528bad4e29beeca8"), "name" : "white", "value" : "FFFFFF" }
{ "_id" : ObjectId("4cfa5839528bad4e29beeca9"), "name" : "red", "value" : "FF0000" }

As you can see, there are clearly two red documents in this collection. Now, let's use the distinct function to get a set of unique name attribute values from this collection: > db.colors.distinct("name");.

This returns the following: [ "red", "white" ].

It's worth noting that you do not perform the distinct function on a cursor or result set as you do other query functions, but you perform it directly on the collection. You'll also note that it does not return a set of documents, but rather an array of values.
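The idea behind distinct can be sketched in a few lines of plain JavaScript: collect each value of one attribute once and return the values, not the documents. The distinct function and the simplified numeric _id values are hypothetical; the sketch ignores array-valued fields, and it happens to return values in first-seen order, which you should not assume MongoDB guarantees.

```javascript
const colors = [
  { _id: 1, name: "red", value: "FF0000" },
  { _id: 2, name: "white", value: "FFFFFF" },
  { _id: 3, name: "red", value: "FF0000" },
];

// A Set silently ignores repeated values; spreading it yields a plain array.
function distinct(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    if (doc[field] !== undefined) seen.add(doc[field]);
  }
  return [...seen];
}

console.log(JSON.stringify(distinct(colors, "name"))); // ["red","white"]
```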

MongoDB also provides a group function for performing actions like you would do in a GROUP BY expression in SQL. The group function is a complex beast, so I only give a brief example here. For our example, let's say we want to count the number of documents grouped by the name value. In SQL, we could define this expression as SELECT name, COUNT(*) FROM colors GROUP BY name;.

To perform this query in MongoDB, you would use the command in Listing 20.

Listing 20. Using the group function
> db.colors.group(
... {key: {name: true},
... cond: {},
... initial: {count: 0},                                  
... reduce: function(doc, out) { out.count++; }
... });

This produces the result in Listing 21.

Listing 21. Result
[
    {
        "name" : "red",
        "count" : 2
    },
    {
        "name" : "white",
        "count" : 1
    }
]
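Conceptually, group() partitions the matching documents by the key attributes and folds each partition with the reduce function, starting from a fresh copy of initial. The plain JavaScript sketch below reproduces Listing 21 under those semantics; the group function here is hypothetical, takes cond as a predicate function rather than a query document, and makes only a shallow copy of initial.

```javascript
const colors = [
  { name: "red", value: "FF0000" },
  { name: "white", value: "FFFFFF" },
  { name: "red", value: "FF0000" },
];

// Documents passing `cond` are bucketed by the `key` fields; each bucket
// starts as the key values plus a copy of `initial`, then `reduce` updates it.
function group(docs, { key, cond = () => true, initial, reduce }) {
  const buckets = new Map();
  for (const doc of docs.filter(cond)) {
    const keyDoc = {};
    for (const field of Object.keys(key)) keyDoc[field] = doc[field];
    const id = JSON.stringify(keyDoc); // one bucket per distinct key combination
    if (!buckets.has(id)) buckets.set(id, Object.assign({}, keyDoc, initial));
    reduce(doc, buckets.get(id));
  }
  return [...buckets.values()];
}

const result = group(colors, {
  key: { name: true },
  initial: { count: 0 },
  reduce: (doc, out) => { out.count++; },
});
console.log(JSON.stringify(result));
// [{"name":"red","count":2},{"name":"white","count":1}]
```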

If you need to perform advanced aggregation or use large data sets, MongoDB also includes an implementation of map/reduce, which will allow you to do so. The group function outlined above does not work in sharded MongoDB setups, so if you are using sharding, be sure to use map/reduce instead.
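Map/reduce splits the work into a map step that emits key/value pairs and a reduce step that combines all values emitted for the same key; because each key can be reduced independently, the work can be spread across shards. The plain JavaScript sketch below answers the same question as the group() example in a single process. The mapReduce function is hypothetical: in the real shell the call is db.colors.mapReduce(map, reduce, options), and the map function reads the current document through this and calls a global emit, whereas the sketch passes both as arguments. See the MongoDB documentation for the output options.

```javascript
const colors = [
  { name: "red", value: "FF0000" },
  { name: "white", value: "FFFFFF" },
  { name: "red", value: "FF0000" },
];

// Gather emitted values per key, then reduce each key's list of values.
function mapReduce(docs, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit));
  return [...emitted].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// Count documents per name. reduce returns the same {count} shape that map
// emits, so it is safe to apply reduce again to its own earlier output.
const countMap = (doc, emit) => emit(doc.name, { count: 1 });
const countReduce = (key, values) => ({
  count: values.reduce((sum, v) => sum + v.count, 0),
});

const result = mapReduce(colors, countMap, countReduce);
console.log(JSON.stringify(result));
// [{"_id":"red","value":{"count":2}},{"_id":"white","value":{"count":1}}]
```

Summing counts, rather than counting the values passed in, is the design choice that matters here: MongoDB may call reduce on partial results and then reduce those results again, and only a reduce function whose output looks like its input gives the same answer either way.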

Updating existing data

In the MongoDB shell, it is very easy to update documents. In the colors collection we created earlier, we had two records for red. Let's say we want to take one of those records and change it to black, with the value attribute 000000 (the hexadecimal value of black). First, we can use the findOne function to retrieve a single item with the value red, change its properties as required, and save the document back to the database.

Get a single document with the name red and store it in the blackDoc variable: > var blackDoc = db.colors.findOne({name: "red"});.

Next, we use dot notation to alter the properties of the document (see Listing 22).

Listing 22. Altering the properties of the document
> blackDoc.name = "black";
> blackDoc.value = "000000";

Before saving, let's check that the document looks right (it should have an _id attribute, otherwise it will just insert a new record rather than saving over the red one): > printjson(blackDoc);.

If this returns something similar to Listing 23 you're ready to go.

Listing 23. Result
{
    "_id" : ObjectId("4cfa43ff528bad4e29beec57"),
    "name" : "black",
    "value" : "000000"
}

Finally, use the save function to save the document back to the colors collection in the database: > db.colors.save(blackDoc);.

We can now use the find function to make sure that our collection looks right: > db.colors.find();.

This should return something like Listing 24. If you see four records, save inserted a new document instead of overwriting the red one; check that blackDoc had an _id attribute.

Listing 24. Result
{ "_id" : ObjectId("4cfa43ff528bad4e29beec57"), "name" : "black", "value" : "000000" }
{ "_id" : ObjectId("4cfa5830528bad4e29beeca8"), "name" : "white", "value" : "FFFFFF" }
{ "_id" : ObjectId("4cfa5839528bad4e29beeca9"), "name" : "red", "value" : "FF0000" }

Outside of the Mongo shell, you would more typically use the update function in your applications to change existing data directly, without first retrieving the document. For more information on the update function, see the MongoDB documentation.
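As a rough illustration of what an update with the $set operator does, the following plain JavaScript modifies the first matching document in place, which mirrors update's default of changing a single document. The updateFirst helper is hypothetical and supports only equality queries and $set; the comment shows the corresponding shell call.

```javascript
// Hypothetical model of, roughly:
//   db.colors.update({name: "red"}, {$set: {name: "black", value: "000000"}})
// Only the first matching document is changed (update's default behavior).
function updateFirst(collection, query, update) {
  const target = collection.find((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
  if (!target) return 0; // nothing matched, nothing changed
  for (const [field, value] of Object.entries(update.$set)) {
    target[field] = value; // fields not named in $set, including _id, are kept
  }
  return 1;
}

const colors = [
  { _id: "a", name: "red", value: "FF0000" },
  { _id: "b", name: "white", value: "FFFFFF" },
  { _id: "c", name: "red", value: "FF0000" },
];
const modified = updateFirst(colors, { name: "red" }, { $set: { name: "black", value: "000000" } });
console.log(modified, colors.map((d) => d.name).join(","));
```

Unlike the findOne/save round trip, the change is described once and applied where the data lives, so there is no window in which another client's edit to the same document could be overwritten by a stale copy.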

Deleting data

To delete data in MongoDB, you use the remove function. Note that this applies to the MongoDB shell; some drivers may name this function delete or something else. Check the documentation for your driver if required.

The remove function works in a similar way to the find function. To remove any documents in the colors collection that match the name white, you would use the following command: > db.colors.remove({name:"white"});.

You can then check that this document has been removed: > db.colors.find();.

If all is well, you should only see two documents (see Listing 25).

Listing 25. Deleting data
{ "_id" : ObjectId("4cfa43ff528bad4e29beec57"), "name" : "black", "value" : "000000" }
{ "_id" : ObjectId("4cfa5839528bad4e29beeca9"), "name" : "red", "value" : "FF0000" }

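To remove all documents in a collection, pass an empty filter, which matches every document: > db.colors.remove({});. The shell used in this article also accepts the command with the filter omitted entirely: > db.colors.remove();.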

Now when you try to use the find function, you won't get any response, signifying an empty result set: > db.colors.find();.

If you have a document stored in a variable, you can also pass this document to the remove function to delete it, but this is an inefficient way of doing so. You'd be better off passing only its _id attribute to the remove function instead, for example: > db.colors.remove({_id: blackDoc._id});.
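The filter semantics of remove can be modeled in a few lines of plain JavaScript: a document is removed when it matches every field in the filter, so an empty filter matches (and removes) everything, while an _id filter targets exactly one document. The removeMatching helper and the string IDs are hypothetical, and only exact field equality is supported.

```javascript
// Hypothetical model of remove() filtering (illustration only).
function removeMatching(collection, filter = {}) {
  const fields = Object.entries(filter);
  // Keep only documents that do NOT match every filter field.
  // With an empty filter, every() is true for all documents, so nothing is kept.
  return collection.filter((doc) => !fields.every(([field, value]) => doc[field] === value));
}

let colors = [
  { _id: "a", name: "black", value: "000000" },
  { _id: "b", name: "white", value: "FFFFFF" },
  { _id: "c", name: "red", value: "FF0000" },
];
const blackDoc = colors[0];

colors = removeMatching(colors, { name: "white" });     // leaves black and red
colors = removeMatching(colors, { _id: blackDoc._id }); // removes exactly one document
console.log(colors.map((d) => d.name));                 // [ 'red' ]
colors = removeMatching(colors, {});                    // empty filter: remove all
console.log(colors.length);                             // 0
```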

To drop a collection, you can use the following command: > db.colors.drop();.

This returns the following: true.

You can now check that the collection has indeed been dropped using the show collections command. This should produce the output in Listing 26.

Listing 26. Using the show collections command
alphabet
system.indexes

Finally, if you wish to remove an entire database, you perform the following command: > db.dropDatabase();.

This deletes the currently selected database. You should see the following output: { "dropped" : "mymongo", "ok" : 1 }.

You can use the command show dbs to get a list of available databases. mymongo should not appear in this list.


Tools and other features

MongoDB includes a series of useful utilities for administering your database. It provides various means of importing and exporting data, either for reporting or backup purposes. In this section, you will discover how to import and export files in JSON format, as well as how to create hot backup files that are more efficient for recovery purposes. You will also learn how you can use map/reduce functions as an alternative to Mongo's regular query functions for complex aggregation of data.

Importing and exporting data

MongoDB's bin directory contains a series of utilities for importing and exporting data in a variety of formats. The mongoimport utility allows you to supply a file with each line containing a document in JSON, CSV or TSV format and insert each of these documents into a MongoDB database. Because MongoDB uses BSON, if you are importing JSON documents you need to supply some modifier information if you wish to avail of any of BSON's additional data types that are not available in regular JSON.

The mongoexport utility allows you to produce an output file with every document in a MongoDB collection represented in either JSON or CSV format. This is useful for producing reports where the application accepts either JSON or CSV data as an input. To produce a CSV file, you need to provide the fields in the order they should appear in the output file.
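One reason the field list is needed for CSV is that documents in a collection need not share the same fields or field order, while a CSV file has a single fixed header. The sketch below shows that mapping in plain JavaScript; toCsv is a hypothetical helper, not part of mongoexport, and its quoting follows common CSV conventions (quote values containing commas, quotes or newlines; double embedded quotes).

```javascript
// Hypothetical helper: flatten documents into CSV using an explicit field order.
function toCsv(docs, fields) {
  const escape = (value) => {
    if (value === undefined || value === null) return ""; // missing field -> empty cell
    const text = String(value);
    return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const rows = docs.map((doc) => fields.map((field) => escape(doc[field])).join(","));
  return [fields.join(","), ...rows].join("\n");
}

// These documents differ in field order and in which fields they have.
const colors = [
  { value: "000000", name: "black" },
  { name: "red", value: "FF0000", note: "primary, warm" },
];
const csv = toCsv(colors, ["name", "value", "note"]);
console.log(csv);
```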

Backing up and restoring databases

The mongoimport and mongoexport utilities are useful for taking data out of MongoDB for use in other applications or importing from other applications that can make JSON or CSV data available. However, these utilities should not be used for taking periodic backups of a MongoDB database or restoring a MongoDB database. Because MongoDB uses BSON and not JSON or CSV, it is difficult to preserve data types when importing data from these formats.
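The type-loss problem is easy to demonstrate with a date, which BSON stores as a native type but plain JSON has no way to represent. In this plain JavaScript illustration, the created field is a hypothetical example; after a JSON round trip the value comes back as a string, so a naive re-import would store text rather than a date. (JSON also has a single number type, while BSON distinguishes integers from doubles.)

```javascript
// A document with a date field, as it might exist in the database.
const original = {
  name: "red",
  created: new Date(Date.UTC(2011, 5, 21)), // hypothetical timestamp field
};

// Export to JSON and read it back, as a JSON-based export/import would.
const roundTripped = JSON.parse(JSON.stringify(original));

console.log(original.created instanceof Date); // true
console.log(typeof roundTripped.created);      // string
console.log(roundTripped.created);             // 2011-06-21T00:00:00.000Z
```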

To provide proper backup and restore functionality, MongoDB provides two utilities: mongodump and mongorestore. mongodump produces a binary file backup of a database, and mongorestore reads this file and restores a database using it, automatically creating indexes as required (unless you have removed the system.indexes.bson file from your backup directory).


Administration utilities

MongoDB also provides a web-based diagnostic interface, available at http://localhost:28017/ on default MongoDB configurations. This screen looks like the screenshot in Figure 2.

Figure 2. MongoDB diagnostics

To get other administration information, you can also run the following commands in the MongoDB shell:

  • db.serverStatus();
  • db.stats();

If your MongoDB server crashes, you should repair the database to check for any corruption and perform some data compaction. You can run a repair by running mongod --repair at your OS command line, or alternatively using the command db.repairDatabase(); from the MongoDB shell. The latter command runs at a per-database level, so you would need to run this command for each database on the server.

You can also validate collection data using the validate function. If you have a collection named contacts, you could validate that collection with the command db.contacts.validate();.

MongoDB offers many other features to make the lives of DBAs easier. In addition, a variety of third-party administration tools and interfaces are available. See the MongoDB documentation for more information.

map/reduce

If you have used the CouchDB database before, you are likely familiar with map/reduce, as its view engine uses map/reduce functions to filter and aggregate data by default. In MongoDB, this is not the case; simple queries and filtering (and even aggregation) do not rely on map/reduce. However, MongoDB does provide an implementation of map/reduce for use in aggregating large data sets.
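The overall flow can be sketched in plain JavaScript: a map function emits key/value pairs for each document, values are grouped by key, and a reduce function folds each key's values into one result. This is a hypothetical model, not MongoDB's engine; in MongoDB, emit is a global available inside map, whereas here it is passed in as a parameter. Two real constraints are reflected: reduce is not called for a key with a single value, and reduce must return the same shape it receives, because MongoDB may re-run it on partial results.

```javascript
// Hypothetical model of map/reduce (illustration only).
function mapReduce(docs, map, reduce) {
  const grouped = new Map();
  const emit = (key, value) => {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  };
  // As in MongoDB, map sees the current document as `this`.
  for (const doc of docs) map.call(doc, emit);
  const results = [];
  for (const [key, values] of grouped) {
    // A key with a single emitted value skips reduce.
    results.push({ _id: key, value: values.length === 1 ? values[0] : reduce(key, values) });
  }
  return results;
}

const colors = [{ name: "red" }, { name: "white" }, { name: "red" }];

const counts = mapReduce(
  colors,
  function (emit) { emit(this.name, { count: 1 }); },
  function (key, values) {
    // Returns the same shape as the emitted values, so it can be re-applied.
    return { count: values.reduce((sum, v) => sum + v.count, 0) };
  }
);
console.log(JSON.stringify(counts));
```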

map/reduce would likely warrant an article by itself. For detailed information on MongoDB's implementation of it, see the MongoDB documentation (see Resources for a link).


Scaling MongoDB

A primary reason for the recent popularity of key/value stores and document-oriented databases is their light footprint and tendency to be highly scalable. In order to facilitate this, MongoDB relies on the concepts of sharding and replication, which you will learn about in this section. In addition, you'll also learn how you can store large files in MongoDB using GridFS. Finally, you'll see how you can profile your queries to optimize the performance of your database.

Sharding

An important part of any database infrastructure is ensuring that it scales well. MongoDB implementations are scaled horizontally using an auto-sharding mechanism, allowing the scaling of a MongoDB configuration to thousands of nodes, with automatic load balancing, no single point of failure and automatic failover. It is also very straightforward to add new machines to a MongoDB cluster.

The beauty of MongoDB's auto-sharding feature is that it makes it very straightforward to go from a single server to a sharded cluster, often with little or no change to application code required. For detailed documentation on how auto-sharding works and how to implement it, see the MongoDB documentation.
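Conceptually, MongoDB's sharding is range-based: the values of a chosen shard key are divided into contiguous chunks, and each chunk lives on one shard, so a router can send each operation to the right server. The plain JavaScript below is a simplified sketch of that lookup only, not MongoDB's router or balancer; the chunk boundaries, shard names and routeToShard helper are hypothetical.

```javascript
// Hypothetical chunk table for a shard key on `name`; null means unbounded.
// Each chunk covers the half-open range [min, max).
const chunks = [
  { min: null, max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: null, shard: "shard2" },
];

// Find the chunk whose range contains the key and return its shard.
function routeToShard(key) {
  const chunk = chunks.find(
    (c) => (c.min === null || key >= c.min) && (c.max === null || key < c.max)
  );
  return chunk.shard;
}

console.log(["black", "magenta", "white"].map(routeToShard));
```

Because the application only talks to the router, adding a shard and moving chunks to it changes this table, not the application's queries, which is why so little application code usually needs to change.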

Replication

MongoDB provides replication in a master-slave configuration (similar to MySQL) for the purposes of failover and redundancy. Alternatively, MongoDB can use replica sets, in which one node acts as the primary at any one time and another node takes over as the primary in the event of a failure.

Unlike CouchDB, which uses replication as the basis for scaling, MongoDB uses replication primarily for ensuring high availability by using slave nodes as redundant replicas.
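One detail of replica-set failover worth keeping in mind when sizing a set: a member can only become primary if it can reach a strict majority of the set's voting members. This prevents both sides of a network partition from electing their own primary. The helper below is a hypothetical sketch of that rule alone, not MongoDB's election protocol.

```javascript
// Hypothetical check for the majority rule used in replica-set elections.
function canBecomePrimary(totalVotingMembers, reachableIncludingSelf) {
  return reachableIncludingSelf > totalVotingMembers / 2;
}

// Three members: if the primary fails, the two survivors still form a majority.
console.log(canBecomePrimary(3, 2)); // true
// Four members split two and two by a partition: neither side can elect a primary.
console.log(canBecomePrimary(4, 2)); // false
```

This is why replica sets are usually given an odd number of voting members.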

For further information on MongoDB replication, see the documentation (see Resources for a link).

Large file storage with GridFS

MongoDB databases store data in BSON documents. However, the maximum size of a BSON document is 4MB in the release covered here (later releases raised the limit to 16MB), which makes documents unsuitable for storing large files and objects. MongoDB uses the GridFS specification to store large files, dividing each file into smaller chunks stored across multiple documents.

The standard MongoDB distribution includes command line utilities for adding and retrieving GridFS files to and from the local file system. In addition, all official MongoDB API drivers include support for GridFS. For more details, refer to the MongoDB documentation (see Resources).
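The chunking idea behind GridFS can be sketched in plain JavaScript. GridFS keeps file metadata in one collection (fs.files by default) and the file contents in another (fs.chunks), where each chunk document records the file it belongs to (files_id), its position (n) and its bytes (data). The helpers below are hypothetical, and the 256KB default chunk size is an assumption for illustration; check the GridFS documentation for the value your driver uses.

```javascript
// Hypothetical helpers modeling GridFS chunk documents (illustration only).
function toGridFsChunks(fileId, bytes, chunkSize = 256 * 1024) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n++) {
    // subarray clamps at the end, so the last chunk may be shorter.
    chunks.push({ files_id: fileId, n: n, data: bytes.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

// Reassemble a file by ordering chunks on n and concatenating their data.
function fromGridFsChunks(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  const total = ordered.reduce((sum, c) => sum + c.data.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of ordered) {
    out.set(c.data, offset);
    offset += c.data.length;
  }
  return out;
}

// A hypothetical 5MB file, too large for a single 4MB document, needs 20 chunks of 256KB.
const file = new Uint8Array(5 * 1024 * 1024).map((_, i) => i % 251);
const chunks = toGridFsChunks("file-1", file);
console.log(chunks.length); // 20
```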


Conclusion

In this article, you learned about the MongoDB database management system and why it is one of the fastest-growing options in the popular NoSQL segment of the DBMS market. You learned why you would choose a document-oriented database over a traditional RDBMS, and about the features that MongoDB has to offer. You learned how to install and use MongoDB for storage and retrieval of data, and about the various tools and scalability options it provides.

Resources

Learn

Get products and technologies

  • Download MongoDB.
  • IBM trial software: Innovate your next open source development project using trial software, available for download or on DVD.

Discuss
