DB2 JSON capabilities, Part 2: Using the command-line processor

How to set up and use the DB2 JSON command-line interface

Rapidly changing application environments require a flexible mechanism to store and communicate data between different application tiers. JSON (Java Script Object Notation) has proven to be a key technology for mobile, interactive applications by reducing overhead for schema designs and eliminating the need for data transformations.

DB2® JSON enables developers to write applications using a popular JSON-oriented query language created by MongoDB to interact with data stored in IBM® DB2 for Linux®, UNIX®, and Windows® or IBM® DB2 for z/OS®. This driver-based solution embraces the flexibility of the JSON data representation within the context of a RDBMS, which provides established enterprise features and quality of service. This DB2 NoSQL capability supports a command-line processor, a Java API, and a wire listener to work with JSON documents.

In this article, you will set up a DB2 database to support JSON applications and walk through a scenario that introduces basic features of the JSON command-line processor to help you get started with your own applications.

Share:

Marion Behnen, DB2 JSON and Spatial Development, IBM

Marion Behnen is a Senior Software Engineer within the IBM Software Group and has been working on several NoSQL JSON features. Before delving into the world of NoSQL, she has been the technical lead for several DB2 and warehousing application components. Before joining IBM, she was involved with many aspects of business process and data integration, in particular related to database application development in the manufacturing industry.


developerWorks Contributing author
        level

Tony Sun, Staff Software Engineer, IBM

Tony Sun is a Staff Software Engineer within IBM's Software Group. Before working on IBM's DB2 NoSQL JSON solution, he worked on various DB2 Runtime products such as IBM PureQuery, and IBM Optim Performance Manager.



20 June 2013

Also available in Chinese Japanese

Before you start

DB2 JSON is available with DB2 for Linux, UNIX, and Windows 10.5 and with the IBM DB2 Accessories Suite for z/OS V3.1 to work with DB2 for z/OS V10. It provides the following features, as illustrated in Figure 1:

  • A command-line shell to administer and query JSON data.
  • A Java™ API for application development.
  • A wire listener to accept and respond to requests sent over the network.
Figure 1. DB2 JSON components
Applications use the wire listener or the Java API to work with JSON documents in the database. The Java API is also used by the command-line interface.

Learn what to expect from this article and how to get the most out of it

This article introduces basic features of DB2 to manage JSON data and execute queries in the command-line shell. It guides you through the following tasks:

  1. Setup
  2. Work with collections and documents
  3. Administration
  4. Cleanup

For a general overview, as well as details about the other components, see the other articles in this series.


Setup

System prerequisites

To complete the following steps, you must have at least IBM DB2 10.5 for Linux, UNIX, and Windows installed on the system, or use the IBM DB2 Accessories Suite for z/OS V3.1 to enable the use of DB2 for z/OS V10 as JSON datastore. Apply the following steps as applicable for your database platform.

IBM DB2 on Linux, Unix and Windows

To simplify managing authorization requirements for the examples in this article, make sure you have DBADM authority for the database you want to use. If you create the database as described in this section, the necessary authority is automatically assigned.

Create a database

In this step, a database is created that is suitable for the subsequent examples. Use a DB2 command window or your preferred DB2 administration tool to execute the command, as shown in Listing 1.

Listing 1. Command to create the database
CREATE DATABASE myjsondb 
       automatic storage yes 
       using codeset utf-8 territory US 
       collate using system  
       pagesize 32 k

Also, note the database server host name or IP address, and the port number. They are needed for the following steps in Preparing and starting the JSON command-line processing environment.

IBM DB2 for z/OS

Applications can use the DB2 JSON API or the DB2 JSON WireListener on a distributed system platform and use a DB2 Connect JDBC driver to use DB2 for z/OS as a JSON datastore.

The following components are required for this deployment scenario:

  • A DB2 10.5 FP3 (or later) for Linux, Unix, and Windows server or client installation with the DB2 JSON and DB2 Driver components. After installation these components are located in SQLLIB/json.
  • The DB2 Accessories Suite for z/OS V3.1.
  • DB2 for z/OS Version 10 with required APARs. See APAR II14727 for details about prerequisites and installation of DB2 JSON with DB2 for z/OS.

Catalog the database on the distributed system and note the connection information or the URL of the DB2 database on z/OS for the following step.

Preparing and starting the JSON command-line processing environment

The command-line processor is started with the db2nosql script (located in the <db2home>/sqllib/json/bin directory) and requires database connection information. To ensure that dependencies are resolved, verify the following:

  • A Java Runtime Environment (at minimum JRE 1.5) is included in the PATH.
  • The CLASSPATH includes the JDBC driver (db2jcc.jar version 3.66 or higher or db2jcc4.jar).

The script assumes the database is on localhost:50000 unless you specify a different location or port with the -hostName and -port options. Use the -help option for additional details, as shown in Listing 2.

Listing 2. Examples for using the db2nosql script
db2nosql -help

db2nosql -db bobdb  -user bob -password mypassword

db2nosql -hostName bob.bobhome.com -port 50003 -db bobdb  -user bob -password mypwd

Script argument prompting:

  • The database name is mandatory. The script will prompt for the database name, if the name is not provided.
  • If a username but no password is provided, the script will prompt for the password.
  • If neither a username nor a password are provided, the script will attempt a JDBC-Type2 connection, which uses the login user of the operating system.

Tip: Copy the script and edit the copy to provide hardcoded values to simplify starting the processing environment for a specific setup.

The JSON command-line processor will show a "nosql" prompt as shown in the sample excerpt in Listing 3.

Listing 3. Command-line processor
nosql>Type your JSON query and end it with <ENTER>
nosql>Type help() or help for usage information

Enable the DB2 database

For DB2 on Linux, Unix and Windows, run the script, and in the command-line processing environment, type enable(true) to prepare the database for DB2 JSON commands. This step will add the necessary system objects and is only required once for a database, as shown in Listing 4.

Listing 4. Enable the database
nosql>enable(true)
Executing SQL...
Database Artifacts created successfully
nosql>

Using the enable(false) option will print the associated DDL, but won't execute the command.

With DB2 for z/OS, enabling a database will be performed as a one-time installation task by a DB2 for z/OS Database or System Administrator. The enablement via the db2nosql command line tool is not supported for z/OS. See APAR PM98357 for details about enablement of DB2 JSON with DB2 for z/OS.

After the database is enabled, start the db2nosql script with the connection information or URL of the DB2 database on z/OS.

Select the JSON namespace

Using DB2 as a JSON document store allows the definition of multiple JSON namespaces by using DB2 SQL schemas as qualifiers. By default, the name TEST is set as JSON namespace, however, it is good practice to select a custom name with the command use. This JSON namespace (aka DB2 SQL schema name) will be used for the active session until a new JSON namespace is selected, as shown in Listing 5.

Listing 5. Show and set JSON namespaces
nosql>dbs
SCHEMAS: [TEST]
nosql>use myown
Switched to schema MYOWN
nosql>dbs
SCHEMAS: [MYOWN,TEST]

Note that the namespace (schema name) is case-insensitive.

To verify your current DB2 database connection, type db in the nosql-commandline processor, as shown in Listing 6.

Listing 6. Show current database connection
nosql>db
Database: jdbc:db2://localhost:50000/myjsondb Schema: TEST

Prefix commands with db to work with the selected namespace or collections in the namespace. The db prefix is a convenience shortcut and will be mapped to build fully qualified names when executing a command.


Working with collections and documents

Store and query documents with implicitly created collections

With the DB2 JSON feature, JSON documents are organized in collections. JSON collections do not enforce a collection schema or document structure, however, usually documents in a collection share common characteristics to make it easier to search for (and find) data.

Contrary to relational tables, it is therefore not necessary to define a table structure for collections. To insert a document, it is sufficient to specify the name of the collection, and if this collection does not already exist, it will be automatically created. In such a case, if the document contains an identifier, that is, a field tagged with the attribute name _id, that field is used as unique identifier and it is expected that all new documents will contain an _id of the same datatype.

If no such attribute is present for the initial document, the system uses generated object identifiers to uniquely identify each document. If documents that are inserted later contain an _id field, the datatype must fit into a VARCHAR(12) FOR BIT field, otherwise the document is rejected.

To insert a document, use the command db.<collectionName>.insert(<document>), as shown in Listing 7.

Listing 7. Insert a document (formatted for readability)
nosql>db.books.insert({
          isbn: "123-456-789", 
          author: "Verne, Jules", 
          title: "Journey to the Center of the Earth", 
          abstract: "Classic science fiction novel in an unusual setting", 
          price: 6.00, 
          pages: 276, 
          category: "Fantasy",
          sales: 500.50
         })

The previous sample document contains three different data types, string (such as Verne, Jules), number (such as 6.00), and integer (such as 276). Since it does not contain a field tagged with the attribute name _id, the system uses a generated object identifier to uniquely identify the document.

Table 1 contains a list of commonly occurring data types. See the DB2 JSON reference documentation for references containing a complete list.

Table 1. Commonly occurring data types
DatatypeIndex allowedCommentExample
$stringYString
$int, $integer, $longYinteger, long$int: 123
$numberYdouble, float$number: 123.45
$dateYmust be in format: 'yyyy-mm-ttThh:mm:ssZ $date: '2013-05-18T18:56:00Z'
$timestampYtimestamp
$binaryYbyte array
$oidYobject identifier (binary)

You can query the data with the find() command, as shown in Listing 8. For the example document, a collection named books has been created with an automatically generated binary object identifier. Note that the collection name is case-sensitive.

Listing 8. List all documents
nosql>db.books.find()
nosql>Row 1:
nosql> {
nosql> "_id":{"$oid":"519b8727cd1552ed65b47a20"},
nosql> "isbn":"123-456-789",
nosql> "author":"Verne, Jules",
nosql> "title":"Journey to the Center of the Earth",
nosql> "abstract":"Classic science fiction novel in an unusual setting",
nosql> "price":6,
nosql> "pages":276,
nosql> "category":"Fantasy",
nosql> "sales": 500.5
nosql> }

For new documents, the datatype of the _id must conform to the _id datatype of the collection, in this example the binary datatype. An attempt to insert a new document with an explicit _id that does not match the expected datatype will result in an error, as shown in Listing 9.

Listing 9. Error case: Insert a document with wrong _id data type (formatted)
nosql>db.books.insert({
   _id: "123-456-788", 
   author: "Verne, Jules", 
   title: "Journey to the Center of the Earth", 
   abstract: "Classic science fiction novel in an unusual setting", 
   price: 6.00, 
   pages: 276,
   category: "Fantasy",
   sales: 500.50
  })

nosql> Error:[nosql][1.0.146] DBException;
 Caused by: [jcc][1083][10403][3.66.33] Illegal conversion: can not convert from
 "java.lang.String" to "byte[]" ERRORCODE=-4474, SQLSTATE=null

Create collections explicitly

In the previous example, the collection was created automatically with the first document. However, automatically created collections use default settings, and it is therefore often desirable to create collections explicitly with custom settings, as shown in Listing 10.

Listing 10. Create a collection named 'media' with _id's of type long
nosql>db.createCollection(“media”, {_id: “$long”)})
Collection: TEST."media" created. Use db.media.

Note that only the document _id, which is used as unique identifier, is specified in the definition of the collection. JSON documents are stored schema-less and a document schema definition is therefore not necessary.

Custom settings can also be used to enable DB2 features (like compression and time travel), to assign a tablespace and control bufferpool usage.

Note that bi-temporal capabilities and time travel queries are not supported for JSON collections in DB2 for z/OS. Some restrictions also apply to collection storage characteristics, for example, the TABLESPACE option is not currently supported in the createCollection command with DB2 for z/OS.

See the Resources section for a link to the DB2 Information Center to learn more about tablespace and bufferpool management, compression, and time travel features. To learn more about DB2 JSON applications, see the other articles in this series.

Import documents

Apart from the insert command for single documents, it is also possible to import data from a file. Download the sample file from Downloads to execute the following steps, as shown in Listing 11.

Listing 11. Import documents from a file
nosql>db.books.importFile(“books_import.js”)
14 objects were imported

Tip: If the command returns a 'file is not found' error, use the absolute file location.

The import command also allows you to set a commit frequency, as shown in Listing 12. Larger batches may improve throughput.

Listing 12. Import documents in batches
nosql>db.books.importFile(“books_import.js”, 100)

The imported file must contain valid documents in JSON notation. If the file contains invalid documents, an error will be reported for each such malformed document.

JSON collections do not enforce a collection schema or document structure. However, applications typically expect some structural commonality of the documents. That is, JSON collections are not totally schema-less, instead they provide document schema flexibility. To get information about the possible structure of a collection, use the sampleSchema() command. This command analyzes a subset of documents and returns a list of attributes and the count of their occurrences, as shown in Listing 13.

Listing 13. Find information about the document structure
nosql>db.books.sampleSchema()

  {
  "._id":"15;type:ObjectId",
  ".abstract":"15;type:String",
  ".author":"15;type:String",
  ".category":"12;type:String",
  ".isbn":"15;type:String",
  ".pages":"14;type:Integer",
  ".price":"14;type:Double",
  ".sales":"13;type:Double",
  ".title":"15;type:String",
  ".year_published":"5;type:Integer"
  }

In this example, most attributes are present in all documents, the exception are the 'category', 'sales', 'year_published', 'pages' and 'price' attributes.

Query JSON documents

The find command db.<collection>.find(<conditions>, <projection list>) allows the selection of interesting documents by specifying conditions (similar to a SQL where clause), as shown in Listing 14, and enables tailoring the resulting output (similar to a SQL select statement).

nosql>db.books.find({author: "Tolkien, J.R"})

The example above returns all attributes for books by J.R. Tolkien, including the automatically generated identifier. To select which attributes of the documents should be returned, the query can be submitted with a projection list that indicates which attributes to include or exclude from the results. Use the following:

  • 1 – to include the attribute
  • 0 – to exclude the attribute

Tip: You can use either inclusion or exclusion, not both, in the same projection - with one exception: with a list of included fields, you can still exclude the _id, as shown in Listing 15.

Listing 15. Select specific document attributes
Search for titles and prices of the documents with this author:
 
nosql>db.books.find({author: "Tolkien, J.R"}, {title: 1, price:1})


Search for titles and prices of the documents with this author in this category:

nosql>db.books.find({author: "Tolkien, J.R", category: “Fantasy”}, {title: 1, price:1})


Search for titles and prices of the documents with this author in this category, 
but exclude the _id:

nosql>db.books.find({author: "Tolkien, J.R", category: “Fantasy”}, 
{_id: 0, title: 1, price:1})

Note that the _id is automatically included, except if it is explicitly excluded.

You can use comparison operators and join operators to select documents, as shown in Listing 16.

Listing 16. Use comparison values to search for documents
Search author, sales and price for books with sales less than 300, exclude the _id:

nosql>db.books.find({sales: {$lt:300}},{_id:0, author:1, sales:1, price:1})

The most common operators are listed in Table 2.

Table 2. Comparison and join operators
OperatorUsageExample
$eqequalsauthor: { $eq : "Lindgren, Astrid" };
equals as default notationauthor: "Lindgren, Astrid"
$lteless or equalpages: { $lte : 200 }
$ltless thanprice: { $lt : 6.50 }
$gtegreater or equalauthor.lastname: { $gte : "Doe" }
$gtgreater thanrating: { $gt : 3 }
$nenot equal status: { $ne : 1 }
$inmember of a set of valuesyear_published: { $in : [1990, 1991,1992] }
$ninnot a member of a set of valuesyear_published: { $nin : [2012, 2013] }
$andboth must be true$and:[{"rating":5},{"category":"Mystery"}]
$and is implied by using a comma as separator"category":"Mystery","rating":5
$orat least one must be true$or:[{"rating":5},{"category":"Mystery"}]
$norneither must be true$nor:[{"rating":3},{"category":"Mystery"}]
$notlogical not (do not match)$not:{"rating":1}

Control options for result sets

It is often useful to sort the output, for example, to list highest or lowest values first, or to order documents so values can be found more easily. Attributes in the JSON query can be sorted by listing one or more attributes in the sort() function, and specifying for each whether it should be sorted ascending (=1) or descending (= -1), as shown in Listing 17.

Listing 17. Search books and sort the result
Search books with sales less than 300, lowest price first:

nosql>db.books.find({sales: {$lt:300}},
      {_id:0, author:1, sales:1, price:1}).sort({price: 1})


Search books with sales less than 300, 
sort on lowest price and largest number of pages within:

nosql>db.books.find({sales: {$lt:300}},
              {_id:0, author:1, pages:1, price:1}).sort({price: 1, pages: -1})

If many documents match the criteria, depending on the application needs it may be sufficient to retrieve a subset of potential results. To specify the maximum number of documents that should be returned, use the limit() function with the query, as shown in Listing 18.

Listing 18. Limit the number of results
nosql>db.books.find({sales: {$lt:300}},{_id:0, author:1, pages:1, sales:1}).limit(3)

Limiting the result set can also be used to page through a larger set of results by adding an offset to the query with the function skip(), as shown in Listing 19.

Listing 19. Page through results
nosql>db.books.find({sales: {$lt:300}},
     {_id:0, author:1, pages:1, sales:1}).limit(3).skip(3)

If only a single result is needed, the findOne() function can be used for convenience, as shown in Listing 20.

Listing 20. Find first matching document
nosql>db.books.findOne({author: "Tolkien, J.R"}, {title: 1, price:1})

Aggregate data

With large numbers of documents, it is often desirable to aggregate results for a group of documents. For this purpose, a variety of functions can be used to get information about the number of documents, or distinct values or computed aggregate values for groups of documents.

To retrieve the number of documents that match specified query criteria, use the count() function, as shown in Listing 21.

Listing 21. Count documents
Get the total count of documents in this collection:
 
nosql>db.books.count()
15

Get the number of documents for this author:

nosql>>db.books.count({author: "Tolkien, J.R"})
2

To find distinct values for an attribute in a set of data, use the distinct() function for the attribute, as shown in Listing 22.

Listing 22. Find distinct values
nosql>db.books.distinct("author")

To apply a query, or use multiple key values, see the next article in this series: Using the DB2 JSON Java API.

The count() and distinct() functions are convenience functions for a specific purpose and can be used instead of the more general group() function. For example, distinct values can also be found with the group() function on an attribute, as shown in Listing 23.

Listing 23. Find distinct values with the group() command
nosql>db.books.group({"_id": {"author": 1}})

The group() function also allows the specification of a compound grouping key, and the definition of aggregation functions, for example, to calculate sums or averages, as shown in Listing 24.

Listing 24. Calculate the average price of books per author
nosql>db.books.group({"_id": {"author": 1}, "avgprice": {"$avg": "$price"}} )

For more complex queries, the aggregate() function can be used. It allows you to describe a query in a sequence of steps, such that intermediate results can be used again as input for later steps. For example, calculate the average price per author or use the following query to first select all documents with category "Fantasy", then pick attributes author and price from the documents, and group on authors to calculate the average price, as shown in Listing 25.

Listing 25. Calculate the average price of books per author in a category (formatted)
nosql>db.books.aggregate(
      {$match:   { category: "Fantasy" }}, 
      {$project: {author:1, price:1}},
      {$group:   {_id: {author:1}, avgPrice:{$avg:"$price"}}} 
   )

nosql>Row 1:
nosql> {
nosql> "_id":"Tolkien, J.R",
nosql> "avgPrice":15.5
nosql> }
nosql>Row 2:
nosql> {
nosql> "_id":"Verne, Jules",
nosql> "avgPrice":6.35
nosql> }
nosql>2 rows returned

You can apply arithmetic, string, or date-time operations in the $project task. The following example uses the $divide operator to calculate the price per page after calculating sums and averages in a $group task for each author, as shown in Listing 26.

Listing 26. List authors and price per page sorted ascending (formatted)
nosql>db.books.aggregate(
    {$group: {"_id": {author:1},  "sumSales":{"$sum": "$sales"}, 
                avgPages: {"$avg": "$pages"},  
                avgPrice: {"$avg": "$price"} }},
    {$project: {author: "$_id.author", sumSales: 1, 
                pricePerPage:  {$divide: ["$avgPages", "$avgPrice"]}}},
    {$sort: {pricePerPage: 1}}
  )

Tip: When referring to actual or calculated attribute names in functions, make sure to prefix such attribute names with a $ and surround it with double quotes, like in "$avgPrice" for the $divide operation.

The aggregation function can contain single or multiple tasks and it is not required to include a $group task. For example, you can use aggregate() for simple projection tasks with or without modifying values, as shown in Listing 27.

Listing 27. Select a substring of an attribute
nosql>db.books.aggregate({"$project": {author:1, title:1, 
                 shortabs: {"$substr": ["$abstract",0,10]} } })

The aggregation tasks only allow fields to be specified by inclusion. If attributes are explicitly selected in a step via a projection or grouping step, they can be used in the next step with the reference name assigned in the previous step. All other fields are discarded.

Tip: The aggregate() function only returns the specified attributes. Different from the find() function, it will not automatically include the generated _id in a $project task.

Tip: Make sure that calculated or renamed fields are projected before referencing them in a later task.

Valid tasks are shown in Table 3.

Table 3. Aggregation tasks
TaskDescription
$group apply aggregate operations like sum or avg; requires a group key built from one or more attributes
$limit limit the number of documents that should be returned
$match use query criteria to select a subset of documents
$project select fields from a document and (optionally) assign reference names
$skip use an offset for paging
$sort sort the output (ascending 1, descending -1) for the given attributes

Table 4 shows a list of supported aggregation operations. See the DB2 JSON reference documentation for detailed usage.

Table 4. Aggregation operations
Datatype(s)Operation
$string$concat, $max, $min, $substr, $toLower, $toUpper
$date, $timestamp$year, $month, $week, $hour, $minute, $second, $dayOfYear, $dayOfMonth, $dayOfWeek
$int, $number$add, $divide, $multiply, $subtract, $mod, $min, $max, $avg, $sum, $inc

Tip: If you receive an error that a specific operator is not supported, review the syntax and ensure that the sequence is {operator: {field: value}}, for example, {“$gt”: {rating: 1}}.

Update documents

The update() function updates data in one or more documents. It supports several optional arguments to determine the scope of the update. The usage is update(<condition>, <fields to update>, <upsertFlag>, <multiRowFlag>).

Caution: When specifying new field values, make sure to use the $set operator if you only want to update the specified fields in a document and leave other fields unchanged. Without this operator, the document content is replaced by the specified fields, as shown in Listings 28 and 29.

Listing 28. Update a document with the specified isbn using the $set operator
nosql>db.books.find({isbn: "123-456-234"}, {_id:0, isbn:1, author:1, title:1, pages:1})
Row 1:
{
 "isbn":"123-456-234",
 "author": "Verne, Jules",
 "title":"Journey to the Center of the Earth",
 "pages":276
 }

nosql>db.books.update({isbn: "123-456-234"}, {$set: {pages: 299}})
Updated 1 rows.

nosql>db.books.find({isbn: "123-456-234"}, {_id:0, isbn:1, author:1, title:1, pages:1})
Row 1:
{
 "isbn":"123-456-234",
 "author": "Verne, Jules",
 "title":"Journey to the Center of the Earth",
 "pages":299
 }
Listing 29. Update a document without the $set operator
db.books.update({isbn: "123-456-234"}, {isbn: "123-456-234" , price: 6.50})
Updated 1 rows.

nosql>db.books.find({isbn: "123-456-234"}, {_id:0, isbn:1, author:1, title:1, price:1})

 Row 1: {
 "_id":{"$oid":"519e2bfb37a817b9cba77ab0"},
 "isbn":"123-456-234",
 "author": null,
 "title": null,
 "price":6.50
 }

Further options to determine the scope of updates are the upsert and the multiRow flags. The upsert flag allows specifying whether missing documents should be inserted or rejected, that is, the flag indicates what the system should do if there is no match in the collection for a specified identifier:

  • If the upsert flag is set to true, the missing document will be inserted.
  • If the flag is set to false, the update will be rejected.

With the multi-row flag, you can also determine whether the update should only be applied to the first document that matches the search criteria, or all such matching documents, as shown in Listing 30.

Listing 30. Examples for options to update documents
Do not insert if no matching document exists, 
update the first document that matches the query:

nosql>db.books.update({author: "Climber, Joe"}, {$set: {price: 9.99}}, false, false)


Do not insert if no matching document exists, update all documents that match the query:

nosql>db.books.update({author: "Climber, Joe"}, {$set: {price: 9.99}}, false, true)

The save() method combines insert and update capabilities, but relies on the document identifier to determine the scope. If the document contains an _id, the document will be upserted. If the document does not contain an _id, a new identifier will be generated and the document inserted.

In the following example, the book document does not contain an _id, so a new identifier will be generated and the new book will be inserted into the books collection, as shown in Listing 31.

Listing 31. Save a document
nosql>db.books.save({isbn: "123-456-239", 
              "author": "Verne, Jules",  "title": "Mysterious Island" })"

Work with indexes

If an attribute is often used as a selection or sort criteria in queries, it is useful to create an index to speed up retrieval times for larger data workloads. Indexes on JSON attributes can be created on single or multiple attributes. If no index name is specified, a default name will be assigned.

Note: For DB2 on Linux, Unix and Windows, indexes can be sorted in ascending and descending order, and indexing an element in an array is supported. With DB2 for z/OS, indexes can only be in ascending order and indexes on arrays are not supported.

To create an index, use the ensureIndex() command, as shown in Listing 32.

Listing 32. Create indexes
Create an index on field 'author' in ascending order, 
     using the default type string with default length 50:

nosql>db.books.ensureIndex({"author": 1})
Index <books_xauthor> was created successfully.


Create an index on field 'category' with type string and field length 40:

nosql>db.books.ensureIndex({"category": [1, "$string", 40]})
Index <books_xcategory> was created successfully.

 
Create an index on field price with type number in descending order, name it 'mypriceidx':

nosql>db.books.ensureIndex({"price": [-1, "$number"]}, “mypriceidx”)
Index <mypriceidx> was created successfully.

You can view a list of indexes for the collection with the getIndexes() command. Note that a unique index is automatically created on the _id attribute, as shown in Listing 33.

Listing 33. Index information including index on identifier (output formatted)
nosql>db.books.getIndexes()

[{"v":0,"key":{"_id_":1},"ns":"TEST.books","name":"_id_","unique":true}, 
{"v":1,"key":{"author":1},"ns":"TEST.books","name":"books_xauthor","unique":false},
{"v":2,"key":{"category":1},"ns":"TEST.books","name":"books_xcategory","unique":false}, 
{"v":3,"key":{"price":1},"ns":"TEST.books","name":"mypriceidx","unique":false}]

Use the unique flag to create a unique index on an attribute. The system will attempt to create a unique index. If the collection already contains documents, it must be ensured that there is no duplicate data in the specified fields, as shown in Listing 34.

Listing 34. Error case: Create unique index on duplicate data
nosql>db.books.ensureIndex({"isbn": [1, "$string", 30]}, "myisbnix", true)

nosql> Error:[nosql][1.0.66] Failed to create index. ;
 Caused by: A unique index cannot be created because the table contains data that would 
 result in duplicate index entries.. SQLCODE=-603, SQLSTATE=23515, DRIVER=3.66.33

To find duplicate entries, an aggregation query comes in handy, as shown in Listing 35.

Listing 35. Example to find duplicate data (formatted)
nosql>db.books.aggregate({"$group": {_id: "isbn", countdocs: {"$sum": 1}}}, 
    {"$project": {"grouped_isbn": "_id", "countdocs": 1}}, 
    {"$match": {"countdocs": {$gt: 1}}})

Nested objects

Documents may also contain nested objects, for example, instead of a single string that contains the author information, the document might separate between first name and last name, as shown in Listing 36. See also books_importNested.js in the Download section.

Listing 36. Document with nested object
{
  isbn: "123-456-234", 
  author: 
      {
         lastname: "Verne", 
         firstname: "Jules"
      }, 
  title: "Journey to the Center of the Earth", 
  sales: 333.0
  })

Each element in the document must be uniquely identifiable. For example, the element "title" can only occur once, as does the element "author.lastname", as shown in Listing 37. However, an attribute name may be the same if it is located in a different path. It is therefore valid to have another attribute named "lastname" as long as it has a separate absolute path in the document, for example, "editor.lastname".

Listing 37. Import and search documents with nested attributes
nosql>db.booksnest.importFile(“books_importNested.js”)

nosql>db.booksnest.findOne()
{
 "_id":{"$oid":"51b7a63d3503e8eca2e84556"},
 "isbn":"122-456-789",
 "author":{"lastname":"Tolkien","firstname":"J.R"},
 "title":"The Hobbit",
 "abstract":"Spiders and Dragons",
 "category:["Fantasy", "Fiction"],
 "price":5.0,
 "pages":216,
 "sales":800.8
 }

To query for a nested object, use a dot-notation to specify the full path for the desired attribute. For example, to search on the author's last name, use "author.lastname", as shown in Listing 38.

Listing 38. Query on a nested object
nosql>db.booksnest.find({"author.lastname": "Verne"})

Reminder: Make sure to enclose field names in double-quotes.

Arrays

Documents may also contain multiple occurrences of an object in the form of an array. For example, a user may have multiple phone numbers, or email addresses, or a book may belong to multiple categories or can have more than one author, as shown in Listing 39.

Listing 39. Use of an array to list multiple categories (formatted)
nosql>db.booksnest.insert({isbn: "876-543-321", 
           author: {lastname: "Doe", firstname: "John" },
           title: "A new book", 
           abstract: "The most curious items", 
           pages: 111,
           category: ["Humor", "YA"]
         })

When you search on the attribute without specifying an array position, the value will be searched for in all positions, as shown in Listing 40.

Listing 40. Search value in an array
nosql>db.booksnest.find({category: "Humor"})

nosql>Row 1:
nosql> {
nosql> "_id":{"$oid":"51a293e437a81d6b7f8c1971"},
nosql> "isbn":"876-543-321",
nosql> "author{lastname: "Doe", firstname: "John"},
nosql> "title":"A new book",
nosql> "abstract":"The most curious items",
nosql> "pages":111,
nosql> "category":["Humor", "YA"]
nosql> }

However, you can also search on a specific position by adding the array position to the attribute in the dot-notation, as shown in Listing 41.

Listing 41. Search an array value in a specific array position
nosql>db.booksnest.find({"category.0": "Humor"})
nosql>Row 1:
nosql> {
nosql> "_id":{"$oid":"51a293e437a81d6b7f8c1971"},
nosql> "isbn":"876-543-321",
nosql> "author{lastname: "Doe", firstname: "John"},
nosql> "title":"A new book",
nosql> "abstract":"The most curious items",
nosql> "pages":111,
nosql> "category":["Humor", "YA"]
nosql> }
nosql>1 row returned 

nosql>db.booksnest.find({"category.1": "Humor"})
nosql>0 rows returned

Tip: Note that the count for the first position starts with 0.

See the DB2 JSON reference documentation on working with arrays for the most recent details about restrictions for commands and functions on data in arrays.


Administration

Work with collections

To list all collection names in the current JSON namespace, use the getCollectionNames() command. In the current JSON namespace in this example, three collections are present, as shown in Listing 42.

Listing 42. Show current collections
nosql>db.getCollectionNames()
[media, books, booksnest]

To rename a collection, for example, to continue working with a collection but archive existing content, use the rename() command to specify a new name, as shown in Listing 43.

Listing 43. Rename an existing collection
nosql>db.books.rename(“oldbooks”)

If a collection with the new name already exists, the command will fail. However, you can force renaming the collection, by specifying a flag to force a drop of the existing target collection. In the example, if a collection with the name "oldbooks" already exists, the command db.books.rename(“oldbooks”, true) will first drop the collection "oldbooks" and then rename the collection "books" into "oldbooks".

Statistics

The stats() command returns information about a collection, in particular, the namespace, number of documents, and a list of indexes. The statistics also includes information about the average size of documents, and the total size of the collection, as shown in Listing 44. This information is populated from the DB2 statistics and may be refreshed with a delay. Unavailable values are set to -1.

Listing 44. Get collection statistics (output pruned)
nosql>db.books.stats()

  {
  "ns":"TEST.books",
  "count":15,
  "size":4.5263671875,
  "avgObjSize":309.0,
  "numExtents":-1,
  "nindexes":2,
  "totalIndexSizes":-1,
  "indexSizes":[{"books_xauthor": -1},{"_id_":-1}],
  "ok":1
  }

Remove data, indexes, and collections

To delete a document or multiple documents from the collection, use the remove() function. You can either remove all documents (index definitions will be kept), or remove a subset of documents by specifying conditions, as shown in Listing 45.

Listing 45. Remove a document
nosql>db.books.remove({isbn: "123-456-789"})

If an index is not necessary anymore, or you want to recreate an index with different index characteristics, you can remove the index by either referencing the index name or by specifying the index characteristics that were used to create the index (if no index name was explicitly assigned), as shown in Listing 46.

Listing 46. Remove indexes
nosql>db.books.removeIndex("mypriceidx")
Index <mypriceidx> was removed successfully.

nosql>db.books.removeIndex({"author": 1})
Index <books_xauthor> was removed successfully.

To remove all data from the collection but keep the empty collection and any indexes, use the remove() command, as shown in Listing 47.

Listing 47. Remove a collection
nosql>db.books.remove()
OK
nosql>db.books.getIndexes()
[{"v":0,"key":{"_id_":1},"ns":"TEST.books","name":"_id_","unique":true}, 
{"v":1,"key":{"pages":1},"ns":"TEST .books","name":"books_xpages","unique":false}, 
{"v":2,"key":{"isbn":1},"ns":"TEST .books","name":"myisbnix","unique":true}]

To drop the collection and all indexes, use the drop() command on the collection, as shown in Listing 48.

Listing 48. Drop a collection
nosql>db.books.drop()
OK
nosql>db.books.getIndexes()
[]

Cleanup

To remove all collections for a specific JSON namespace, the command dropAllCollections() can be used, as shown in Listing 49. If this JSON namespace contains collections for which the user does not have the necessary drop privilege, an error message will be printed for any such collection.

Listing 49. Dropping all collections in a JSON namespace
nosql>db.dropAllCollections()

For a force drop of collections and to remove the JSON system catalog, use the disable() command with the value true, as shown in Listing 50. As with the dropAllCollections() command, adequate privileges are required to drop all affected objects.

Listing 50. Disabling the JSON feature
nosql>disable(true)

If the disable() command is used with the option false , the DDL statement will be printed, but not executed.


Conclusion

This article has guided you through the setup of the DB2 JSON command line environment, and introduced some basic commands to work with collections to store and query JSON documents. To find out more about DB2 JSON, see how to work with the Java API, or how to use the wire listener to receive and handle requests, please see the other articles in this series. For details about commands and operations, see the DB2 JSON reference documentation.


Downloads

DescriptionNameSize
Sample JSON documentsbooks_import.zip2KB
Sample JSON documents with nestingbooks_importNested.zip2KB

Resources

Learn

Get products and technologies

  • Download a trial version of DB2 for Linux, UNIX, and Windows to evaluate the DB2 JSON support for yourself.
  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, Java technology, Open source
ArticleID=934233
ArticleTitle=DB2 JSON capabilities, Part 2: Using the command-line processor
publish-date=06202013