Column Family Farm Challenge

Column family stores share a similar architectural structure with key-value stores, but are otherwise quite different from other non-relational databases.

"Column family" refers to the sets of columns that are the units of access control in this type of data store. Column family stores use hash maps (essentially a list of key-value pairs), with values organized into cells in corresponding columns. A record is a grouping of these columns. Columns can be "sparse," which means there is no schema forcing each record to have an entry in every column.
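As a rough illustration of the structure described above, the sketch below models a column family store as nested hash maps. The names `table`, `put` and `get` are hypothetical and exist only for this example; a real column family store adds persistence, distribution and much more.

```python
from collections import defaultdict

# Hypothetical in-memory model: row key -> column family -> column -> value.
table = defaultdict(lambda: defaultdict(dict))

def put(row_key, family, column, value):
    """Write one cell; only cells that are actually written exist."""
    table[row_key][family][column] = value

def get(row_key, family, column, default=None):
    """Read one cell, returning `default` if the row never stored it."""
    return table.get(row_key, {}).get(family, {}).get(column, default)

put("user:1", "profile", "name", "Ada")
put("user:1", "profile", "email", "ada@example.com")
put("user:2", "profile", "name", "Grace")  # this row has no email column

print(get("user:2", "profile", "email"))  # None: the column is sparse
```

Because the second record never wrote an email cell, nothing is stored for it, which is what "sparse" means here.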

Column family stores enable:

  • High scalability and high availability
  • No single point of failure
  • Flexible schemas
  • Tunable consistency

Ready to Start Your Challenge?

If you haven't yet visited Document Store Towers or the Relational Village, go there now.

If you are familiar with those realms, it's time to begin your challenge.

Document Store Towers Challenge

A document store does not have much to do with “documents” in the usual sense – it does not refer to a letter, book or article. In this case, a document refers to a data record that is self-describing with regard to the data elements it contains.

At its core, a document database can be considered a key-value store with one major exception: Instead of persisting opaque values, a document database requires the data to be stored in a format that the database can understand (e.g. JSON or XML).

Document databases are known for their simplicity and scalability, as well as fast iteration in development. Say you need to add a new field to your object to run a new feature in your app. With a doc store, there’s no schema to update – just add the new field and go. No messing with ORMs or worrying about breaking someone else's feature.
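The "just add the new field" idea can be sketched with plain JSON. The `collection` dictionary and the `save` and `load` helpers below are hypothetical stand-ins for a document store, not any particular product's API.

```python
import json

# Hypothetical in-memory "collection": document ID -> JSON text.
collection = {}

def save(doc_id, doc):
    """Store a document as self-describing JSON."""
    collection[doc_id] = json.dumps(doc)

def load(doc_id):
    """Parse a stored document back into a dictionary."""
    return json.loads(collection[doc_id])

save("order:1", {"item": "lamp", "qty": 2})
# A new feature needs a gift flag: new documents simply carry the extra field.
save("order:2", {"item": "desk", "qty": 1, "gift": True})

# Readers treat the field as optional, so older documents keep working.
gift_orders = [doc_id for doc_id in collection if load(doc_id).get("gift", False)]
print(gift_orders)  # ['order:2']
```

No migration step was needed for the older order; the reading code supplies a default for documents written before the field existed.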

Document stores enable:

  • High availability and scalability
  • Schema flexibility
  • Sharding/partitioning
  • Fast write times

Ready to Start Your Challenge?

If you haven't yet visited Column Family Farm or the Relational Village, go there now.

If you are familiar with those realms, it's time to begin your challenge.

Graph Islands

The modern graph database is a data storage and processing engine that makes the persistence and exploration of data and relationships more efficient. In graph theory, structures are composed of vertices and edges (data and connections), or “data relationships.” Graphs behave similarly to how people think – in specific relationships between discrete units of data.

When you query a graph database, you get all the nodes (data points) that have a particular property and are related to other nodes. Both nodes and edges can store additional properties such as key-value pairs. Graph databases are commonly used in recommendation engines and fraud detection, and are also useful for data modeling, pattern recognition, and network topology.
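To make the node-and-edge model concrete, here is a minimal sketch of a recommendation-style traversal over an in-memory graph. The data, the `neighbors` helper and the `recommend` function are all invented for illustration; a graph database would express this as a query in its own traversal language.

```python
# Nodes: id -> properties. Edges: (source, label, target, properties).
nodes = {
    "alice": {"type": "person"},
    "bob": {"type": "person"},
    "lamp": {"type": "product"},
    "desk": {"type": "product"},
}
edges = [
    ("alice", "bought", "lamp", {"qty": 1}),
    ("bob", "bought", "lamp", {"qty": 2}),
    ("bob", "bought", "desk", {"qty": 1}),
]

def neighbors(node_id, label):
    """Targets reachable from `node_id` over edges with the given label."""
    return {dst for src, lbl, dst, _ in edges if src == node_id and lbl == label}

def recommend(person):
    """Products bought by people who share a purchase with `person`."""
    mine = neighbors(person, "bought")
    others = {src for src, lbl, dst, _ in edges
              if lbl == "bought" and dst in mine and src != person}
    suggestions = set()
    for other in others:
        suggestions |= neighbors(other, "bought")
    return sorted(suggestions - mine)

print(recommend("alice"))  # ['desk']
```

The traversal follows relationships directly (person to product to person to product) rather than joining tables, which is the access pattern graph databases are built around.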

Graph databases enable:

  • Working with complex data relationships and dynamic schemas
  • High app scalability with low impact to query performance or data model
  • Increasing the speed and accuracy of analytics systems
  • Pattern recognition, e.g. in fraud detection and recommendation engines

Explore graph databases for yourself by checking out Apache TinkerPop™.

To learn more about navigating NoSQL, read the free eBook, A Field Guide to the World of Modern Data Stores.

Key-Value Mountain

Key-value stores represent the most basic type of non-relational database, where each item in the database is stored as an attribute name – or key – together with its value.

In the key-value structure, the key is usually a simple string of characters, and the value is a series of uninterrupted bytes that are opaque to the database. The data itself is usually some primitive data type (string, integer, array) or a more complex object that an application needs to persist and access directly.

The key-value data store’s model of storing schema-less data – in contrast with the rigidity of relational schemas – enables developers to easily modify fields and object structures as their applications evolve.

In general, key-value stores have no query language. They simply provide a way to store, retrieve and update data using simple get, put and delete commands. The path to retrieve data is a direct request to the object, whether it is in memory or on disk.
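The get, put and delete commands mentioned above can be sketched as a small class. `KeyValueStore` is a hypothetical in-memory toy, not the interface of Redis or any other product; it only shows that the store treats values as opaque bytes and leaves serialization to the application.

```python
import json
from typing import Dict, Optional

class KeyValueStore:
    """Toy in-memory store; values are opaque bytes it never inspects."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        """Create or overwrite the value stored under `key`."""
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        """Return the stored bytes, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key: str) -> None:
        """Remove the key if present; deleting a missing key is a no-op."""
        self._data.pop(key, None)

store = KeyValueStore()
# The application, not the store, decides how the object is encoded.
store.put("session:42", json.dumps({"user": "ada", "cart": [1, 2]}).encode("utf-8"))
session = json.loads(store.get("session:42").decode("utf-8"))
print(session["user"])  # ada

store.delete("session:42")
print(store.get("session:42"))  # None
```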

Key-Value Stores enable:

  • Performance and rapid scalability
  • Constant stream of small reads and writes
  • Simplicity of data access patterns
  • High availability through relaxed consistency model

Explore key-value stores yourself by checking out Redis by IBM Compose, one of 7 open source technologies available on the IBM Compose production-ready, cloud hosted database platform.

Relational Village Challenge

The rigid schemas in traditional relational databases lend themselves well to reporting and analytics. In contrast, most NoSQL data stores powering today’s web and mobile apps lack a rigid schema, making it a challenge to get business metrics out of non-relational data sources.

As a result, businesses today have to think about bringing non-relational data from the myriad of SaaS systems they are using into a format that is useful for business intelligence. For example, non-relational JSON data can be stored in an operational data store (ODS) using Cloudant. Then, the JSON can be put into a schema for analysis using Cloudant’s schema discovery process (SDP) and integration with IBM dashDB, a relational cloud data warehouse.

Once JSON data has been given a schema and is stored in a relational data warehouse, it is ripe for visualization, analytics and exploration using SQL-compatible tools. This offers many opportunities for “embedded BI” – which means using metrics alongside the operational processes that a business application is recording.
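The general idea of giving JSON a schema and then querying it with SQL can be sketched with the Python standard library. This is only an illustration under simple assumptions (flat documents, an in-memory SQLite table named `orders`, made-up sample data); it is not how Cloudant's schema discovery process or dashDB work internally.

```python
import json
import sqlite3

# Made-up JSON documents with slightly different fields.
docs = [
    '{"customer": "Ada", "total": 40.0}',
    '{"customer": "Grace", "total": 15.5, "coupon": "SPRING"}',
]
records = [json.loads(d) for d in docs]

# "Discover" a schema: the union of all top-level field names.
columns = sorted({key for rec in records for key in rec})

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders ({})".format(
    ", ".join('"{}"'.format(c) for c in columns)))
conn.executemany(
    "INSERT INTO orders VALUES ({})".format(", ".join("?" for _ in columns)),
    # Fields missing from a document become NULL in the table.
    [tuple(rec.get(c) for c in columns) for rec in records],
)

# Once the data is relational, ordinary SQL aggregates apply.
total = conn.execute("SELECT SUM(total) FROM orders").fetchone()[0]
print(total)  # 55.5
```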

Ready to Start Your Challenge?

If you haven't yet visited Document Store Towers or the Column Family Farm, go there now.

If you are familiar with those realms, it's time to begin your challenge.

RabbitMQ Portal

RabbitMQ can be thought of as a nervous system for your applications, reliably passing messages between app components. RabbitMQ asynchronously handles the messages between applications and databases, allowing separation of the data and application layers.
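The decoupling described above can be illustrated without a broker. The sketch below uses Python's standard `queue` and `threading` modules as an in-process stand-in for a message queue; it does not use RabbitMQ or its client libraries, and the names `messages`, `stored` and `consumer` are invented for the example.

```python
import queue
import threading

messages = queue.Queue()  # stand-in for a broker queue
stored = []               # stand-in for the data layer

def consumer():
    """Drain the queue and 'persist' each message until told to stop."""
    while True:
        msg = messages.get()
        if msg is None:   # sentinel value: no more messages
            break
        stored.append(msg.upper())

worker = threading.Thread(target=consumer)
worker.start()

# The producer hands messages off and returns immediately; it never
# talks to the data layer directly.
for text in ["signup", "purchase"]:
    messages.put(text)
messages.put(None)
worker.join()

print(stored)  # ['SIGNUP', 'PURCHASE']
```

A real broker adds what this toy lacks: persistence, routing, delivery confirmations and operation across processes and machines.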

RabbitMQ lets you route, track, and queue messages with customizable persistence levels, delivery settings, and publish confirmations. It runs on a server cluster ensuring high availability. Three member nodes function as a single logical broker for application connections, and RabbitMQ vhosts, exchanges and queues are mirrored across the nodes for consistency.

Explore RabbitMQ yourself by checking out RabbitMQ by IBM Compose, one of 7 open source technologies available on the IBM Compose production-ready, cloud hosted database platform.

Datastorm

Today’s applications are often built using a series of different databases and languages to accomplish different functions. In this polyglot persistence model, an application stack environment comprises several database types integrated with one another, thereby capitalizing on the strengths of each. The type of data store the developer chooses depends on the data and its individual function in a particular application – it’s all about finding the right tool for the job.

With a legacy application in particular, it's often a good idea to keep an existing data store that is performing well and integrate it with new technology to enhance functionality. An example would be integrating a graph database into an existing data layer – perhaps made up of a key-value store or a doc store – to build out a recommendation engine or add fraud detection to an app.

A polyglot persistence approach also allows an IT team to avoid the costly “rip and replace” route and keep information in the database that’s the best fit for however it will be used. In addition to opening up a new range of use cases, polyglot persistence enables a development team to avoid conversion of the data as it moves across different functions.

Explore multi-database strategies yourself by checking out the database toolkit from IBM Cloud Data Services.

API Aqueduct

Application programming interfaces (APIs) are everywhere today, enabling developers to more easily synchronize services to work together efficiently. At the same time, it can be difficult for data architects and analysts to access the data behind these APIs.

The Simple Data Pipe is an open source project from IBM Cloud Data Services that makes it easier for a developer to connect to JSON data encapsulated behind multiple, disparate web APIs and land it all in a single staging ground, in its native form. There, you can analyze and explore the data with your tools of choice.

This app uses the IBM Cloudant JSON database as the operational data store, and it provides prebuilt API connections to many popular data sources. It also easily moves data to IBM dashDB, a columnar database, for data warehousing, or to IBM Analytics for Apache(R) Spark™ for advanced analytics processing.

Explore easy NoSQL data movement yourself by checking out the free Simple Data Pipe app from IBM Cloud Data Services.

Faceted Forest

One of the most important aspects of a database is how quickly and intuitively a user can conduct a search for the specific information they seek, whether it’s contained in the rows and columns of a relational database, or it’s unstructured data in some other type of data store.

The Simple Search Service is a sample app created by IBM Cloud Data Services, which leverages both a document store and a key-value database to make it easy for developers to add a faceted, full-text search function for web or mobile apps. By deploying and uploading a data set to the IBM Cloudant JSON database, the developer gains access to a doc store with each file indexed for search.

The Simple Search Service is also scalable, enabling developers to make even more of their app searchable, by adding additional nodes and a centralized cache that uses Redis by IBM Compose.
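For readers new to the term, faceted search returns both the matching items and counts of the values those items have in selected fields. The sketch below shows the idea on a made-up product list; the `search` function is hypothetical and is not the Simple Search Service API.

```python
from collections import Counter

# Made-up catalog; "category" and "color" are the facet fields.
products = [
    {"name": "Oak desk", "category": "furniture", "color": "brown"},
    {"name": "Desk lamp", "category": "lighting", "color": "black"},
    {"name": "Standing desk", "category": "furniture", "color": "black"},
]

def search(term, **filters):
    """Return items whose name contains `term` plus per-facet value counts."""
    hits = [
        p for p in products
        if term.lower() in p["name"].lower()
        and all(p.get(field) == value for field, value in filters.items())
    ]
    facets = {
        field: dict(Counter(p[field] for p in hits))
        for field in ("category", "color")
    }
    return hits, facets

hits, facets = search("desk")
print(facets["category"])  # {'furniture': 2, 'lighting': 1}

# Selecting a facet value narrows the result set.
black_hits, _ = search("desk", color="black")
print([p["name"] for p in black_hits])  # ['Desk lamp', 'Standing desk']
```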

Explore easy faceted search yourself by checking out the free Simple Search Service app from IBM Cloud Data Services.

Wolfpack of Conflicted Data

The Consistency, Availability and Partition tolerance (CAP) theorem states that a distributed system can guarantee at most two of these three characteristics at the same time. In Cloudant, a JSON document database, consistency is sacrificed in favor of availability and partition tolerance. Specifically, Cloudant is an eventually consistent database, which means that no locking occurs when data is written. This characteristic enables the system to offer best-in-class uptime and scalability.

To keep the service available at all times, Cloudant must allow the same document ID to be altered on different nodes in the database. To reconcile the data, Cloudant maintains a revision history for every document in the database. This is a timeline of changes to the document: not the document bodies themselves, only a history of the revision tokens.

One of the consequences of eventual consistency is that documents might enter a conflicted state if the same version of a document is modified in different ways on two disconnected nodes. There are three ways conflicts can arise in your application:

  • A document is modified on a mobile app, via the phone, and on Cloudant itself, via a web dashboard, and the two copies are synced
  • The same document is modified in different geographic clusters while the inter-site connection is down
  • Changes to the same document are sent to two nodes at the same time

It is good practice, as an application developer, to resolve any conflicts that arise in your documents. Doing so reduces data size and improves performance.
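One way to picture how a revision history reveals a conflict is to compare two replicas' lists of revision tokens. The sketch below is a deliberately simplified model with made-up tokens and a hypothetical `compare` function; it is not Cloudant's actual replication or conflict-handling logic.

```python
def is_prefix(shorter, longer):
    """True if `longer` starts with every token in `shorter`, in order."""
    return longer[:len(shorter)] == shorter

def compare(history_a, history_b):
    """Classify two revision histories of the same document ID."""
    if history_a == history_b:
        return "in sync"
    if is_prefix(history_a, history_b):
        return "b is newer"
    if is_prefix(history_b, history_a):
        return "a is newer"
    # Both replicas changed the same revision in different ways.
    return "conflict"

# Made-up revision tokens for illustration only.
phone = ["1-aaa", "2-bbb"]      # edited on a disconnected phone
cloud = ["1-aaa", "2-ccc"]      # edited via the web dashboard

print(compare(phone, cloud))                 # conflict
print(compare(["1-aaa"], ["1-aaa", "2-ccc"]))  # b is newer
```

When one history simply extends the other, the newer copy can replace the older one; a conflict is the case where the histories diverge from a shared revision.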

Explore the ins and outs of Cloudant yourself by checking out the Cloudant Learning Center from IBM Cloud Data Services.

Eternal Fires of Indexing

Indexing a database is critical to accelerating queries. Without indexing, every search would require a full, top-to-bottom examination of the data, instead of one that is limited to a data structure that stores the specific values being searched.
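The difference between a full scan and an indexed lookup can be sketched in a few lines. The documents, the `city` field and both function names are made up for this example; the index here is just a dictionary from field value to document IDs.

```python
from collections import defaultdict

# Made-up documents.
docs = [
    {"_id": "1", "city": "Paris"},
    {"_id": "2", "city": "Lima"},
    {"_id": "3", "city": "Paris"},
]

def full_scan(city):
    """Without an index: examine every document, top to bottom."""
    return [d["_id"] for d in docs if d["city"] == city]

# Build the index once: field value -> IDs of documents with that value.
# This extra structure must be stored and kept up to date on every write.
index = defaultdict(list)
for d in docs:
    index[d["city"]].append(d["_id"])

def indexed_lookup(city):
    """With an index: jump straight to the matching document IDs."""
    return index.get(city, [])

print(full_scan("Paris"))       # ['1', '3']
print(indexed_lookup("Paris"))  # ['1', '3']
```

Both calls return the same answer, but the scan's cost grows with the number of documents while the lookup's does not; the price is the space and maintenance the index itself requires.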

The trade-off is that each index must be stored, taking up additional space, and building or updating it can add latency and affect application performance whenever a new index is created or an existing one is modified.

In the case of document stores, like Cloudant, there are multiple ways to reduce the likelihood of performance issues caused by indexing, including:

  • Minimizing the overall number of views/indexes in each database
  • Strictly defining one index per design document
  • Applying indexes to all (or most) of the documents in their corresponding databases
  • For databases with a very large number of documents, updating design documents one at a time

Explore the ins and outs of Cloudant yourself by checking out the Cloudant Learning Center from IBM Cloud Data Services.

Swamp of Impossible Queries

For decades, relational databases – characterized by schemas for data in tables, organized by rows and columns – were basically the sole choice of database for developers. But starting in the mid-2000s, the NoSQL (“not only SQL”) movement changed everything.

NoSQL databases are designed to provide storage solutions for use cases that aren't well suited to a traditional RDBMS, where the data an application is storing:

  • Is not highly relational
  • Does not require the consistency guarantees (e.g. ACID compliance) offered by an RDBMS
  • Has a flexible or fast-moving schema

A NoSQL database is unlikely to implement SQL for extracting subsets of data, and in most cases joins, transactions, stored procedures and triggers are not implemented. NoSQL is often chosen when a project needs to scale rapidly, and relaxed consistency rules with more natural data models allow NoSQL databases to easily scale out across many commodity servers.

Explore the world of modern cloud data stores for developers by checking out IBM Cloud Data Services, the most complete portfolio of managed services for data and analytics.
