
Graph databases – a natural way to represent data


Among NoSQL databases, graph databases are increasingly popular because of how they store and retrieve data. A traditional relational database stores records on their own and relies on ad hoc queries to find the relationships between them; a graph database stores data together with its relationships. In the simplest terms, a graph database is a database management system in which the relationships between data points are given the highest priority.

Alaa Mahmoud (@alaa_mahmoud), Master Inventor at IBM, is a lead developer of IBM’s graph database-as-a-service, IBM Graph, built using Apache TinkerPop and now available in a free beta trial. In Episode 12 of the New Builders podcast, “Of Graphs and Gremlins – Graph Database 101,” Mahmoud discusses the process behind building IBM Graph, the industry’s first enterprise-grade distributed property graph offered as a fully managed cloud service.


“Graph databases are special in the way that they store information,” he explains. “When you look at NoSQL, even traditional RDBMS systems, they store data as tables, documents, columns, rows. If you think about it, it’s not really the natural way of representing data.”

Mahmoud further explains that when you think of your relationships to the many people in your life, those relationships aren’t represented by a table. “You think of them as a graph, as people having relationships with others,” he says. In a graph database, data is stored in nodes (also called vertices), which are connected to each other by relationships called edges.


“Graph is a natural way of representing data,” says Mahmoud. “Instead of making selects and joins and this kind of unnatural way of looking for stuff, what you’re really looking for is ‘Give me all the nodes that have this particular property that have relationships with other nodes.’ So you’re naturally querying the data, and you’re naturally storing it.”
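To make the idea concrete, here is a minimal sketch in Python of storing data with its relationships and then asking for "all the nodes that have this particular property that have relationships with other nodes." The `Graph` class, its `related` method, and the sample people are hypothetical names invented for illustration; this is a toy in-memory model, not the IBM Graph or Apache TinkerPop API.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A toy in-memory property graph (illustrative only)."""
    nodes: dict = field(default_factory=dict)  # node id -> property dict
    edges: list = field(default_factory=list)  # (source id, label, target id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target):
        self.edges.append((source, label, target))

    def related(self, label, **props):
        """Return (source, target) pairs where the source node matches every
        given property and has an outgoing edge with the given label."""
        return [
            (s, t)
            for s, lbl, t in self.edges
            if lbl == label
            and all(self.nodes[s].get(k) == v for k, v in props.items())
        ]


g = Graph()
g.add_node("alice", kind="person", city="Cairo")
g.add_node("bob", kind="person", city="Austin")
g.add_node("carol", kind="person", city="Cairo")
g.add_edge("alice", "knows", "bob")
g.add_edge("carol", "knows", "alice")
g.add_edge("bob", "knows", "carol")

# "Give me all the nodes with city == Cairo that have 'knows' relationships."
print(g.related("knows", city="Cairo"))
# → [('alice', 'bob'), ('carol', 'alice')]
```

The query reads the relationships directly from the stored edges, with no join between separate tables.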


Graph databases in the real world

Graph databases are most commonly used to power recommendation engines (the kind used by e-commerce sites and streaming services, for example), a perfect use case for a database that sees the natural relationships between people and things.
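As a rough illustration of why recommendations map well onto a graph, the Python sketch below walks from a customer to the products they bought, on to other customers who bought the same products, and then to what those customers bought. The `purchases` data and the `recommend` function are hypothetical, and the scoring rule (count of shared purchases) is just one simple choice, not how any particular engine works.

```python
from collections import Counter

# Hypothetical purchase edges: customer -> set of products bought.
purchases = {
    "ana": {"laptop", "mouse", "dock"},
    "ben": {"laptop", "mouse", "headset"},
    "cy": {"laptop", "headset", "webcam"},
    "dee": {"novel"},
}


def recommend(customer, purchases, limit=3):
    """Walk customer -> products -> other customers -> their products,
    scoring each product the customer lacks by the number of shared purchases."""
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)
        if overlap == 0:
            continue  # no shared products, so no path through this customer
        for product in items - owned:
            scores[product] += overlap
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:limit]]


print(recommend("ana", purchases))
# → ['headset', 'webcam']
```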

Furthermore, as Mahmoud points out, graph databases are relevant to many industries’ use cases, including data modeling, pattern recognition, and network topology: “Network topology is one of the very famous, very common uses of graph databases,” he says. “I have printers and I have machines; I have laptops and I have servers in different data centers — and they connect, and they have a network that connects all these pieces together, and IoT-coupled devices. It’s a graph of different devices that are connected that I want to model: [it’s] network modeling.”
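The network modeling Mahmoud describes can be sketched in a few lines of Python: devices are nodes, links are edges, and a breadth-first search answers questions such as "how does this laptop reach that server?" The device names and the `shortest_path` helper are assumptions made up for this example.

```python
from collections import deque

# Hypothetical topology: each key is a device, each value lists its direct links.
links = {
    "laptop-1": ["switch-a"],
    "printer-1": ["switch-a"],
    "switch-a": ["laptop-1", "printer-1", "router-1"],
    "router-1": ["switch-a", "server-1"],
    "server-1": ["router-1"],
    "sensor-1": [],  # an IoT device that is not connected to anything yet
}


def shortest_path(links, start, goal):
    """Breadth-first search; returns the fewest-hop path as a list of devices,
    or None when the goal is unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in links.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(shortest_path(links, "laptop-1", "server-1"))
# → ['laptop-1', 'switch-a', 'router-1', 'server-1']
print(shortest_path(links, "laptop-1", "sensor-1"))
# → None
```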

Because graph databases can detect patterns in real time, they are also well suited to fraud detection. In a recent blog post, Larry Weber, IBM Analytics Program Director, detailed how graph databases can detect fraud more effectively as it happens. For example, traditional fraud detection systems can flag a charge that is unusually large or falls outside the buyer’s normal habits. But thieves have adapted by making several small purchases that such systems do not notice.

“In a graph database, however, data and connections are stored together,” writes Weber. “Accordingly, such databases store not only data points but also data relationships and properties, allowing transactional applications to be imbued with real-time analytics functions…. A graph database is adept at detecting exactly this sort of nuanced transaction.”
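One way to picture this is a hypothetical pattern in which each charge is small enough to pass a per-transaction check, yet several different cards are linked to the same device. The Python sketch below looks for that relationship. The transaction data, the `suspicious_devices` function, and both thresholds are assumptions for illustration only; Weber's post does not specify a detection rule.

```python
from collections import defaultdict

# Hypothetical transactions: (card, device used, amount in dollars).
transactions = [
    ("card-1", "device-x", 9.50),
    ("card-2", "device-x", 8.75),
    ("card-3", "device-x", 9.99),
    ("card-4", "device-y", 240.00),
    ("card-5", "device-z", 7.25),
]


def suspicious_devices(transactions, small=10.00, min_cards=3):
    """Return devices linked to at least `min_cards` distinct cards through
    purchases below `small`. Both thresholds are illustrative assumptions."""
    cards_by_device = defaultdict(set)
    for card, device, amount in transactions:
        if amount < small:
            cards_by_device[device].add(card)
    return sorted(
        device
        for device, cards in cards_by_device.items()
        if len(cards) >= min_cards
    )


print(suspicious_devices(transactions))
# → ['device-x']
```

No single charge here is unusual on its own; only the shared connection between the cards stands out.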

The rapid rise of graph databases

Though relatively new, graph databases continue to gain traction with developers. Ben Kepes writes in Forbes that “much of this demand is driven by developers’ demand for a database that isn’t simply about data storage and retrieval. Being able to derive business value from related data, within the context of the database itself, means organizations can more rapidly react to customer actions.” Kepes also notes that Forrester predicted graph databases would be used by 25 percent of enterprises by 2017.

As the inventor of Neo4j, the first operational property graph database, Emil Eifrem attributes the rise of these databases to three factors. Graph databases are able to:

  • Analyze the world around us
  • Model and allow for significantly faster development than other database options
  • Get real-time information for analysis

For highly connected data, proponents say graph databases are faster and provide better analytics than traditional databases, and they are becoming widely adopted in the enterprise as a result. Mahmoud feels so strongly about the power and potential of graph databases that he was hard-pressed to offer an example where they might not be the appropriate database choice. Graph database systems are, as he says, the “natural way of presenting data.”

If you are simply looking to get back basic information from a query, like an employee number — if you’re never going to need analytics — then maybe an RDBMS would be a better fit. But, Mahmoud warns, “If your data doesn’t have relationships, your data is pretty much useless, and you want to think more about what you have and what you’re really getting out of the data that you’re storing.”

