You’ve built your data catalog – now what?

By: Jay Limburn

How an integrated data science platform can transform your ability to make productive use of big data

  • Leverage an integrated data and analytics toolset that makes data science smooth and seamless

  • Unlock self-service big data analytics for users of all skill levels – from data scientists to citizen analysts

  • Get a new end-to-end data science workflow up and running within a day

Geoffrey Moore, author of “Crossing the Chasm,” once wrote: “Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway.” In the 21st century, data-driven decision-making is vital if businesses are to succeed, but simply gathering and storing masses of data isn’t enough. You need to be sensitive to what the data is telling you, and build up enough situational awareness to find the right path to your objectives.

In a previous post, we discussed how companies can easily find the data they need with IBM Data Catalog, which provides smart, automated data discovery, curation, and governance. But what’s the next step once you’ve found the data?

Turning data into insight

Imagine you’re looking for a book in a library. There’s a wonderful index that directs you straight to the right shelf, and based on the title, it seems like you’ve found what you were looking for straight away. But when you try to check out the book, you’re forced to jump through a complicated series of hoops in a bureaucratic process that takes hours. And when you finally get the book home and thumb through the pages, you realize that many of its chapters aren’t even relevant to the topics you wanted to read about.

It’s the same with data. Many data cataloging tools are just catalogs: they can categorize data sets, control access to sensitive assets, and make data easy to find, but they don’t really help you take the next steps. If you want to profile, prepare, explore and analyze a data set, and then deploy those insights into an application, you typically need to export the data from your catalog into other tools. This can be a time-consuming and scattershot process. Even worse, you might go through the whole rigmarole, only to find that you still don’t have the data that you need.

An integrated toolset

IBM Data Catalog is different. Not only does it enable you to find data quickly and easily – it also empowers you to make productive use of your own and third-party data by creating collaborative projects within a single self-service environment. Thanks to its tight integration with other tools within IBM Watson Data Platform, Data Catalog allows you to move data between workspaces in a couple of clicks and get started straight away on the next steps in your data science workflow.

For example, you can quickly check and clean your data set in IBM Data Refinery, an intuitive data preparation environment that lets you view, profile and reshape your data before you start analyzing it in detail. This can dramatically shorten the analytics cycle, because you can see within minutes whether the data set you’ve found in the catalog gives you the information you need.

Next, within the same interface, you can transfer the data directly into other integrated workspaces such as IBM Data Science Experience and IBM Watson Machine Learning. Data Science Experience provides access to familiar open-source data science tools, augmented with leading-edge IBM technologies, so that users can analyze and visualize data. Meanwhile, Watson Machine Learning empowers users to build machine learning models and neural networks, train them efficiently, and evaluate the results.

With this seamless toolset at your fingertips, data professionals across your organization can quickly analyze and glean insights from data, and build data-driven applications that enable fast, informed decision-making.

Quick and easy

The simplicity of these point-and-click, self-service tools means that anyone who works with data can use them – from highly skilled data scientists and business analysts who are digging deep into data, to the “citizen analyst” marketing manager who needs to turn sales data into a report to guide their campaign strategy.

In addition, there’s no cumbersome implementation or configuration involved in setting up these solutions – unlike some big data infrastructure projects, which can take months or even years. Instead, you can simply sign up for Watson Data Platform as a cloud service, connect some of your back-end databases to Data Catalog, and get your users up and running within a day.

Thanks to Data Catalog, companies have the tools to both find and harness their data effectively, sharpening their senses and giving them the situational awareness they need to succeed in a digital age.

