
Put your data to work faster with the right data preparation tools


Data is widely seen as the new source of competitive advantage, driving smarter decisions and helping enterprises out-think their rivals. But opportunities are often missed because business analysts, data scientists and application developers wait too long, and work through cumbersome IT processes, to get the data they need from multiple underlying systems. IBM DataWorks is a cloud-based data preparation and movement service that changes the game, empowering data professionals to securely access, enhance, visualize, and prepare data from multiple sources for analysis.

Hernando Borda, Offering Manager for DataWorks at IBM Analytics, is playing an instrumental role in building out the capabilities of IBM DataWorks. Hernando’s background is in computer science and software engineering, and he is a named author on an IBM patent on multi-threaded processes. We interviewed Hernando to find out why he believes DataWorks is the ideal tool for a new strategic approach to data preparation.


Hernando, thanks for joining us. What are the key challenges that DataWorks aims to solve?

Well, everyone agrees that data analytics is a growing source of value and competitive differentiation for enterprises. What’s also clear is that it’s hard for data professionals—including business analysts, data scientists, and application developers—to quickly get their hands on the good-quality data they need. According to a Forrester study, these people are spending up to 80 percent of their time finding and refining data, which means they don’t have much time to actually do useful analysis with it.

A key challenge is that enterprise data is—rightly—subject to heavy governance and security. Data is arguably the most valuable asset for all businesses, so IT staff must absolutely safeguard it by controlling access.

This leads to the “ask/wait” cycle: the data professional submits a request for a particular set of data, waits for the response, finds that it’s not exactly what they were expecting, resubmits a refined request, and so on. It’s frustrating for them, and just as frustrating for IT staff, who must put aside more interesting and valuable work to try to interpret ad hoc requests. In addition to the administrative costs and inefficiencies, this iterative cycle of tactical requests introduces significant delays, making it difficult for the enterprise to seize new opportunities.

Analyzing data from multiple sources can yield richer insights, so data scientists also face the challenge of getting access to multiple, disparate sources of data, both on premises and in the cloud. For example, they might want to understand the impact of weather conditions on sales. In this scenario, they need not only to source the appropriate internal data by negotiating with IT, but also to combine it with data from weather.com to create a hybrid data set. Read the report “Don’t let data preparation get in the way of your analytics” to learn how leaving data preparation until the end will kill your analytics, and why cloud-based data refinement services are a game changer for data science professionals.

So where does IBM DataWorks fit in?

DataWorks is a self-service data preparation and movement solution that enables business users to load data from multiple sources, transform it and deliver it to multiple targets. It automates tedious and time-consuming tasks to shape, format and get data ready, and it enables data professionals to preview their data, cleanse it and deliver it for downstream analytics. And because the service sits on IBM Bluemix, it’s easy and seamless to push the cleansed and profiled data straight into analytics services such as dashDB and Watson Analytics.

DataWorks provides a user-friendly, spreadsheet-like interface that empowers data professionals to find the data they need, get it into the format they want, and deliver it to their preferred analytics tools.

By giving data professionals fast, self-service access to relevant and easily consumable data, DataWorks cuts out the middleman and reduces time to insight.

Because we’re open to a large number of sources and targets, DataWorks also plays perfectly in the hybrid cloud/on-premises scenario I outlined earlier, in which you’re trying to combine sales data with weather data. DataWorks provides not only connectivity to the most common and widely used data sources on the cloud but also secure gateway technology to reach into on-premises data behind the firewall. It enables data professionals to join data from multiple sources, assess its quality, filter out unwanted, low-quality or NULL values, apply functions like string transformations or unit conversions, and sort the data into the right format for their downstream analysis or application. DataWorks then transforms the data according to the defined actions and pushes it to the next stage. Read more about it in “IBM DataWorks: Smarter data preparation for the next generation of analytics”.
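The sales-and-weather scenario can be sketched in a few lines of plain Python. This is not DataWorks code and uses no DataWorks API. The record layouts, field names, sample values and the `prepare` function are all hypothetical, chosen only to illustrate the join, filter, convert and sort steps described above.

```python
# Illustrative only: plain Python, not the DataWorks API.
# All field names and sample values below are made up for the example.

def fahrenheit_to_celsius(temp_f):
    """Unit conversion: degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


def prepare(sales_rows, weather_rows):
    """Join sales with weather on date, drop incomplete rows, convert units, sort."""
    # Index the weather records by date so each sales row can look up its match.
    weather_by_date = {w["date"]: w for w in weather_rows}

    prepared = []
    for sale in sales_rows:
        weather = weather_by_date.get(sale["date"])
        # Filter: skip rows with no weather match or with missing (None) values.
        if weather is None or sale["revenue"] is None or weather["temp_f"] is None:
            continue
        prepared.append({
            "date": sale["date"],
            # String transformation: trim whitespace and normalise case.
            "store": sale["store"].strip().title(),
            "revenue": sale["revenue"],
            "temp_c": round(fahrenheit_to_celsius(weather["temp_f"]), 1),
        })

    # Sort by date so downstream tools receive rows in a predictable order.
    return sorted(prepared, key=lambda row: row["date"])


sales = [
    {"date": "2016-03-02", "store": " boston ", "revenue": 1200.0},
    {"date": "2016-03-01", "store": "BOSTON", "revenue": 950.0},
    {"date": "2016-03-03", "store": "boston", "revenue": None},  # missing value
]
weather = [
    {"date": "2016-03-01", "temp_f": 32.0},
    {"date": "2016-03-02", "temp_f": 50.0},
    {"date": "2016-03-03", "temp_f": 41.0},
]

result = prepare(sales, weather)
```

Here the third sales row is dropped because its revenue is missing, and the remaining rows come out ordered by date with the temperature in Celsius. A real service would let the user define these actions interactively rather than in code.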

How will DataWorks change the way businesses go about data preparation?

The old tactical approach of having data professionals make iterative requests to the IT department is no longer fit for purpose: neither side can afford all the effort and delay it involves. Anecdotally, the classic approach results in 80 percent of time spent on data preparation versus just 20 percent on analytics. The usual work-around in the past was for the business to go behind IT’s back and build its own silos. This “shadow IT” is a bad idea in terms of data governance and security, as well as being hugely inefficient. Equally, data preparation shouldn’t be an afterthought: if handled appropriately from the beginning, it can yield richer insights.

[Image: Sample interface of the DataWorks service]

The DataWorks service enables a new strategic approach to data preparation, acting as a central point of control that can be managed by the business and supervised by IT. DataWorks allows the business to go at the speed it wants without compromising vital governance and security around enterprise data. And it frees up IT staff from a huge number of time-consuming, low-value and frankly boring data preparation tasks.

Our tagline is that DataWorks allows you to ACT on your data, which stands for:

  • Access your data, both on-premises and on the cloud
  • Clean your data, to resolve any mismatches when you combine different sources
  • Transform your data, to get it into the format you need for your applications or downstream analytics.
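The “Clean” step is easiest to picture with a small example. The sketch below is again plain Python rather than anything from DataWorks. It assumes two hypothetical sources that write the same date and region in different ways, and normalises both to a common form so the records can be matched.

```python
from datetime import datetime

# Hypothetical date layouts used by the two example sources.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalise_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), trying each known layout."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def normalise_region(text):
    """Collapse whitespace and case differences so region names compare equal."""
    return " ".join(text.split()).lower()


def clean_key(record):
    """Build a comparable (date, region) key from a raw record."""
    return normalise_date(record["date"]), normalise_region(record["region"])


# Made-up records: same day and region, written differently by each source.
internal = {"date": "2016-03-01", "region": "New  England"}
external = {"date": "03/01/2016", "region": " new england "}

keys_match = clean_key(internal) == clean_key(external)
```

Once both sources share the same key format, combining them becomes a straightforward lookup or join.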

Of course, IBM is not the only player in the data preparation space, but we are the only one that can reach your data across different clouds, be it Amazon AWS or Microsoft Azure. We also have the advantage in terms of our ability to provide end-to-end services to a business, helping you every step of the way from sourcing data to using our full ecosystem of analytical and development tools on the cloud. We’re also evolving very fast and continually adding new tools to make it faster and easier for knowledge users to transform their data, but that’s a topic for another time.

OK, we’ll look forward to hearing more! What’s the next step if I want to dig deeper right away?

Check out the DataWorks page, try this tutorial, or jump right in with the technology on IBM Bluemix today.
