April 7, 2020
Authors: Elaine Hanley, DataOps Centre of Excellence Worldwide Lead at IBM & Julie Lockner, IBM Data and AI Portfolio Operations, Customer Experience and Offering Management
At the Gartner Data and Analytics Summit in Sydney earlier this year, IBM hosted a design thinking workshop on the emerging topic of DataOps. This is the second blog in a four-part series describing the workshop – read the first entry to learn how we helped participants assess their DataOps maturity.
IBM provides a path to a DataOps practice with a prescriptive methodology, artificial intelligence (AI)-enabled automation and the IBM DataOps Center of Excellence. For example, adopting a DataOps practice allowed one retailer to make improvements across its data pipelines so that data changes took less than two minutes to be applied across the organisation, after previously taking three weeks. As a result, the retailer leveraged business-ready data to conduct customer affinity analysis in less than one day – a process which previously took 20 days. Furthermore, it cut the time needed to report on inventory stock positions to one-sixth of what it previously took.
In this blog, we turn our attention to Data Inventory, which involves the discovery of the data that is available to an organisation – whether their own internal systems or external data, such as demographic or weather data.
Exploring data challenges and triumphs
At the workshop, we asked participants to list their challenges, their successes, and any questions they had on this process of discovery.
The group expressed a range of challenges around discovering and inventorying data.
These challenges are not uncommon. Beginning a data governance journey can seem daunting given the volume of data organisations have collected, and our customers are often reluctant to introduce a new process to already overburdened teams.
When business leaders kick off new initiatives, the ability to baseline and measure progress is critical. If teams can't access the data they need, time and resources are wasted, putting new initiatives at risk of missing their promised benefits in the expected timeframes.
IBM DataOps includes measurement and instrumentation as a foundation. For Data Inventory, the starting point of any data sprint is the set of data elements required, expressed in business language. An ability to track progress at any point in time becomes the foundation for building a business case to implement a data catalogue and a business glossary of terms. This transforms a sea of technical data into a searchable inventory of information, including the bridge between business language and technical metadata that addresses many of the communication gaps between lines of business and IT.
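The idea of bridging business language and technical metadata can be sketched as a simple searchable glossary. This is a minimal illustration, not IBM's implementation: all term names, tables and columns below are hypothetical.

```python
# Illustrative sketch: a business glossary that maps business terms to
# technical metadata, turning a sea of technical data into a searchable
# inventory. All names here (terms, tables, columns) are hypothetical.
from dataclasses import dataclass, field

@dataclass
class GlossaryTerm:
    name: str                  # business-language name
    definition: str            # plain-English meaning
    technical_assets: list = field(default_factory=list)  # "schema.table.column"

class BusinessGlossary:
    def __init__(self):
        self._terms = {}

    def add_term(self, term: GlossaryTerm):
        self._terms[term.name.lower()] = term

    def search(self, keyword: str):
        """Return terms whose name or definition mentions the keyword."""
        kw = keyword.lower()
        return [t for t in self._terms.values()
                if kw in t.name.lower() or kw in t.definition.lower()]

glossary = BusinessGlossary()
glossary.add_term(GlossaryTerm(
    name="Customer Lifetime Value",
    definition="Projected net revenue from a customer relationship",
    technical_assets=["CRM_DW.CUST_METRICS.CLV_AMT"],
))

# A business user searches in business language and finds the
# technical assets that back the term.
hits = glossary.search("revenue")
```

In practice, a catalogue product automates term assignment and adds governance metadata; the point of the sketch is simply that a search in business language resolves to technical assets.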
When we asked attendees what they had achieved on their DataOps journeys, the answers echoed many of the best practices already in place among IBM customers. It's clear that data catalogues, business glossaries, senior sponsorship, automation and collaboration are critical.
Most tools in the market today offer some automated capabilities, but IBM’s Watson Knowledge Catalog is a robust, fully functional catalogue that embeds machine learning algorithms to automate discovery, classification and business term assignment. A free lightweight version is available on IBM Cloud. It’s also an integral part of IBM Cloud Pak for Data, which can be sampled with Cloud Pak for Data Experiences.
IBM Watson Knowledge Catalog provides more than just an inventory for capturing information about data sets: it's a place where users can search and actually preview data to decide whether it will meet their needs. This means more users can work with the catalogue effectively to support their data-driven needs, and offer comments and feedback on whether the data sets proved useful.
Turning the technical into the usable
Transforming a set of technical database connections, table names and column names into a business-friendly understanding of the information's meaning is supported by linking the technical assets to business terms. IBM Industry Models provide industry-specific business glossaries that can accelerate the creation of a business lexicon.
A team can begin by implementing a project-specific catalogue and eventually grow the number of data assets it manages over time. This approach addresses any concerns over huge numbers of data sets. As users provide feedback to the data sets’ subject matter experts, adoption grows.
For Data Inventory, participants raised a number of questions, such as:
- How do we avoid ‘shelfware’?
IBM DataOps facilitates the faster realisation of business value. A persona-driven collaborative platform provides the environment for better communication and streamlined workflow towards the Data Sprint goal. IBM has witnessed rapid adoption and usage when teams use DataOps to incorporate a data catalogue into business analytics and data science processes. Process changes culture, and a process that encourages collaboration – connecting consumers of data to data owners – becomes valuable to all stakeholders.
- How do you get everyone on board?
Begin with a small, focused Data Sprint, with KPIs that can measure success. Demonstrate the success to others and use it to grow adoption. Measure the baseline before you begin, and ensure you can track KPIs throughout.
- How do you convince the business there's value or ROI in an initiative whose benefits are generally intangible?
There are several ways to develop an ROI for DataOps – it begins with measuring how long it currently takes teams to find and deliver data for a new business initiative, and what level of issues already exist with the data they’re using. What would it cost the business – in wasted opportunities for new products, reputational damage and/or regulatory fines – if this data wasn’t available for six months? Now consider the revenue that could be generated if delivery was accelerated to one month.
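The arithmetic behind that ROI argument can be made concrete. The figures below are hypothetical placeholders; substitute your own baseline measurements.

```python
# Illustrative ROI sketch for a DataOps initiative.
# All figures are hypothetical; replace with measured baselines.
baseline_delivery_months = 6         # time to find and deliver data today
improved_delivery_months = 1         # target after adopting DataOps
monthly_opportunity_value = 250_000  # assumed value at risk per month of delay

months_saved = baseline_delivery_months - improved_delivery_months
opportunity_recovered = months_saved * monthly_opportunity_value

annual_programme_cost = 400_000      # assumed cost of tooling and enablement
roi = (opportunity_recovered - annual_programme_cost) / annual_programme_cost

print(f"Months saved: {months_saved}")
print(f"Opportunity recovered: ${opportunity_recovered:,}")
print(f"ROI: {roi:.0%}")
```

The same template extends to avoided costs (regulatory fines, reputational damage) by adding them to the recovered-value side.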
- How do you progress from capturing data intent to actually storing the data in the right architecture?
IBM believes data can be inventoried and ‘tagged’ without ever moving it. Technology such as data virtualisation makes it possible to view the data where it resides. As the use of data becomes more frequent and complex, data architects can propose a more optimal architecture. IBM’s Data Virtualization and Watson Knowledge Catalog are tightly integrated for this very reason.
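A toy analogy for viewing data where it resides, using SQLite's ATTACH: two separate databases are joined in a single query without copying data between them. Real data virtualisation federates heterogeneous remote sources; this local sketch is illustrative only, and the tables and values are invented.

```python
# Analogy for data virtualisation: query two independent databases
# in place via a single federated SQL statement (SQLite ATTACH).
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
sales_db = os.path.join(tmp, "sales.db")
crm_db = os.path.join(tmp, "crm.db")

# Populate two independent "source systems".
with sqlite3.connect(sales_db) as con:
    con.execute("CREATE TABLE orders (cust_id INTEGER, amount REAL)")
    con.execute("INSERT INTO orders VALUES (1, 120.0), (2, 80.0)")

with sqlite3.connect(crm_db) as con:
    con.execute("CREATE TABLE customers (cust_id INTEGER, name TEXT)")
    con.execute("INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex')")

# Federated query: join across both databases without moving data.
con = sqlite3.connect(sales_db)
con.execute("ATTACH DATABASE ? AS crm", (crm_db,))
rows = con.execute(
    "SELECT c.name, o.amount FROM orders o "
    "JOIN crm.customers c ON c.cust_id = o.cust_id "
    "ORDER BY c.name"
).fetchall()
con.close()
print(rows)
```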
- Is it best to keep it simple or ‘buy a tool’?
Most data inventory projects begin with spreadsheets, but teams are soon incentivised to move to a data catalogue so they can share knowledge about data and its use, and enforce the governance policies of their organisation and operating geographies. These considerations tend to dictate the move away from spreadsheets surprisingly quickly. Because all of IBM's DataOps software exposes its capabilities through APIs, any DataOps process can be automated as part of a tool chain.
- How do you capture information at scale to handle the sheer number of databases and systems across the organisation?
Automation is key. Each Data Sprint can use the discovery processes in IBM DataOps software to recognise content at scale, and with machine learning built in, the discovery improves with time and application.
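The flavour of automated classification during discovery can be sketched with simple pattern rules that assign a data class to a column from sampled values. Catalogue products layer machine learning on top of rules like these; this example is purely illustrative, and the patterns and thresholds are assumptions.

```python
# Sketch of automated data classification during discovery: pattern
# rules assign a data class to a column based on sampled values.
# Patterns and threshold are illustrative assumptions only.
import re

CLASSIFIERS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.I),
    "phone": re.compile(r"^\+?[\d\s\-()]{7,15}$"),
    "postcode_uk": re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$", re.I),
}

def classify_column(sample_values, threshold=0.8):
    """Return the data class whose pattern matches at least `threshold`
    of the non-empty sampled values, or None if no class qualifies."""
    values = [v for v in sample_values if v]
    if not values:
        return None
    for label, pattern in CLASSIFIERS.items():
        matches = sum(1 for v in values if pattern.match(v))
        if matches / len(values) >= threshold:
            return label
    return None

# A column sampled from a discovered table, tolerating some dirty rows.
label = classify_column(["a@example.com", "b@example.org", "not-an-email"],
                        threshold=0.6)
```

A machine learning layer would go further, learning new classes and business-term assignments from curator feedback rather than from fixed patterns.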
Read more about IBM DataOps
Feel free to register your interest in IBM DataOps and to request an IBM DataOps Design Thinking Virtual Workshop for your organisation, by emailing the IBM DataOps team.