November 2, 2017 | Written by: Seth Dobrin
Categorized: Data Governance | Machine Learning
Mastering fast-growing data volumes across the enterprise is one of the first and most critical steps in establishing a cognitive business. Doing so requires adopting advanced analytics that enable an organization not only to better understand and control its data, but also to gain insights that set the stage for driving new business models.
But for all of this to effectively occur, the analytics must be executable regardless of the data type (structured, unstructured or semi-structured), and regardless of whether the data is sitting on critical systems that have been running the business for the last 20 years, a private cloud behind the firewall, or any of the public cloud platforms.
Additionally, it is increasingly important to leverage machine learning to help automate many of the processes such as metadata assignment, data classification, data quality and master data management.
These are the concepts of unified governance. For IBM, the essentials of data governance include diligent and comprehensive data management practices that provide data integrity, quality, security, usability, and availability. This past summer, we made a number of moves around governance, most notably the launch of our Unified Governance Platform, which aims to make governance more accessible.
But our approach to governance goes beyond accessibility. Our belief is that when done properly, data governance can be an enabler of rapid data science. In other words, if the data is “pre-governed,” it is much easier to access and to understand what can and can’t be done with it. When that occurs, the data is primed for analysis and for finding correlations and patterns, all of which create a new environment of data-driven decision making.
This week we are announcing plans to extend our Unified Governance Platform with an array of new capabilities that advance its functionality even further. In our popular InfoSphere Information Server, we added a single view of the Unified Governance Catalog for both structured and unstructured information, such as document files and PDFs. This single view is designed to make it easy for just about any user to find and understand their data across the enterprise. In addition, we redesigned the look and feel of a key piece of the solution, DataStage Designer, with a cognitive design that recognizes and then suggests usage patterns. This capability is intended to help users speed up their development of data integration flows.
We are also updating the operational and analytics capabilities of our Master Data Management (MDM) solution, which gives organizations insights about the data that is typically stored (and potentially duplicated) in siloed applications, such as data about customers, suppliers, partners, products, materials and accounts.
The new Analytical MDM provides self-service access that helps users visualize and explore master data and correlate it with other data sources dynamically. We’re also updating MDM with something called “consent management,” designed to help clients who are working to get their arms around the EU’s upcoming General Data Protection Regulation (GDPR). With consent management, users are able to view and manage the consent received, the purpose of that consent, and other aspects of consent around data access and activities associated with the GDPR.
When organizations in specific industries want to get up and running with analytics and governance, they can turn to the IBM Industry Data Models. These sets of business and technical data models are pre-designed for specific industries, including finance, energy & utility, healthcare, banking process, insurance, telecommunications and retail. They provide an out-of-the-box framework and foundation to help accelerate the development of business intelligence applications around data that has already been identified.
This week, we are also announcing the availability of a number of GDPR-specific updates to the models, such as support for GDPR domain-specific terms. We’ve accumulated an index of industry-specific vocabulary that companies can now use to bridge from the language of the regulator to an enterprise-wide set of terms. Such support can help companies streamline GDPR readiness. We’ve also added “consent management” to our Industry Models. Now, users can describe the consent agreement (including parental consent) between a data subject and the controller, clarifying the purpose for which the data is used and with whom such personal data may be shared.
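To make the idea of a consent agreement concrete, here is a minimal Python sketch of what such a record might look like. It is an illustration only, not the structure used in the IBM Industry Models or MDM: the class name, every field name, and the permits check are assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ConsentAgreement:
    """Hypothetical consent record between a data subject and a controller."""
    data_subject: str                 # identifier of the person the data is about
    controller: str                   # organization that decides how the data is used
    purpose: str                      # the single purpose this consent covers
    shared_with: List[str] = field(default_factory=list)  # other permitted recipients
    granted_on: Optional[date] = None
    withdrawn_on: Optional[date] = None
    subject_is_minor: bool = False          # assumption: flag set by the caller
    parental_consent_given: bool = False    # only relevant when subject_is_minor

    def permits(self, purpose: str, recipient: str, on: date) -> bool:
        """Return True if this record allows `recipient` to use the data for `purpose` on date `on`."""
        if self.granted_on is None or on < self.granted_on:
            return False              # consent not yet given
        if self.withdrawn_on is not None and on >= self.withdrawn_on:
            return False              # consent has been withdrawn
        if self.subject_is_minor and not self.parental_consent_given:
            return False              # parental consent is missing
        allowed = [self.controller] + self.shared_with
        return purpose == self.purpose and recipient in allowed


# Example with made-up names and dates.
agreement = ConsentAgreement(
    data_subject="subject-001",
    controller="Example Bank",
    purpose="marketing-email",
    shared_with=["Example Mail Vendor"],
    granted_on=date(2018, 5, 25),
)
can_send = agreement.permits("marketing-email", "Example Mail Vendor", date(2018, 6, 1))
```

Keeping the purpose and the permitted recipients on the record itself means that a request for a different purpose, or from a party not listed, is refused by default.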
Metadata Governance & ODPi
In addition to all of this work, IBM has joined with Hortonworks and ING Group to propose a data governance Program Management Committee (PMC) to the Linux Foundation’s ODPi group. The aim of this group will be to create an open data governance ecosystem by defining interfaces that let diverse metadata tools and catalogs exchange information about data sources, including where they are located, their origin and lineage, owner, structure, meaning, classification and quality, no matter where that data resides.
Once these interfaces are adopted, tools from any vendor that incorporate them would be able to exchange information with any catalog that supports them. This is in stark contrast to the way in which most metadata is managed today, which tends to be based on proprietary formats and APIs.
Because of this proprietary nature, existing tools typically support only a limited range of data sources and, as a result, a limited range of governance actions. In addition, combining metadata to create an enterprise data catalog can be an extremely expensive proposition. In an ideal world, metadata would move automatically with the data and could be augmented and processed through open APIs for permitted usages. That’s the future promise of an open data governance ecosystem. More on the progress of this effort is expected to come in the coming weeks.
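As a rough illustration of what a vendor-neutral exchange could look like, the Python sketch below describes a data source using the attributes listed above and passes it between two toy catalogs as plain JSON. This is not the interface under discussion at ODPi; the class names, field names, JSON layout and sample values are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class DataSourceMetadata:
    """Hypothetical, vendor-neutral description of one data source."""
    name: str
    location: str                       # where the data resides
    origin: str                         # system that produced the data
    owner: str
    lineage: List[str] = field(default_factory=list)        # upstream sources, in order
    structure: Dict[str, str] = field(default_factory=dict)  # column name -> type
    meaning: str = ""                   # business description
    classification: str = "unclassified"
    quality_score: float = 0.0          # assumed scale of 0.0 to 1.0

    def to_json(self) -> str:
        """Serialize to a plain JSON document that any tool could read."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "DataSourceMetadata":
        """Rebuild a record from the JSON produced by to_json."""
        return cls(**json.loads(payload))


class InMemoryCatalog:
    """Toy catalog that only speaks the shared JSON format."""

    def __init__(self) -> None:
        self._entries: Dict[str, DataSourceMetadata] = {}

    def import_record(self, payload: str) -> None:
        record = DataSourceMetadata.from_json(payload)
        self._entries[record.name] = record

    def export_record(self, name: str) -> str:
        return self._entries[name].to_json()


# Two catalogs standing in for tools from different vendors.
record = DataSourceMetadata(
    name="customer_accounts",
    location="onprem-db/sales/customer_accounts",
    origin="billing-system",
    owner="finance-data-team",
    lineage=["raw_billing_export"],
    structure={"customer_id": "string", "balance": "decimal"},
    meaning="Current account balance per customer",
    classification="personal-data",
    quality_score=0.9,
)
catalog_a = InMemoryCatalog()
catalog_b = InMemoryCatalog()
catalog_a.import_record(record.to_json())
catalog_b.import_record(catalog_a.export_record("customer_accounts"))
```

Because both catalogs depend only on the shared format and not on each other's internals, either could be replaced by another tool that reads and writes the same document.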
Additionally, we are announcing significant updates to our IBM Watson Data Platform on IBM Cloud, including data cataloging and data refining offerings powered by machine learning. These moves will help companies connect and secure all of their data across public cloud platforms, on-premises systems and third-party streams, advancing their approaches to data governance and enabling them to build a strong data foundation that fuels the creation and deployment of AI apps.
As data volumes grow, so too will the need for Unified Data Governance platforms. IBM has been involved with data since the beginning, and our work to advance our governance capabilities is non-stop. We think you’ll agree.
Join IBM on February 27 to get serious with Machine Learning: https://www.ibm.com/mleverywhere