Three data governance best practices for cloud deployment with IBM Watson Knowledge Catalog

4 minute read | September 25, 2020

Chatter about the cloud is everywhere, but not for the reasons you might think. The debate about whether cloud would take off in enterprise IT is long over; cloud is the new normal. The conversation today focuses on the complexities that arise when you lift the hood on what cloud actually means.

A recent O’Reilly survey found that 88% of respondents use cloud in some form. While public cloud seems to be the dominant deployment option, nearly half (49%) continue to run applications in traditional, on-premises contexts. In practice, most organizations run multicloud and on-premises applications side by side. We live in a multicloud world where organizations must determine which cloud platforms (public, dedicated private or managed) are best for different workload types.

Three best practices for any cloud

How can organizations realize the financial benefits of the cloud and ensure information extracted from cloud sources is secure and trustworthy? The answer is data governance. Good multicloud data governance implies several priorities for IT and the business to incorporate into their strategies.

Practice #1: Agree on what information means: Terminology and metadata management

One of the most important business requirements for organizations today is the ability to understand and completely trust their information. But in many organizations, business and data analysts often use multiple definitions for a given business term. Even worse, all of those definitions may be correct, depending on context and usage. This problem is amplified when third-party cloud-based data sources join the mix. A lack of understanding of the business context of third-party data can negatively impact analysis.

The dilemma of business definition ambiguity and inconsistency is often attributed to the absence of an enterprise-wide business glossary and stewardship program, which is often part of a larger metadata strategy plan. A metadata strategy comprises two elements: technical metadata and business metadata. Technical metadata describes the shape, size and format of data, content, business processes, services, business rules and policies. Business metadata describes the business context for those assets. Linking business metadata to technical metadata through a common metadata repository facilitates collaboration and better communication between business and technical users.
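As a rough illustration of linking business metadata to technical metadata, the following Python sketch models a tiny in-memory glossary. The class names, fields and sample term are hypothetical and are not the Watson Knowledge Catalog API; a real catalog would persist these links in a shared repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class TechnicalAsset:
    """Technical metadata: where a column lives and how it is typed."""
    source: str
    table: str
    column: str
    data_type: str


@dataclass
class BusinessTerm:
    """Business metadata: one agreed definition plus a named steward."""
    name: str
    definition: str
    steward: str
    assets: List[TechnicalAsset] = field(default_factory=list)


class Glossary:
    """Hypothetical common repository linking terms to technical assets."""

    def __init__(self) -> None:
        self._terms: Dict[str, BusinessTerm] = {}

    def define(self, name: str, definition: str, steward: str) -> BusinessTerm:
        key = name.strip().lower()
        if key in self._terms:
            # One definition per term: competing definitions are rejected
            # so they surface for review instead of silently coexisting.
            raise ValueError(f"term already defined: {name}")
        term = BusinessTerm(name, definition, steward)
        self._terms[key] = term
        return term

    def link(self, name: str, asset: TechnicalAsset) -> None:
        """Attach a technical asset to an existing business term."""
        self._terms[name.strip().lower()].assets.append(asset)

    def terms_for(self, asset: TechnicalAsset) -> List[str]:
        """Return the business terms that describe a technical asset."""
        return sorted(t.name for t in self._terms.values() if asset in t.assets)


glossary = Glossary()
glossary.define(
    "Active customer",
    "A customer with at least one purchase in the last 12 months.",
    "sales-ops",
)
column = TechnicalAsset("crm_cloud", "accounts", "is_active", "boolean")
glossary.link("Active customer", column)
print(glossary.terms_for(column))
```

Rejecting a second definition for the same term is one possible design choice; a stewardship program could instead allow context-specific definitions and record which context each applies to.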

In the news: IBM® Watson® Knowledge Catalog is named on the Constellation ShortList for Metadata Management, Data Cataloging and Data Governance for Q3 2020. Read the report.

Practice #2: Maintain and monitor assets: Quality and stewardship

The proliferation of data sources amplifies the classic “garbage-in, garbage-out” problem. But the data explosion is not limited to structured data; most of the added volume flows from unstructured sources, like email, images and documents. Missing, inaccurate or incomplete information can generate high costs and reduce productivity when workers must search for information or reconcile data.
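One simple way to quantify missing or incomplete information is a per-field completeness score. The sketch below is only an assumption about how such a check might look; the function names, sample records and the 0.9 threshold are illustrative and do not come from any IBM product.

```python
from typing import Any, Dict, Iterable, List, Mapping


def is_present(value: Any) -> bool:
    """Treat None and blank strings as missing values."""
    return value is not None and str(value).strip() != ""


def completeness(
    records: Iterable[Mapping[str, Any]], required_fields: List[str]
) -> Dict[str, float]:
    """Return, per required field, the fraction of records with a value."""
    rows = list(records)
    if not rows:
        return {name: 0.0 for name in required_fields}
    return {
        name: sum(1 for row in rows if is_present(row.get(name))) / len(rows)
        for name in required_fields
    }


def fields_below(scores: Mapping[str, float], threshold: float) -> List[str]:
    """List the fields whose completeness falls below the threshold."""
    return sorted(name for name, score in scores.items() if score < threshold)


# Hypothetical customer records with gaps in the email and country fields.
customers = [
    {"id": 1, "email": "ana@example.com", "country": "PT"},
    {"id": 2, "email": "", "country": "DE"},
    {"id": 3, "email": "li@example.com", "country": None},
    {"id": 4, "email": "sam@example.com"},
]
scores = completeness(customers, ["id", "email", "country"])
print(scores)
print(fields_below(scores, threshold=0.9))
```

A score like this only measures completeness; accuracy and consistency across systems need separate checks.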

An organization must be able to manage its supply chain of information, and then integrate and analyze it to make business decisions. Unlike a traditional supply chain, a data supply chain has a many-to-many relationship. For example, data about the same person can come from many places — the same person may be a customer, an employee and a partner — and the information can end up in many reports and applications. In addition, various systems may define the same information differently.

Effective data governance can enhance the quality, availability and integrity of an organization’s data by fostering cross-organizational collaboration and structured policy making. Governance balances functional silos with enterprise-level oversight, directly affecting four outcomes critical to an organization: higher revenue, lower costs, reduced risk and greater confidence.

In the news: IBM is a Leader in the 2020 Gartner Magic Quadrant for Data Quality Solutions. Read the report.

Practice #3: Secure information assets: Privacy and compliance

To protect data in multicloud environments, organizations must understand what data is going into these environments, how access can be monitored and the types of vulnerabilities that may threaten it. Protections should be built into hybrid environments from the start. Active policy management and dynamic masking of sensitive data help protect data, support compliance and audit-readiness and, most importantly, maintain trust in enterprise data.
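To make the idea of dynamic masking concrete, here is a minimal Python sketch in which sensitive columns are masked at read time according to a policy, while the stored row is left unchanged. The policy table, role name and masking functions are hypothetical and do not represent how Watson Knowledge Catalog enforces its policies.

```python
from typing import Callable, Dict, Mapping, Set


def redact(value: str) -> str:
    """Replace every character with an asterisk."""
    return "*" * len(value)


def mask_email(value: str) -> str:
    """Keep the first character and the domain; hide the rest."""
    local, sep, domain = value.partition("@")
    if not sep:
        return redact(value)
    return local[:1] + "***@" + domain


def keep_last4(value: str) -> str:
    """Show only the last four characters of an identifier."""
    if len(value) <= 4:
        return redact(value)
    return "*" * (len(value) - 4) + value[-4:]


# Hypothetical policy: which columns are sensitive and how to mask them.
MASKING_POLICY: Dict[str, Callable[[str], str]] = {
    "email": mask_email,
    "ssn": keep_last4,
}

# Hypothetical roles allowed to see unmasked values.
EXEMPT_ROLES: Set[str] = {"data_steward"}


def read_row(row: Mapping[str, str], role: str) -> Dict[str, str]:
    """Return a copy of the row, masked according to the caller's role."""
    if role in EXEMPT_ROLES:
        return dict(row)
    return {
        column: MASKING_POLICY[column](value) if column in MASKING_POLICY else value
        for column, value in row.items()
    }


stored = {"name": "Ana", "email": "ana@example.com", "ssn": "123-45-6789"}
print(read_row(stored, role="analyst"))
print(read_row(stored, role="data_steward"))
```

Because masking happens when the data is read, the same stored record can serve users with different entitlements without creating extra copies of sensitive values.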

More and more organizations use AI models to automate their business processes. At the heart of these models lie metadata and data, which again underlines how critical governance is for future innovation. The risk of duplicating sensitive information increases with the complexity of the workload, such as AI model workloads that can grow exponentially. If your company incorporates best practices at the foundation of your data strategy now, new transformative initiatives will be simpler to begin and understand as your organization’s maturity builds.

Go hands on: Fuel data innovation and AI with IBM Watson Knowledge Catalog. Start the guided demo.

Learn more

Now is the best time to implement data governance across your organization’s multicloud environment. IBM is committed to helping clients deliver on this need at scale. Clients can trust and govern millions of data assets with IBM Watson Knowledge Catalog, an open and intelligent data catalog for enterprise data and AI model governance, quality and collaboration.

Delivering business-ready data to feed analytics and AI projects begins with a data catalog that can automate organization, provide consistent definitions and enable self-service management of enterprise data. By providing an end-to-end experience rooted in metadata and active policy management, Watson Knowledge Catalog can help you successfully navigate a multicloud journey.

Cloud Pak for Data is now available in two cloud-native deployment options: IBM Cloud Pak® for Data, unmanaged software for any cloud, and IBM Cloud Pak® for Data as a Service, fully managed on IBM Cloud®.

Learn more about Watson Knowledge Catalog
