Generative AI has altered the tech industry by introducing new data risks, such as sensitive data leakage through large language models (LLMs), and by driving an increase in requirements from regulatory bodies and governments. To navigate this environment successfully, organizations need to revisit the core principles of data management and ensure that they are using a sound approach to augment large language models with enterprise/non-public data.

A good place to start is refreshing the way organizations govern data, particularly as it pertains to its usage in generative AI solutions. For example:

  • Validating and creating data protection capabilities: Data platforms must be prepared for higher levels of protection and monitoring. This requires not only traditional capabilities like encryption, anonymization and tokenization, but also new capabilities that use machine learning to automatically classify data (by sensitivity and taxonomy alignment). Data discovery and cataloging tools can assist, but they should be augmented so that the classification reflects the organization’s understanding of its own data. This allows organizations to apply new policies effectively and to bridge the gap between the conceptual understanding of data and the reality of how data solutions have been implemented.
  • Improving controls, auditability and oversight: Data access, usage and third-party engagement with enterprise data require new designs on top of existing solutions. For example, existing solutions capture only a portion of the requirements that are needed to ensure authorized usage of the data. Firms also need complete audit trails and monitoring systems to track how data is used, when data is modified, and whether data is shared through third-party interactions, for both gen AI and non-gen AI solutions. It is no longer sufficient to control data by restricting access to it; organizations should also track the use cases for which data is accessed and applied within analytical and operational solutions. Automated alerts and reporting of improper access and usage (measured by query analysis, data exfiltration and network movement) should be developed by infrastructure and data governance teams and reviewed regularly to proactively ensure compliance.
  • Preparing data for gen AI: Gen AI departs from traditional data management patterns and skills, and it requires new discipline to ensure the quality, accuracy and relevance of the data used to train and augment language models. With vector databases becoming commonplace in the gen AI domain, data governance must be extended to cover these non-traditional data management platforms so that the same governance practices apply to the new architectural components. Data lineage becomes even more important as regulatory bodies require explainability in models.
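
As an illustration of tracking use cases rather than access alone, the following is a minimal Python sketch of an audit trail that records the declared use case with every access and flags events that fall outside an approved list. The class names, the policy table and the sample datasets are all hypothetical and are not part of any IBM product; a real deployment would draw these from the organization's own governance tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    """One audit-trail entry: who touched which dataset, and why."""
    user: str
    dataset: str
    use_case: str   # declared purpose, e.g. "rag-support-bot" (hypothetical)
    action: str     # "read", "modify" or "share"
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class AuditLog:
    """Append-only log that checks each event against approved use cases."""

    def __init__(self, approved_use_cases: dict[str, set[str]]):
        # Hypothetical policy table: dataset name -> use cases approved for it.
        self._approved = approved_use_cases
        self._events: list[AccessEvent] = []

    def record(self, event: AccessEvent) -> bool:
        """Store the event and return True if its use case is approved."""
        self._events.append(event)
        return event.use_case in self._approved.get(event.dataset, set())

    def improper_events(self) -> list[AccessEvent]:
        """Return events whose declared use case is not approved."""
        return [
            e for e in self._events
            if e.use_case not in self._approved.get(e.dataset, set())
        ]


log = AuditLog({"customer_emails": {"rag-support-bot"}})
log.record(AccessEvent("alice", "customer_emails", "rag-support-bot", "read"))
log.record(AccessEvent("bob", "customer_emails", "marketing-model", "share"))

# A scheduled job could turn these into the automated alerts described above.
for event in log.improper_events():
    print(f"ALERT: {event.user} used {event.dataset} for {event.use_case}")
```

Every event is stored whether or not it is approved, so the log stays a complete trail and the alerting rule can change later without losing history.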

Enterprise data is often complex, diverse and scattered across various repositories, making it difficult to integrate into gen AI solutions. This complexity is compounded by the need to ensure regulatory compliance, mitigate risk, and address skill gaps in data integration and retrieval-augmented generation (RAG) patterns. Moreover, data is often an afterthought in the design and deployment of gen AI solutions, leading to inefficiencies and inconsistencies.

Unlocking the full potential of enterprise data for generative AI

At IBM, we have developed an approach to solving these data challenges: the IBM gen AI data ingestion factory, a managed service designed to address AI’s “data problem” and unlock the full potential of enterprise data for gen AI. Our predefined architecture and code blueprints, which can be deployed as a managed service, simplify and accelerate the process of integrating enterprise data into gen AI solutions. We approach this problem with data management in mind, preparing data for governance, risk and compliance from the outset.

Our core capabilities include:

  • Scalable data ingestion: Re-usable services to scale data ingestion and RAG across gen AI use cases and solutions, with optimized chunking and embedding patterns.
  • Regulatory and compliance: Data is prepared for gen AI usage in a way that meets current and future regulations, helping companies meet the compliance requirements of market regulations focused on generative AI.
  • Data privacy management: Long-form text can be anonymized as it is discovered, reducing risk and ensuring data privacy.
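
To make the ingestion and privacy capabilities above more concrete, here is a minimal Python sketch of a reusable step that anonymizes long-form text and then splits it into overlapping chunks, keeping the source path on each chunk for lineage. It is not the IBM implementation: the function names, the word-window chunking strategy, the default sizes and the single email pattern are assumptions chosen for illustration, and the embedding step is left out because it depends on the model in use.

```python
import re
from dataclasses import dataclass

# Illustrative pattern only; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class Chunk:
    source: str   # where the text came from, kept for lineage
    index: int    # position of the chunk within the source
    text: str


def anonymize(text: str) -> str:
    """Replace email addresses with a fixed placeholder token."""
    return EMAIL_RE.sub("[EMAIL]", text)


def chunk_words(text: str, source: str, size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Split anonymized text into windows of `size` words.

    Consecutive windows share `overlap` words so that context is not
    lost at chunk boundaries.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = anonymize(text).split()
    step = size - overlap
    chunks = []
    for i, start in enumerate(range(0, len(words), step)):
        chunks.append(Chunk(source, i, " ".join(words[start:start + size])))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


doc = "Contact jane.doe@example.com about the renewal terms for the account"
for chunk in chunk_words(doc, source="crm/notes.txt", size=6, overlap=2):
    print(chunk.index, chunk.source, chunk.text)
```

Anonymizing before chunking means no downstream component, including the vector store, ever receives the raw identifier.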

The service is AI and data platform agnostic, allowing for deployment anywhere, and it offers customization to client environments and use cases. By using the IBM® gen AI data ingestion factory, enterprises can achieve several key outcomes, including:

  • Reducing time spent on data integration: A managed service that reduces the time and effort required to solve for AI’s “data problem”. For example, using a repeatable process for “chunking” and “embedding” data so that it does not require development efforts for each new gen AI use case.
  • Compliant data usage: Helping to comply with data usage regulations focused on gen AI applications deployed by the enterprise. For example, ensuring data that is sourced in RAG patterns is approved for enterprise usage in gen AI solutions.
  • Mitigating risk: Reducing risk associated with data used in gen AI solutions. For example, providing transparent results into what data was sourced to produce an output from a model reduces model risk and time spent proving to regulators how information was sourced.
  • Consistent and reproducible results: Delivering consistent and reproducible results from LLMs and gen AI solutions. For example, capturing lineage and comparing outputs (that is, data generated) over time to report on consistency through standard metrics such as ROUGE and BLEU.
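
As one way to picture the consistency reporting mentioned above, the sketch below compares later model outputs against a stored baseline using a simplified ROUGE-1 score (unigram-overlap F1). It is an assumption-laden illustration: the whitespace tokenization, the function names and the 0.8 threshold are hypothetical, and production reporting would more likely rely on an established ROUGE or BLEU implementation.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between two texts (a simplified ROUGE-1)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def consistency_report(baseline: str, outputs: list[str], threshold: float = 0.8) -> list[dict]:
    """Score each later output against the baseline and flag low scores."""
    report = []
    for run, text in enumerate(outputs, start=1):
        score = rouge1_f1(baseline, text)
        report.append({
            "run": run,
            "rouge1_f1": round(score, 3),
            "consistent": score >= threshold,  # threshold is an assumed cut-off
        })
    return report


baseline = "the cat sat on the mat"
for row in consistency_report(baseline, ["the cat lay on the mat", "dogs bark loudly"]):
    print(row)
```

Storing the baseline output alongside the lineage of the data that produced it lets the same comparison be rerun whenever the model or the source data changes.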

Navigating the complexities of data risk requires cross-functional expertise. Our team of former regulators, industry leaders and technology experts at IBM Consulting® is uniquely positioned to address this with our consulting services and solutions.

Please see more on our capabilities below, and reach out to me at gsbaird@us.ibm.com with any further questions.

Learn more about how AI governance can help fight data risks