AI governance is the ability to monitor and manage AI activities within an organization. It includes processes and procedures to trace and document the origin of data and models deployed within the enterprise, as well as the techniques used to train, validate, and monitor the continuing accuracy of models. Effective AI governance delivers three main outcomes for enterprises:
Compliance. Helping to make sure that AI solutions and AI-delivered decisions are consistent with industry-accepted practices, regulatory standards, and legal requirements.
Trust. Enabling trust in AI-delivered decisions by helping to ensure that AI models are explainable and fair.
Efficiency. Improving speed-to-market and reducing AI development costs by standardizing and optimizing AI development and deployment practices.
Enterprises that do not adopt AI governance risk multiple negative consequences. The machine learning process is iterative and collaborative; without good governance and documentation, data scientists and validators cannot be sure of the lineage of a model's data or how the model was built, and results can be difficult to reproduce. If administrators train a model using wrong or incomplete data, months of work can be wasted.
Lack of AI governance can also result in significant penalties. Bank operators have been fined seven-figure sums for using biased models when determining loan eligibility. The EU is also introducing AI-specific regulation to complement the General Data Protection Regulation (GDPR). GDPR infringements can already “result in a fine of up to €20 million, or 4% of the firm's worldwide annual revenue from the preceding financial year, whichever amount is higher.”
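As a concrete illustration of that penalty formula, the fine cap can be computed directly (a minimal sketch; the function name is ours, not part of the regulation):

```python
def gdpr_max_fine(annual_revenue_eur: float) -> float:
    """Maximum GDPR fine: EUR 20 million or 4% of worldwide annual
    revenue from the preceding financial year, whichever is higher."""
    return max(20_000_000.0, 0.04 * annual_revenue_eur)

# For a firm with EUR 1 billion in annual revenue, the 4% rule dominates:
print(gdpr_max_fine(1_000_000_000))  # 40000000.0
```

For smaller firms the €20 million floor applies instead, which is why the "whichever is higher" clause matters.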
Brand reputation is also at risk. In one experiment, a company released AI software that learned the speech patterns of young people on social media. Its operators removed the software quickly after internet trolls “taught” the tool to create racist, sexist, and anti-Semitic posts.
The diagram above shows the major components of an AI governance solution for a generative AI solution using a large language model (LLM).
Model Governance is the central clearinghouse for AI governance. It provides the dashboards, reports, and alerting capabilities that the enterprise's staff use to ensure, audit, and report that AI models meet requirements for fairness, transparency, and compliance. The Model Governance component also enables enterprises to set gating criteria and other policies that affect when and how models move from development into production.
Model Monitoring actively monitors the output of models to ensure that models are explainable, fair, and compliant with regulations, and remain so when they are deployed. If models begin to drift or exhibit bias in their outputs, the Model Monitoring component flags them for investigation by AI operations staff.
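The drift check at the heart of Model Monitoring can be sketched as a threshold comparison (an illustrative sketch, not the actual product implementation; the names and tolerance value are assumptions):

```python
def flag_for_review(baseline_accuracy: float,
                    recent_accuracies: list[float],
                    tolerance: float = 0.05) -> bool:
    """Flag a deployed model for investigation when its recent mean
    accuracy falls more than `tolerance` below the accuracy recorded
    when the model was approved."""
    recent_mean = sum(recent_accuracies) / len(recent_accuracies)
    return (baseline_accuracy - recent_mean) > tolerance

# Accuracy slipping from 0.92 at approval to ~0.80 in production
# exceeds the 5-point tolerance and triggers an alert:
print(flag_for_review(0.92, [0.81, 0.80, 0.79]))  # True
```

Real monitoring components apply analogous checks to fairness and bias metrics as well as accuracy.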
The diagram below walks through the high-level component interactions to deliver on enterprise AI governance.
Members of the enterprise governance team use the Model Governance component to (i) visualize the AI models (foundation and non-foundation) deployed within the enterprise across private infrastructure, hyperscalers, and cloud-based platforms, and (ii) set minimum operating criteria and other policies for models to be deployed and operated within the enterprise. The criteria and policy controls are propagated to the Model Monitoring component for subsequent monitoring and alerting.
A Model Developer prompt tunes a large language model (LLM) and evaluates the model's responses to test prompts. The results of these tests, along with summary statistics, are captured and propagated to the Model Monitoring component, where they are recorded in order to provide model and data lineage.
A Model Validator reviews the results of the tuning and testing and, with the assistance of the Model Monitoring component, compares them against the gating criteria and controls set by the enterprise governance team. Once the criteria and controls are met, the model is approved for use in production.
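The gating check described above reduces to comparing captured metrics against configured minima (a hypothetical sketch; the metric names and thresholds are illustrative):

```python
def approved_for_production(metrics: dict[str, float],
                            gating_criteria: dict[str, float]) -> bool:
    """Approve only if every gated metric is present and meets its minimum."""
    return all(metrics.get(name, float("-inf")) >= minimum
               for name, minimum in gating_criteria.items())

criteria = {"rouge_l": 0.40, "fairness_score": 0.95}
print(approved_for_production({"rouge_l": 0.47, "fairness_score": 0.97}, criteria))  # True
print(approved_for_production({"rouge_l": 0.47}, criteria))  # False (metric missing)
```

Treating a missing metric as a failure (the `float("-inf")` default) keeps the gate conservative: a model cannot pass simply because a test was never run.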
A Model Developer uses the Model Monitoring component to monitor the model's performance over time; specifically, the Developer looks to ensure that model responses continue to meet the enterprise criteria for fairness (lack of bias), accuracy (correct responses), and transparency (explainable responses).
The Model Monitoring component continuously monitors deployed AI models (foundation / generative models as well as 'traditional' machine learning models) to capture accuracy and performance statistics.
The Model Monitoring component also captures user prompts and the model's responses to (i) further guard against model drift (deviations in bias and/or model accuracy), and (ii) capture test data that helps identify topic areas or data domains where additional tuning will be beneficial.
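Captured prompt/response records can be aggregated per topic to surface the domains that would benefit from additional tuning (a sketch under the assumption that each record carries a topic label and a quality score):

```python
from collections import defaultdict

def topics_needing_tuning(records: list[tuple[str, float]],
                          score_threshold: float = 0.6) -> list[str]:
    """records: (topic, quality_score) pairs captured from production
    traffic. Return topics whose mean score is below the threshold."""
    scores = defaultdict(list)
    for topic, score in records:
        scores[topic].append(score)
    return sorted(topic for topic, vals in scores.items()
                  if sum(vals) / len(vals) < score_threshold)

records = [("billing", 0.4), ("billing", 0.5), ("shipping", 0.9)]
print(topics_needing_tuning(records))  # ['billing']
```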
The current mapping of the IBM watsonx.governance and IBM OpenPages solutions to the conceptual model is shown in the diagram below. Foundation models running on the watsonx.ai platform, on-premises, on cloud-based infrastructure, or on third-party AI platforms such as Amazon SageMaker are monitored at run time by watsonx.governance. watsonx.governance also provides capabilities to create, update, and manage model cards (known as AI Factsheets within watsonx.governance) and to capture and report on model performance metrics. IBM OpenPages' Model Risk Management module provides the risk reporting and management capabilities, and the model development and deployment policy management capabilities, of Model Governance.
Governance of generative AI solutions is similar to governance of 'traditional' AI models. However, generative capabilities require closer management of model inputs and outputs than traditional models, both to guard against inappropriate or malicious prompts and to ensure that the models produce factually correct and acceptable outputs. This section illustrates how IBM watsonx.governance is applied to foundation models in two core use cases: model life cycle management, and model risk and regulatory compliance.
The diagram above illustrates how watsonx.governance is used to manage the model life cycle from initial testing and validation through to deployment.
A Model Developer prompt tunes a model in the watsonx.ai on-premises solution, the watsonx.ai service, or another on-premises or cloud-based platform, and develops and tests prompts against it.
Prompts and model response data, along with model performance metrics such as ROUGE, SARI, chrF, and BLEU, are captured by watsonx.governance's model inventory management functionality. Multiple versions of the prompts and response data are captured to enable cross-comparison and selection of the model and prompt combination that best meets the enterprise's requirements.
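For intuition about these metrics, ROUGE-1 (unigram overlap between a response and a reference) can be computed in a few lines (a simplified sketch; production systems use library implementations with proper tokenization and stemming options):

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    """F1 of unigram overlap between a model response and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

SARI, chrF, and BLEU follow the same pattern of comparing generated text against references, each weighting overlap differently.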
A Model Validator reviews the results of the individual prompt and model combinations and selects a version to approve for deployment to production.
Model Developers use the same capability to track model / prompt combinations and their performance for specific business use cases.
The component walkthrough for model risk and regulatory compliance within watsonx.governance is shown below.
Members of the enterprise's AI governance team determine and set criteria, specified as minimum, maximum, and allowed variances of model metrics such as ROUGE, that must be met by models in production. These criteria are set within the IBM OpenPages Model Risk Management tool and then subsequently propagated to watsonx.governance.
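Criteria expressed as a minimum, a maximum, and an allowed variance can be checked with one small predicate (a sketch; this representation of criteria is an assumption, not the OpenPages data model):

```python
def meets_criterion(value: float,
                    minimum: float,
                    maximum: float,
                    target: float,
                    allowed_variance: float) -> bool:
    """True when a metric value sits inside [minimum, maximum] and
    within the allowed variance of its target value."""
    in_range = minimum <= value <= maximum
    in_variance = abs(value - target) <= allowed_variance
    return in_range and in_variance

# A ROUGE score of 0.42 against a target of 0.45 +/- 0.05, bounded to [0.3, 0.6]:
print(meets_criterion(0.42, 0.3, 0.6, 0.45, 0.05))  # True
```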
A Model Developer prompt tunes and develops test prompts against a foundation model deployed within the watsonx.ai on-premises solution, the watsonx.ai service, or another on-premises or cloud-based solution such as SageMaker.
Prompt information and model response data, along with model performance metrics, are propagated to watsonx.governance where the metrics are compared against the thresholds set by the governance team.
The results of the metrics comparison are propagated to IBM OpenPages for review and reporting by the governance team. If the prompt / model combination meets all of the set criteria, it may be flagged as ready for production, or as having no risks. If the model meets only some of the criteria, it may be flagged as potentially under-performing and not yet suitable for production, depending on how strict the governance team has made the policy.
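That review logic amounts to a classification over how many criteria are met (an illustrative sketch; the flag names and strictness rule are assumptions, not OpenPages behaviour):

```python
def risk_flag(metrics: dict[str, float],
              minimums: dict[str, float],
              strict: bool = True) -> str:
    """Summarize a prompt/model combination's standing against policy."""
    met = [metrics.get(name, float("-inf")) >= minimum
           for name, minimum in minimums.items()]
    if all(met):
        return "ready-for-production"
    if any(met) and not strict:
        return "potentially-under-performing"
    return "not-suitable"

minimums = {"rouge_l": 0.40, "bleu": 0.30}
print(risk_flag({"rouge_l": 0.45, "bleu": 0.35}, minimums))  # ready-for-production
print(risk_flag({"rouge_l": 0.45, "bleu": 0.10}, minimums, strict=False))  # potentially-under-performing
```

Under a strict policy, the same partial result would be flagged as not suitable for production.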
Operationalize AI with confidence. It is critical to evaluate the model during development and deployment to ensure that LLM responses are not the result of hallucination and are free of hate and profanity. LLM responses must be explainable, ethical, trusted, and unbiased. The quality metrics for LLMs differ significantly from those for traditional AI models, so data scientists need the ability to pick the right metrics consistently.
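A minimal screen for hate and profanity is a blocklist match over tokenized output (a toy sketch with placeholder terms; real HAP detection uses trained classifiers rather than static word lists):

```python
import re

# Placeholder blocklist -- a production system would use a trained
# HAP classifier rather than a static word list.
BLOCKLIST = {"hateterm", "profanityterm"}

def contains_hap(text: str, blocklist: set[str] = BLOCKLIST) -> bool:
    """True when any blocklisted term appears as a token in the text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return bool(tokens & blocklist)

print(contains_hap("A perfectly polite answer."))  # False
print(contains_hap("An answer containing hateterm."))  # True
```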
Deployed generative AI solutions need to remain consistent, without bias or drift being introduced over time. It is not uncommon for an enterprise to use a variety of LLMs across multiple clouds, so enabling centralized governance across all of them is critical. A governance approach that spans deployment environments on multiple clouds is a key consideration.
Deployed generative AI applications must stay current and adhere to constantly evolving industry regulations. Enterprises also need visibility of all deployed models and their health in a single view.
Ensure that no hate, abuse, or profanity (HAP) is present in training data. The enterprise must also be indemnified against proprietary data usage, while ensuring that no PII or IP data is leaked. Being able to audit a generative AI solution and trace its data lineage is key.
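PII leakage checks typically start with pattern scanning of training data and model outputs (a simplified sketch; the two patterns shown are illustrative and far from exhaustive):

```python
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def detect_pii(text: str) -> set[str]:
    """Return the names of the PII pattern types found in the text."""
    return {name for name, pattern in PII_PATTERNS.items()
            if pattern.search(text)}

print(sorted(detect_pii("Contact alice@example.com about case 123-45-6789")))
# ['email', 'us_ssn']
```

Production data governance tools combine such patterns with dictionaries and ML-based entity recognition to cover names, addresses, and domain-specific identifiers.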
This section describes how a RAG model is deployed end to end, with monitoring and governance capabilities throughout its life cycle. Along with model governance, data governance is also important. We show how IBM watsonx.governance components such as OpenScale, AI Factsheets, and IBM OpenPages can be used to ensure that generative AI applications are managed and governed. IBM Watson Knowledge Catalog enables proper data management, including data cataloging, data lineage, and PII data management.
The complete IBM Generative AI Architecture is available in IBM IT Architect Assistant (IIAA), an architecture development and management tool. Using IIAA, architects can elaborate and customize the architecture to create their own generative AI solutions.