Foundations of trustworthy AI: Operationalizing trustworthy AI
Industry focus on trustworthy AI is being driven by several forces — corporate social responsibility posture, concerns around reputational risk and a growing set of regulations. Organizations recognize that they need a systematic approach to ensure AI and machine learning can be trusted and operationalized. Aspects of trustworthy AI include fairness, robustness, privacy, explainability and transparency. Use case patterns include conducting health checks of existing applications and creating an organization-wide framework for AI governance, but today, we will focus on operationalizing new applications.
Organizations need a systematic approach to build and operationalize new AI applications in a trustworthy manner. This approach must take into consideration the end-to-end data science and AI lifecycle. The AI lifecycle consists of a sequence of stages that bring together various personas and best practices. (See the O’Reilly report Operationalizing AI for a general view.) The stages of the lifecycle include:
- Scope and plan
- Collect and organize
- Build and test
- Validate and deploy
- Monitor and manage
In an ideal scenario, aspects of trustworthy AI should be addressed at each of these stages and not just at the end. We need appropriate guardrails at each stage, from business definition, data exploration and model building to model validation, deployment, monitoring and ongoing management. Let us take a closer look at each stage and how various aspects of trustworthy AI are addressed.
Scope and plan
This stage guides the prioritization of use cases and the development of an AI action plan. The team — composed of a business stakeholder, data scientist, data owner, operations lead, and other roles — first focuses on the business use case, defining business value and specifying business KPIs. It then addresses the technical task, translating the business goal into specific AI tasks to solve. Finally, the team develops a structured action plan for a solution to the identified technical tasks in support of the business goal.
Applying Enterprise Design Thinking principles is a best practice for this stage. Design Thinking helps identify and define the business and technical aspects of bias and fairness, robustness, explainability and so on in the context of the use case. This stage helps answer questions such as:
- What are the business expectations for fairness or transparency?
- Which regulations do we need to comply with?
- How do we get access to sensitive data attributes?
- What is the granularity and frequency at which explanations need to be provided?
It’s also important to take note of any organizational data and AI policies that the team will need to follow while building out these use cases, to prevent any last-minute challenges when trying to validate for production use.
Collect and organize
This stage allows the data consumer to find and access relevant datasets. Data science teams can “shop for data” in a central catalog, searching by either metadata or business terms. They can understand the data, including its owner, lineage, relationship to other datasets, and so on. It’s important to provide data scientists with a technical view of data lineage, so they can understand each data transformation that might impact how they create and use features.
Based on that exploration, they can request a data feed. Once approved, the datasets can be made available to the data science team in their data science development environment.
If the team needs to work with regulated personal data such as personally identifiable information (PII) or protected health information (PHI), the data steward or a data provider must ensure the data shared adheres to regulations through appropriate anonymization.
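As a rough illustration of what a data steward might do before sharing a dataset, the sketch below replaces direct identifiers with keyed hashes. The field names and the `pseudonymize` helper are hypothetical, chosen only for this example. Keyed hashing is a form of pseudonymization, which may not by itself meet a given regulation's definition of anonymization, so treat this as one possible building block rather than a compliance recipe.

```python
import hashlib
import hmac

# Hypothetical field names; a real dataset would define its own.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Replace direct identifiers with keyed hashes; leave other fields as-is."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            # HMAC-SHA256 with a secret key, so tokens cannot be reproduced
            # by someone who only knows the original value.
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            out[field] = digest.hexdigest()[:16]
        else:
            out[field] = value
    return out


if __name__ == "__main__":
    row = {"name": "Ada Example", "email": "ada@example.org", "age": 36}
    print(pseudonymize(row, secret_key=b"replace-with-a-managed-secret"))
```

The same input always maps to the same token under a given key, so joins across datasets keep working, while the key itself stays with the data provider.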
Build and test
Data science teams explore and prepare the data, and build, train and test their AI/ML models during this stage. The activities during this stage are best undertaken as a set of agile sprints. It is important that bias in data be checked at this stage even before any model building work starts. This is an inner guardrail for fairness, and such guardrails can be put in place during and after the model building steps. Model robustness, explainability and other aspects can be similarly accounted for and tested during this stage.
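One way to check for bias in data before any model is built is to compare the rate of favorable outcomes across groups. The sketch below computes a disparate impact ratio on labeled records. The column names, the group labels and the 0.8 cut-off (a commonly cited rule of thumb) are assumptions made for illustration; the appropriate metric and threshold depend on the use case and the applicable regulation.

```python
from collections import defaultdict


def favorable_rates(rows, group_key, label_key, favorable=1):
    """Return the share of favorable labels for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [favorable count, total]
    for row in rows:
        entry = counts[row[group_key]]
        entry[1] += 1
        if row[label_key] == favorable:
            entry[0] += 1
    return {group: fav / total for group, (fav, total) in counts.items()}


def disparate_impact(rows, group_key, label_key, unprivileged, privileged, favorable=1):
    """Ratio of favorable-outcome rates: unprivileged group over privileged group."""
    rates = favorable_rates(rows, group_key, label_key, favorable)
    if rates[privileged] == 0:
        raise ValueError("privileged group has no favorable outcomes")
    return rates[unprivileged] / rates[privileged]


if __name__ == "__main__":
    # Tiny made-up sample, for illustration only.
    sample = [
        {"group": "A", "approved": 1}, {"group": "A", "approved": 1},
        {"group": "A", "approved": 0}, {"group": "A", "approved": 1},
        {"group": "B", "approved": 1}, {"group": "B", "approved": 0},
        {"group": "B", "approved": 0}, {"group": "B", "approved": 1},
    ]
    ratio = disparate_impact(sample, "group", "approved", unprivileged="B", privileged="A")
    # Assumed threshold; flag the dataset for review if the ratio falls below it.
    print("needs review" if ratio < 0.8 else "ok", round(ratio, 3))
```

A ratio close to 1 means both groups receive favorable outcomes at similar rates; a low ratio is a signal to investigate the data before training on it.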
Once these tests complete successfully, an MLOps pipeline allows for the model and related assets to be moved from the development environment to a pre-production validation environment.
Validate and deploy
This stage involves validation of the model and deployment into production. Validation activities could be performed by a team other than the one that built the model, such as the organization’s model validation or model risk management team or an external entity.
This team validates quality, robustness, fairness, etc. and generates validation reports. It is important to capture the validation results for reference and comparison. Model factsheets, local and global explanations and other metrics are also checked.
If the model passes validation, the Ops team can promote it to the production environment using an MLOps pipeline. The model is deployed there, either for online or batch invocation.
Monitor and manage
In this stage, the Ops team sets up ongoing monitoring and management of the AI/ML model in production. The team configures monitors that collect metrics on a periodic schedule. Quality, robustness, and fairness are monitored at frequencies dictated by business needs. Ongoing activities include monitoring for data drift and accuracy drift, and generating explanations both for selected transactions and for global model behavior.
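Data drift can be quantified in many ways. As one hedged example, the sketch below computes a population stability index (PSI) between a baseline sample of a numeric feature and a recent production sample. The function name, the bin count and the smoothing constant are illustrative assumptions, not part of any particular monitoring product.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a baseline sample and a recent sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the baseline range into the first/last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    recent = [0.5 + i / 200 for i in range(100)]  # shifted toward higher values
    print(population_stability_index(baseline, baseline))
    print(population_stability_index(baseline, recent))
```

Identical samples give a PSI of zero, and the value grows as the recent distribution moves away from the baseline. Which value counts as a breach is a business decision.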
If the monitors detect that a threshold has been breached, the business can choose how to respond, with options ranging from alerts to corrective steps. Bias mitigation or model re-training steps can be configured to ensure continued trustworthy behavior. Models can be decommissioned based on either business or technical criteria. This stage establishes the outermost guardrails for trustworthy AI.
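The escalation from alerts to corrective steps can be sketched as a small set of threshold rules. Everything in the example below is a hypothetical illustration: the `Monitor` class, the metric names, the threshold values and the mapping of a severe breach to re-training are assumptions, and a real deployment would use whatever its monitoring platform provides.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    ALERT = "alert"
    RETRAIN = "retrain"


@dataclass(frozen=True)
class Monitor:
    """A metric where lower values are worse, with two escalation levels."""
    metric: str
    warn_below: float  # breach triggers an alert
    act_below: float   # breach triggers a corrective step

    def evaluate(self, value: float) -> Action:
        if value < self.act_below:
            return Action.RETRAIN
        if value < self.warn_below:
            return Action.ALERT
        return Action.NONE


def evaluate_all(monitors, latest):
    """Map each monitored metric present in `latest` to the action it calls for."""
    return {m.metric: m.evaluate(latest[m.metric]) for m in monitors if m.metric in latest}


if __name__ == "__main__":
    monitors = [
        Monitor("accuracy", warn_below=0.85, act_below=0.75),
        Monitor("disparate_impact", warn_below=0.90, act_below=0.80),
    ]
    latest = {"accuracy": 0.80, "disparate_impact": 0.95}
    for metric, action in evaluate_all(monitors, latest).items():
        print(metric, action.value)
```

Keeping the thresholds in data rather than in code lets the business adjust them without redeploying the model.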
These AI lifecycle stages fit into an overall AI Governance framework for the enterprise. Such a framework allows multiple use cases to follow the lifecycle stages consistently, regardless of which development tools are used. It helps automate documentation across the lifecycle and provides a consistent view to various stakeholders.
Operationalizing trustworthy AI requires us to bring together people (expertise), process (best practices), and platform (technology). IBM Cloud Pak for Data and IBM Spectrum Fusion provide the technology framework that supports various stages of the end-to-end AI lifecycle. Together, they can fit into existing environments and complement existing model development and deployment tools. The platform can run on a wide variety of cloud (IBM, AWS, Azure, GCP, etc.) and on-premises infrastructure choices, providing a true hybrid cloud capability.
IBM provides a set of service offerings that bring together education, expertise, best practices, and technology to help customers get started with operationalizing trustworthy AI.