Foundations of trustworthy AI: How to conduct trustworthy AI assessment and mitigation

4 minute read | June 3, 2021


The ability of artificial intelligence to perform important business tasks has grown by leaps and bounds in recent years. As AI has progressed from a proof of concept to powering critical enterprise workflows, it has become increasingly apparent that this general-purpose technology must be assessed in its specific context of use for privacy, robustness, fairness, and explainability. These four assessments, along with transparency to stakeholders, constitute the five pillars of trustworthiness. If issues are discovered, they must be mitigated before serious harms occur. What are these pillars, how are they assessed, and how are the issues they reveal mitigated?

Five pillars of trustworthy AI

Let’s try to understand the pillars of trustworthy AI using a home mortgage approval application as an example.

Privacy is the idea that sensitive personal information should not be disclosed, whether inadvertently or when a system is breached by a malicious actor. Data privacy has been studied and regulated for some time, but AI adds some nuances. A historical dataset of home mortgage decisions might be protected against the disclosure of sensitive information such as the income of applicants. But using it to train an AI system may expose that sensitive information to inference by a user who queries the AI intelligently.

Robustness is the ability of an AI system to remain accurate in different settings and conditions, including naturally occurring conditions and those set up by malicious actors to fool the AI. A robust AI mortgage model will not completely fall apart at the onset of a major change in the world, such as a global pandemic.

Fairness ensures that an AI system does not yield systematic advantages to certain privileged groups and individuals (defined by characteristics such as gender and national origin) and systematic disadvantages to certain unprivileged groups and individuals. The mortgage approval model should not systematically favor any race, ethnicity, or gender.
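To make "systematic advantage" concrete, here is a minimal sketch of two widely used group fairness measures, statistical parity difference and disparate impact, applied to approval decisions. The decision lists are hypothetical values made up for illustration, and the function names are our own rather than part of any particular product.

```python
def approval_rate(decisions):
    """Fraction of applicants approved (1 = approved, 0 = denied)."""
    return sum(decisions) / len(decisions)

def statistical_parity_difference(unprivileged, privileged):
    """Approval rate of the unprivileged group minus that of the privileged group.
    A value of 0 means both groups are approved at the same rate."""
    return approval_rate(unprivileged) - approval_rate(privileged)

def disparate_impact(unprivileged, privileged):
    """Ratio of the unprivileged approval rate to the privileged approval rate.
    A value of 1 means both groups are approved at the same rate."""
    return approval_rate(unprivileged) / approval_rate(privileged)

# Hypothetical model decisions for two groups of ten applicants each.
privileged = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]    # 8 of 10 approved
unprivileged = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]  # 4 of 10 approved

spd = statistical_parity_difference(unprivileged, privileged)
di = disparate_impact(unprivileged, privileged)
print(f"statistical parity difference: {spd:.2f}")
print(f"disparate impact: {di:.2f}")
```

In this toy data the unprivileged group is approved half as often as the privileged group. Whether a gap of that size is acceptable is a policy question that depends on the context of the application, not something the metric decides on its own.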

Explainability allows people to understand how (typically opaque) AI systems make their decisions. Loan officers, applicants, and regulators can all make sense of an explainable AI system, each toward their own goals.

Transparency is achieved when the various assessments along with their justifications are documented and presented to stakeholders. Factsheets containing assessments of accuracy, privacy, robustness, fairness, and explainability of the mortgage approval model may be generated for model risk managers, regulators, and the general public.

Trustworthy AI assessment

Now that you know the pillars of trustworthy AI, how, when, and why do you assess them? Let’s start with the “why” using an analogy of inspecting the safety and functionality of a house along many dimensions (electrical, structural, plumbing, etc.). There are many reasons to inspect a house. The government inspects a house before issuing a certificate of occupancy. An owner inspects a house for peace of mind and to identify areas of improvement. An insurance company inspects a house to set its premium. A potential buyer inspects a house to be assured of what they are getting. An external party may surreptitiously inspect a house for evidence of wrongdoing.

AI is no different. It must be assessed across many dimensions by different parties (regulators, developers, customers, reinsurance companies, activists) for different reasons. You can call it AI testing, monitoring, assessing, or auditing, but the fundamental concept in all cases is to make sure the AI is performing well, both in typical conditions and when it is pushed to its limits. Sometimes you want to do this testing while the system is being built, sometimes as a validation step before deployment, sometimes continually during deployment, and sometimes after an adverse event has occurred. Both data (the raw materials) and the trained AI model (the finished product) should be tested.

There are two parts of AI testing: defining appropriate quantitative performance indicators and generating test examples to feed into the AI. The different pillars of trustworthy AI are now starting to have well-defined metrics, many of which are variations on accuracy measures. Even explainability, which should ideally be measured by polling a group of people, has quantitative proxy metrics. The biggest challenge in selecting appropriate indicators is that there is more than one metric per pillar, each with differing policy consequences. IBM Research is working on tools to help elicit relevant metrics, and IBM Services can run a garage session to help you figure some of this out.
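A small sketch shows why the choice among metrics matters. It compares two common fairness metrics on the same set of predictions: statistical parity difference (the gap in approval rates) and equal opportunity difference (the gap in approval rates among applicants who would in fact repay). All data values are hypothetical and chosen so that the two metrics disagree.

```python
def selection_rate(y_pred):
    """Fraction of applicants the model approves."""
    return sum(y_pred) / len(y_pred)

def true_positive_rate(y_true, y_pred):
    """Fraction of truly creditworthy applicants (y_true == 1) the model approves."""
    approved_among_qualified = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(approved_among_qualified) / len(approved_among_qualified)

# Hypothetical outcomes (y_true) and model decisions (y_pred) per group.
priv_true = [1, 1, 1, 1, 0, 0, 0, 0]
priv_pred = [1, 1, 1, 1, 0, 0, 0, 0]      # approves exactly the qualified applicants
unpriv_true = [1, 1, 1, 1, 0, 0, 0, 0]
unpriv_pred = [1, 1, 0, 0, 1, 1, 0, 0]    # same approval rate, but half are the wrong people

parity_gap = selection_rate(unpriv_pred) - selection_rate(priv_pred)
opportunity_gap = (true_positive_rate(unpriv_true, unpriv_pred)
                   - true_positive_rate(priv_true, priv_pred))

print(f"statistical parity difference: {parity_gap:.2f}")
print(f"equal opportunity difference: {opportunity_gap:.2f}")
```

Here both groups are approved at the same rate, so the model looks fair by statistical parity, yet qualified applicants in the unprivileged group are approved only half as often. Which metric should govern is exactly the kind of decision that has policy consequences.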

Collecting inputs to test the AI in typical operating conditions is commonly done using data that you withhold from training. It can be done in IBM Cloud Pak for Data in the build, validation, and deployment stages. Generating test data that pushes the boundaries into unexpected conditions is a combination of art and science.
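Both kinds of test input can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not how any particular product does it: the "model" is a single debt-to-income (DTI) cutoff, the data is synthetic, and the stress test is a uniform shift of the inputs with outcomes held fixed. All names, including split_holdout, fit_cutoff, and shift_inputs, are hypothetical.

```python
import random

def accuracy(records, cutoff):
    """Share of records where 'approve if DTI <= cutoff' matches the repayment outcome."""
    correct = sum((dti <= cutoff) == bool(repaid) for dti, repaid in records)
    return correct / len(records)

def split_holdout(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and withhold a fraction from training."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_cutoff(train):
    """'Train' the toy model: pick the observed DTI value that maximizes training accuracy."""
    candidates = sorted({dti for dti, _ in train})
    return max(candidates, key=lambda c: accuracy(train, c))

def shift_inputs(records, shift):
    """Stress test: move every DTI value by a fixed amount, keeping outcomes unchanged."""
    return [(dti + shift, repaid) for dti, repaid in records]

# Synthetic applicants: (debt-to-income ratio, repaid flag). Purely illustrative.
rng = random.Random(42)
records = []
for _ in range(200):
    dti = rng.uniform(0.05, 0.70)
    repaid = 1 if dti + rng.gauss(0, 0.05) < 0.40 else 0
    records.append((dti, repaid))

train, holdout = split_holdout(records)
cutoff = fit_cutoff(train)
typical = accuracy(holdout, cutoff)                        # typical operating conditions
stressed = accuracy(shift_inputs(holdout, 0.15), cutoff)   # shifted, unexpected conditions

print(f"cutoff: {cutoff:.2f}  holdout accuracy: {typical:.2f}  shifted accuracy: {stressed:.2f}")
```

Comparing the two accuracy figures gives a rough sense of how much the model depends on the world staying the way it was when the training data was collected. A realistic stress test would use shifts informed by domain knowledge rather than a single fixed offset.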

Trustworthy AI mitigation

Finally, if an assessment discovers that your AI system is not up to standard on privacy, robustness, fairness, or explainability, you’ll want to improve the system and mitigate the issue.

Despite their differences, the pillars of trustworthy AI have mitigation approaches grouped into the same three categories. The first category contains pre-processing methods that improve the statistics of the training dataset. The second category constrains the training of the AI in favorable ways. The third category post-processes the predictions produced by the AI.
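As an illustration of the first category, here is a minimal sketch of reweighing, a well-known pre-processing technique for fairness. It assigns each training example a weight so that, in the weighted dataset, group membership and the favorable outcome are statistically independent. The data and the helper names are hypothetical, and this is one example of a pre-processing method rather than a description of any specific product's algorithm.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight each example by P(group) * P(label) / P(group, label),
    estimated from counts in the training data."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

def weighted_approval_rate(groups, labels, weights, group):
    """Approval rate within one group, counting each example by its weight."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    approved = sum(w * y for g, y, w in zip(groups, labels, weights) if g == group)
    return approved / total

# Hypothetical historical decisions: group "a" approved 8 of 10, group "b" approved 4 of 10.
groups = ["a"] * 10 + ["b"] * 10
labels = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1] + [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_approval_rate(groups, labels, weights, "a")
rate_b = weighted_approval_rate(groups, labels, weights, "b")
print(f"weighted approval rate, group a: {rate_a:.2f}")
print(f"weighted approval rate, group b: {rate_b:.2f}")
```

After reweighing, both groups have the same weighted approval rate, equal to the overall rate in the data. A model trained with these example weights sees a dataset in which the historical gap between the groups has been balanced out. The second and third categories act at different points in the pipeline: during training, for example by adding a fairness constraint to the objective, and after prediction, for example by adjusting decision thresholds.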

Mitigation methods are full-fledged AI algorithms themselves. The main goal of these algorithms is to adapt the data to better match the desired world and to make the AI model perform as well as it can in the worst-case scenario.

IBM Cloud Pak for Data contains several mitigation algorithms for the different pillars of trust. IBM Research continues to develop advanced mitigation algorithms, which are available to customers in an early access program before they are integrated into Cloud Pak for Data.