Factsheets for AI Services


Concerns about safety, transparency, and bias in AI are widespread, and it is easy to see how they erode trust in these systems. Part of the problem is a lack of standard practices to document how an AI service was created, tested, trained, deployed, and evaluated; how it should operate; and how it should (and should not) be used.

To address this need, my colleagues and I recently proposed the concept of factsheets for AI services. In our paper [1], we argue that a Supplier’s Declaration of Conformity (SDoC, or factsheet for short) be completed and voluntarily released by AI service developers and providers to increase the transparency of their services and engender trust in them. Like nutrition labels for foods or information sheets for appliances, factsheets for AI services would provide information about the product’s important characteristics. Standardizing and publicizing this information is key to building trust in AI services across the industry.

Trust in AI services

The issue of trust in AI is top of mind for IBM and many other technology developers and providers. AI-powered systems hold enormous potential to transform the way we live and work but also exhibit some vulnerabilities, such as exposure to bias, lack of explainability, and susceptibility to adversarial attacks. These issues must be addressed in order for AI services to be trusted. At IBM Research we are confronting these issues head-on, with a scientific approach to engineer AI systems for trust. We are developing techniques and algorithms to assess—and address—the foundational elements of trust for AI systems: tools that discover and mitigate bias, expose vulnerabilities, defuse attacks, and unmask the decision-making process.

Pillars of trusted AI

We believe several elements or pillars form the basis for trusted AI systems.

  • Fairness: AI systems should use training data and models that are free of bias, to avoid unfair treatment of certain groups.
  • Robustness: AI systems should be safe and secure, neither vulnerable to tampering nor to compromise of the data they are trained on.
  • Explainability: AI systems should provide decisions or suggestions that can be understood by their users and developers.
  • Lineage: AI systems should include details of their development, deployment, and maintenance so they can be audited throughout their lifecycle.
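As one concrete illustration of the fairness pillar, bias checks are often quantified with simple group metrics such as disparate impact: the ratio of favorable-outcome rates between an unprivileged and a privileged group. The sketch below is a minimal, self-contained illustration in Python; the function, data, and group labels are hypothetical and not part of any IBM tooling:

```python
def disparate_impact(outcomes, groups, privileged):
    """Ratio of favorable-outcome rates: unprivileged / privileged.

    outcomes: 1 for a favorable outcome, 0 otherwise.
    groups:   group label for each individual.
    A value well below 1.0 suggests the unprivileged group is disfavored.
    """
    priv = [o for o, g in zip(outcomes, groups) if g == privileged]
    unpriv = [o for o, g in zip(outcomes, groups) if g != privileged]
    return (sum(unpriv) / len(unpriv)) / (sum(priv) / len(priv))

# Toy example: group "A" receives favorable outcomes at rate 0.75,
# group "B" at rate 0.25, so disparate impact is 1/3.
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(round(disparate_impact(outcomes, groups, privileged="A"), 3))
```

A factsheet answer to a bias-check question could then report the metric used and the value obtained, rather than a bare "yes".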

Just like a physical structure, trust can’t be built on one pillar alone. If an AI system is fair but can’t resist attack, it won’t be trusted. If it’s secure but we can’t understand its output, it won’t be trusted. To build AI systems that are truly trusted, we need to strengthen all the pillars together. Our comprehensive research and product strategy is designed to do just that, advancing on all fronts to lift the mantle of trust into place.

Factsheets for AI services

Fairness, safety, reliability, explainability, robustness, accountability—we all agree that they are critical. Yet, to achieve trust in AI, progress on these issues alone will not be enough; it must be accompanied by the ability to measure and communicate a system’s performance on each of these dimensions. One way to accomplish this would be to provide such information via SDoCs, or factsheets, for AI services. Similar work has begun for datasets [2,3,4], and the SDoC concept expands it to cover all aspects of AI services. Our paper includes initial suggestions covering system operation, training data, underlying algorithms, test setup and results, performance benchmarks, fairness and robustness checks, intended uses, and maintenance and re-training. Sample questions from a factsheet might include:

  • Does the dataset used to train the service have a datasheet or data statement?
  • Were the dataset and model checked for biases? If “yes,” describe the bias policies that were checked, the bias-checking methods, and the results.
  • Was any bias mitigation performed on the dataset? If “yes,” describe the mitigation method.
  • Are the algorithm’s outputs explainable/interpretable? If “yes,” explain how explainability is achieved (e.g., a directly explainable algorithm, local explainability, explanations via examples).
  • Who is the target user of the explanation (ML expert, domain expert, general consumer, regulator, etc.)?
  • Was the service tested on any additional datasets? Do they have a datasheet or data statement?
  • Describe the testing methodology.
  • Was the service checked for robustness against adversarial attacks? If “yes,” describe the robustness policies that were checked, the checking methods, and the results.
  • Is usage data from service operations retained or stored?
  • What is the expected behavior if the input deviates from the training/testing data?
  • What kind of governance is employed to track the overall workflow from data to AI service?

The questions are devised to aid users in understanding how the service works, determining whether the service is appropriate for the application they are considering, and comprehending its strengths and limitations.
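To make this concrete, the answers to such questions could also be collected as machine-readable metadata published alongside the service. The sketch below shows one possible shape in Python; the field names and values are purely illustrative assumptions, not a proposed standard:

```python
import json

# Hypothetical factsheet for a fictitious sentiment-analysis service.
# Every field name and value here is illustrative only.
factsheet = {
    "service": "example-sentiment-v1",
    "intended_use": "English product reviews; not for medical or legal text",
    "training_data": {"datasheet_available": True, "source": "public review corpus"},
    "bias_check": {"performed": True, "method": "disparate impact", "result": 0.92},
    "bias_mitigation": {"performed": False},
    "explainability": "local explanations via per-word attributions",
    "robustness_check": {"adversarial_testing": True},
    "usage_data_retained": False,
}

# Serializing to JSON makes the factsheet easy to publish and audit.
print(json.dumps(factsheet, indent=2))
```

A structured form like this would let consumers and auditors compare services on the same dimensions, in the same way nutrition labels make foods comparable.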

All together now

Understanding and evaluating AI systems is an issue of utmost importance for the AI community, an issue we believe the industry, academia, and AI practitioners should be working on together. We invite you to join us. As a next step, we will be asking the community to weigh in on what information would be useful in assessing AI services. We welcome your collaboration in developing and refining the AI factsheets concept, thereby ushering in the era of trusted AI systems and bootstrapping their broader adoption.

[1] Increasing Trust in AI Services through Supplier’s Declarations of Conformity (submitted to Conference on Fairness, Accountability, and Transparency, FAT* 2019)
[2] Datasheets for Datasets
[3] Data Statements for NLP: Toward Mitigating System Bias and Enabling Better Science
[4] The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards

IBM Fellow, AI Science, IBM Research
