Lack of data transparency risk for AI

Governance
Non-technical risks
Amplified by generative AI
Amplified by synthetic data

Description

Lack of data transparency can result from insufficient documentation of training or tuning dataset details, including how any synthetic data was generated.

Why is lack of data transparency a concern for foundation models?

Transparency is important for legal compliance and AI ethics. Information on the collection, generation, and preparation of training data, including how it was labeled and by whom, and any synthetic data generation methods used, is necessary to understand model behavior and suitability. Details about how the data risks were determined, measured, and mitigated are important for evaluating both data and model trustworthiness. Missing details about the data can make it more difficult to evaluate representational harms, data ownership, provenance, and other data-oriented risks. The lack of standardized disclosure requirements might also limit transparency, as organizations protect trade secrets and try to prevent others from copying their models.

Parent topic: AI risk atlas

We provide examples covered by the press to help explain many of the risks of foundation models. Many of these events are either still evolving or have been resolved, and referencing them can help the reader understand the potential risks and work toward mitigations. These examples are highlighted for illustrative purposes only.