Improper data curation risk for AI

Value alignment
Training data risks
Amplified by generative AI
Amplified by synthetic data

Description

Improper collection, generation, and preparation of training or tuning data can result in data label errors, conflicting information or misinformation.
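
As an illustration of one kind of problem named above, the following minimal sketch flags training records whose text is the same but whose labels conflict. The record layout (text, label), the sample data, and the function name find_conflicting_labels are hypothetical and chosen only for this example; a real curation pipeline would likely need more checks than this.

```python
from collections import defaultdict


def find_conflicting_labels(records):
    """Return normalized texts that appear with more than one label.

    `records` is assumed to be an iterable of (text, label) pairs;
    this layout is a hypothetical choice for the sketch.
    """
    labels_by_text = defaultdict(set)
    for text, label in records:
        # Normalize case and whitespace so near-identical texts are grouped.
        key = " ".join(text.lower().split())
        labels_by_text[key].add(label)
    # Keep only texts that were seen with two or more distinct labels.
    return {t: sorted(ls) for t, ls in labels_by_text.items() if len(ls) > 1}


# Hypothetical sample data: the first two records conflict.
records = [
    ("The product works well", "positive"),
    ("the product  works well", "negative"),
    ("Delivery was late", "negative"),
    ("Delivery was late", "negative"),
]
conflicts = find_conflicting_labels(records)
print(conflicts)
```

Records flagged this way could be reviewed or relabeled before training or tuning, rather than corrected after the model is deployed.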

Why is improper data curation a concern for foundation models?

Improper data curation, including errors in synthetic data generation, can adversely affect how a model is trained, resulting in a model that does not behave in accordance with the intended values. Correcting problems after the model is trained and deployed might be insufficient for guaranteeing proper behavior.


We provide examples covered by the press to help explain many of the risks of foundation models. Many of these events are either still evolving or have been resolved, and referencing them can help the reader understand the potential risks and work toward mitigations. These examples are highlighted for illustrative purposes only.