To quote a common aphorism, all models are wrong. This holds true in statistics, science and AI. Models built without domain expertise can produce erroneous outputs.

Today, a tiny, homogeneous group of people determines what data is used to train generative AI models, and that data is drawn from sources that greatly overrepresent English. “For most of the over 6,000 languages in the world, the text data available is not enough to train a large-scale foundation model” (from “On the Opportunities and Risks of Foundation Models,” Bommasani et al., 2022).

Additionally, the models themselves are built on a limited set of architectures: “Almost all state-of-the-art NLP models are now adapted from one of a few foundation models, such as BERT, RoBERTa, BART, T5, etc. While this homogenization produces extremely high leverage (any improvements in the foundation models can lead to immediate benefits across all of NLP), it is also a liability; all AI systems might inherit the same problematic biases of a few foundation models” (Bommasani et al.).

For generative AI to better reflect the diverse communities it serves, data from a far wider variety of people must be represented in models.

Evaluating model accuracy goes hand-in-hand with evaluating bias. We must ask: what is the intent of the model, and for whom is it optimized? Consider, for example, who benefits most from content-recommendation algorithms and search engine algorithms. Stakeholders may have widely different interests and goals. Algorithms and models also require a target, or a proxy for Bayes error: the lowest error that any model could achieve on the task, and therefore the benchmark against which a model's error is judged. This proxy is often the performance of a person, such as a subject matter expert.
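One way to make the link between accuracy and bias concrete is to measure error separately for each group a model serves and compare each result against the expert benchmark. The sketch below is illustrative only: the function names, the group labels ("en", "sw"), the toy records and the expert error rate of 0.25 are all hypothetical assumptions, not data or methods from any real system.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Return the fraction of wrong predictions for each group.

    `records` is an iterable of (group, predicted_label, true_label) tuples.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        if predicted != actual:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def gap_to_proxy(error_rates, proxy_error):
    """Return how far each group's error sits above the proxy for Bayes error."""
    return {group: rate - proxy_error for group, rate in error_rates.items()}


# Hypothetical toy data: (group, predicted_label, true_label).
records = [
    ("en", 1, 1), ("en", 0, 0), ("en", 1, 1), ("en", 0, 1),
    ("sw", 1, 0), ("sw", 0, 1), ("sw", 1, 1), ("sw", 0, 0),
]

# Hypothetical expert error rate, standing in as the proxy for Bayes error.
EXPERT_ERROR = 0.25

rates = error_rate_by_group(records)
gaps = gap_to_proxy(rates, EXPERT_ERROR)

for group in rates:
    print(f"{group}: error={rates[group]:.2f}, gap to expert={gaps[group]:.2f}")
```

In this toy example the model matches the expert benchmark for one group and falls well short of it for the other, a disparity that a single aggregate accuracy figure would hide.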