*Originally published January 29, 2019; updated February 15, 2019, to reflect important contributions from Joy Buolamwini and Timnit Gebru in Gender Shades (2018) cited in the Diversity in Faces arXiv paper.
Have you ever been treated unfairly? How did it make you feel? Probably not too good. Most people generally agree that a fairer world is a better world, and our AI researchers couldn’t agree more. That’s why we are harnessing the power of science to create AI systems that are more fair and accurate.
Many of our recent advances in AI have produced remarkable capabilities for computers to accomplish increasingly sophisticated and important tasks, like translating speech across languages to bridge communications across cultures, improving complex interactions between people and machines, and automatically recognizing contents of video to assist in safety applications.
Much of the power of AI today comes from the use of data-driven deep learning to train increasingly accurate models by using growing amounts of data. However, the strength of these techniques can also be a weakness. AI systems learn what they’re taught, and if they are not taught with robust and diverse datasets, accuracy and fairness could be at risk. For that reason, IBM, along with AI developers and the research community, needs to be thoughtful about what data we use for training. IBM remains committed to developing AI systems to make the world more fair.
The challenge in training AI is manifested in a very apparent and profound way with facial recognition technology. There can be difficulties in making facial recognition systems that meet fairness expectations. As shown by Joy Buolamwini and Timnit Gebru in Gender Shades in 2018, facial recognition systems in commercial use performed better for lighter-skinned individuals and males and worse for darker-skinned females [1]. The heart of the problem is not with the AI technology itself, per se, but with how the AI-powered facial recognition systems are trained. For facial recognition systems to perform as desired, and for their outcomes to become increasingly accurate, training data must be diverse and offer a breadth of coverage, as shown in our prior work [2]. For example, the training datasets must be large enough and different enough that the technology learns all the ways in which faces differ to accurately recognize those differences in a variety of situations. The images must reflect the distribution of features in faces we see in the world.
How do we measure and ensure diversity for human faces? On one hand, we are familiar with how faces differ by age, gender, and skin tone, and how different faces can vary across some of these dimensions. Much of the focus on facial recognition technology has been on how well it performs within these attributes. But, as prior studies have shown, these attributes are just a piece of the puzzle and not entirely adequate for characterizing the full diversity of human faces. Dimensions like face symmetry, facial contrast, the pose the face is in, and the length or width of the face’s attributes (eyes, nose, forehead, etc.) are also important.
Today, IBM Research is releasing a new large and diverse dataset called Diversity in Faces (DiF) to advance the study of fairness and accuracy in facial recognition technology. The first of its kind available to the global research community, DiF provides a dataset of annotations of 1 million human facial images. Using publicly available images from the YFCC-100M Creative Commons dataset, we annotated the faces using 10 well-established and independent coding schemes from the scientific literature [3-12]. The coding schemes principally include objective measures of human faces, such as craniofacial features, as well as more subjective annotations, such as human-labeled predictions of age and gender. We believe that by extracting and releasing these facial coding scheme annotations on a large dataset of 1 million images of faces, we will accelerate the study of diversity and coverage of data for AI facial recognition systems to ensure more fair and accurate AI systems. Today’s release is simply the first step.
We believe the DiF dataset and its 10 coding schemes offer a jumping-off point for researchers around the globe studying facial recognition technology. The 10 facial coding methods include craniofacial (e.g., head length, nose length, forehead height), facial ratios (symmetry), visual attributes (age, gender), and pose and resolution, among others. These schemes are some of the strongest identified by the scientific literature, building a solid foundation for our collective knowledge.
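To make the idea of a craniofacial measure and a facial ratio concrete, here is a minimal sketch in Python. It assumes a face has already been reduced to a few 2D landmark points. The landmark names, coordinates, and the particular symmetry ratio are hypothetical and chosen only for illustration; they are not the definitions used in the DiF coding schemes.

```python
import math

# Hypothetical 2D landmark coordinates in pixels (illustrative values only).
landmarks = {
    "nasion": (100.0, 80.0),           # top of the nose, between the eyes
    "subnasale": (100.0, 120.0),       # base of the nose
    "left_eye_outer": (60.0, 80.0),    # outer corner of the left eye
    "right_eye_outer": (140.0, 80.0),  # outer corner of the right eye
}


def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nose_length(lm):
    """A simple craniofacial distance: top of the nose to its base."""
    return distance(lm["nasion"], lm["subnasale"])


def eye_symmetry_ratio(lm):
    """Ratio of the shorter to the longer eye-corner-to-nasion distance.

    A value of 1.0 means the two sides are equal under this (assumed) measure.
    """
    left = distance(lm["left_eye_outer"], lm["nasion"])
    right = distance(lm["right_eye_outer"], lm["nasion"])
    return min(left, right) / max(left, right)


print(nose_length(landmarks))
print(eye_symmetry_ratio(landmarks))
```

In practice the landmark positions would come from a face landmark detector, and distances are usually normalized (for example by face width) so that they are comparable across images of different resolution.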
Our initial analysis has shown that the DiF dataset provides a more balanced distribution and broader coverage of facial images compared to previous datasets. Furthermore, the insights obtained from the statistical analysis of the 10 initial coding schemes on the DiF dataset have furthered our own understanding of what is important for characterizing human faces and enabled us to continue important research into ways to improve facial recognition technology.
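One common way to put a number on how "balanced" a distribution is uses Shannon entropy normalized by its maximum, often called evenness. The sketch below is an illustration under that assumption, not necessarily the statistic used in our analysis; the function name and the sample labels are hypothetical.

```python
import math
from collections import Counter


def shannon_evenness(labels):
    """Return normalized Shannon entropy of the label distribution.

    1.0 means every observed category is equally represented;
    values near 0.0 mean one category dominates.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        return 0.0  # a single category carries no diversity
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


# Hypothetical annotation values for one attribute, binned into four groups.
balanced = ["a", "b", "c", "d"] * 25
skewed = ["a"] * 97 + ["b", "c", "d"]

print(shannon_evenness(balanced))
print(shannon_evenness(skewed))
```

Evenness only reflects the categories that actually appear in the data, so it should be read alongside a coverage check: a dataset that is perfectly even over two groups still misses every group it never sampled.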
The dataset is available today to the global research community upon request. IBM is proud to make this available and our goal is to help further our collective research and contribute to creating AI systems that are more fair.
While IBM Research is committed to continuing study and investigation of fairer facial recognition systems, we don’t believe we can do it alone. With today’s release, we urge others to contribute to the growing body of research and advance this important scientific agenda.
[1] J. Buolamwini, T. Gebru, “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification,” Proc. of Machine Learning Research, 2018.
[2] R. Puri, “Mitigating Bias in AI Models,” February 6, 2018.
[3] L. G. Farkas, Anthropometry of the Head and Face, Raven Press, 1994.
[4] A. Chardon, I. Cretois, C. Hourseau, “Skin colour typology and suntanning pathways,” International Journal of Cosmetic Science, Aug. 1991, 13(4), pp. 191-208.
[5] Y. Liu, K. L. Schmidt, J. F. Cohn, S. Mitra, “Facial asymmetry quantification for expression invariant human identification,” Computer Vision and Image Understanding, Volume 91, Issues 1–2, July–August 2003, pp. 138-159.
[6] L. G. Farkas, et al., “International anthropometric study of facial morphology in various ethnic groups/races,” J Craniofac Surg. 2005 Jul;16(4), pp. 615-46.
[7] N. Ramanathan, R. Chellappa, “Modeling Age Progression in Young Faces,” Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), 2006, pp. 387-394.
[8] A. C. Little, B. C. Jones, L. M. DeBruine, “Facial attractiveness: evolutionary based research,” Philos Trans R Soc Lond B Biol Sci. 2011 Jun 12;366(1571), pp. 1638-59.
[9] X. Zhu, D. Ramanan, “Face Detection, Pose Estimation, and Landmark Localization in the Wild,” Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 2879-2886.
[10] A. Porcheron, E. Mauger, R. Russell, “Aspects of Facial Contrast Decrease with Age and Are Cues for Age Perception,” PLoS One 8(3), Mar. 6, 2013.
[11] Z. Liu, P. Luo, X. Wang, X. Tang, “Deep Learning Face Attributes in the Wild,” Intl. Conf. on Computer Vision (ICCV), 2015, pp. 3730-3738.
[12] R. Rothe, R. Timofte, L. Van Gool, “Deep Expectation of Real and Apparent Age from a Single Image Without Facial Landmarks,” Intl. Journal of Computer Vision, Volume 126 Issue 2-4, April 2018, pp. 144-157.