The bias seen in images generated by artificial intelligence is pervasive and difficult to address.
It’s now well documented that artificial intelligence systems are subject to inherent biases in the results they deliver. AI-driven healthcare recommendation engines have been shown to discriminate against members of minority groups, and HR recruiting systems have been shown to be biased against female candidates.
Now, a recent study shows that sexism against women in artificial intelligence-based image processing runs deeper than thought. The study, conducted by Ryan Steed of Carnegie Mellon University and Aylin Caliskan of George Washington University, finds that unsupervised models pre-trained on ImageNet, a popular benchmark image dataset curated from internet images, “automatically learn racial, gender, and intersectional biases.” As the authors put it, “machine learning models can automatically learn bias from the way people are stereotypically portrayed on the web.”
ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images.
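That WordNet structure is easy to see directly. The short Python sketch below is illustrative only (it is not part of the study and assumes the NLTK package with its WordNet corpus is installed); it prints the hypernym path of one noun synset, showing how each ImageNet category maps to a node in a hierarchy running from very general concepts down to specific ones.

```python
# Minimal sketch of the WordNet noun hierarchy that ImageNet is organized around.
# Assumes: pip install nltk, then a one-time nltk.download('wordnet').
from nltk.corpus import wordnet as wn

# Each ImageNet category corresponds to a WordNet noun "synset" (a node in the
# hierarchy). Walking a synset's hypernym path shows how nodes nest from broad
# concepts (entity, object) down to a specific category.
node = wn.synset("dog.n.01")
for path in node.hypernym_paths()[:1]:
    print(" -> ".join(s.name() for s in path))
# e.g. entity.n.01 -> physical_entity.n.01 -> ... -> canine.n.02 -> dog.n.01
```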
These pre-trained models “may embed all types of harmful human biases from the way people are portrayed in training data, and model design choices determine whether and how those biases are propagated into harms downstream,” Steed and Caliskan point out. Their findings show a “bias towards the sexualization of women.”
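Work in this area typically quantifies such associations with embedding association tests, which compare how closely a model's image representations for two target groups sit to two sets of attribute images. The sketch below is a simplified illustration of that general idea, not the authors' implementation; the embeddings and category labels are made up for demonstration.

```python
# Simplified sketch of an embedding association test over image features.
# Inputs are hypothetical: each array holds embedding vectors that would come
# from some pre-trained, unsupervised image model (not specified here).
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def association(w, A, B):
    # Mean similarity of one embedding w to attribute set A minus attribute set B.
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def test_statistic(X, Y, A, B):
    # Differential association of target groups X and Y with attribute sets A and B.
    # A positive value means X is more associated with A (and Y with B).
    return sum(association(x, A, B) for x in X) - sum(association(y, A, B) for y in Y)

# Usage with made-up 4-dimensional embeddings, purely for illustration:
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))   # e.g. embeddings of images of one target group
Y = rng.normal(size=(8, 4))   # e.g. embeddings of images of another target group
A = rng.normal(size=(8, 4))   # e.g. embeddings for one attribute set (career)
B = rng.normal(size=(8, 4))   # e.g. embeddings for another attribute set (appearance)
print(test_statistic(X, Y, A, B))
```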
The researchers found that a majority of the AI-generated images of women, 53%, featured a bikini or low-cut top. By comparison, only 7% of the images of men were shirtless or in low-cut tops, while 43% wore suits or other career-specific attire.
“This behavior might result from the sexualized portrayal of people, especially women, in internet images, and serves as a reminder of computer vision’s controversial history with Playboy centerfolds and objectifying images,” Steed and Caliskan say.
The bias seen in AI-generated images is a difficult problem to solve, as Todd Feathers recently explored in Vice. Looking at the study, Feathers notes that the problem is pervasive: datasets such as ImageNet “contain a multitude of racist, pornographic, and otherwise-problematic images. And they are continuously being updated with new images from the web without the subjects’ consent or knowledge and no avenue for recourse.”
The Steed and Caliskan paper “shows just how ingrained it can be in an area like computer vision that, through tools like facial recognition and gun detection, can have life-and-death ramifications,” Feathers explains.
“Steed and Caliskan demonstrated that the bias in unsupervised systems runs even deeper and will persist even if humans haven’t instilled additional prejudices through the labeling process—the models will simply learn it from the images themselves.”