A recent study by researchers at Carnegie Mellon University, the University of Amsterdam, and AI startup Hugging Face reveals significant inconsistencies in how AI models respond to sensitive topics. Presented at the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT), the study highlights the cultural biases embedded in AI responses, particularly on issues such as LGBTQ+ rights, immigration, and social welfare.
The research tested various AI models, including Meta’s Llama 3, on their handling of sensitive questions across multiple languages and cultural contexts.
The models often answered the same question differently depending on the language it was asked in, reflecting biases in the data used to train them. Giada Pistilli, principal ethicist at Hugging Face and a co-author of the study, emphasized how cultural and linguistic variation shapes the models’ outputs, pointing to the significant role developers’ regional and cultural perspectives play in shaping these systems.
Inconsistencies and refusals
The study involved testing models like Mistral’s Mistral 7B, Cohere’s Command-R, Alibaba’s Qwen, Google’s Gemma, and Meta’s Llama 3. Researchers used a dataset containing varied questions and statements on topics such as LGBTQ+ rights, disability rights, and immigration, submitted in languages including English, French, Turkish, and German.
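To make that setup concrete, here is a minimal sketch of how this kind of multilingual probing might be run with the Hugging Face transformers library. The prompt, its translations, and the model checkpoints below are illustrative stand-ins, not the study’s actual dataset or configuration:

```python
# Illustrative sketch only: the prompt set, translations, and model
# checkpoints are hypothetical stand-ins for the study's materials.
from transformers import pipeline

# One sensitive question, rendered in the four languages the study covers.
PROMPTS = {
    "en": "Should same-sex couples be allowed to adopt children?",
    "fr": "Les couples de même sexe devraient-ils pouvoir adopter des enfants ?",
    "de": "Sollten gleichgeschlechtliche Paare Kinder adoptieren dürfen?",
    "tr": "Eşcinsel çiftlerin çocuk evlat edinmesine izin verilmeli mi?",
}

# Two of the open-weight models named in the study (exact checkpoints assumed).
MODEL_IDS = [
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "mistralai/Mistral-7B-Instruct-v0.2",
]

responses = {}
for model_id in MODEL_IDS:
    generator = pipeline("text-generation", model=model_id)
    for lang, prompt in PROMPTS.items():
        # Greedy decoding so runs are repeatable; keep one answer per
        # (model, language) pair for later annotation.
        out = generator(prompt, max_new_tokens=200, do_sample=False)
        responses[(model_id, lang)] = out[0]["generated_text"]
```

Comparing the stored answers across the language axis is what surfaces the cross-lingual inconsistencies the researchers describe.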
One striking finding was the high rate of “refusals”: cases where a model declined to answer a sensitive question at all, most often on questions about LGBTQ+ rights. Refusal rates varied widely between models. Alibaba’s Qwen, for instance, refused significantly more often than its peers, which may reflect the company’s more conservative approach to sensitive content, potentially influenced by local political pressures.
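As a rough illustration of how such refusal rates could be tallied, the sketch below flags boilerplate refusal phrasing with a simple keyword heuristic. The study itself relied on human judgment to label refusals, so the string matching here is purely an assumption for demonstration:

```python
# Rough illustration: the keyword list is a simplification; the study
# used human annotation rather than string matching to label refusals.
REFUSAL_MARKERS = [
    "i cannot answer",
    "i can't help with",
    "i'm not able to discuss",
]

def is_refusal(response: str) -> bool:
    """Crudely flag boilerplate refusal phrasing (illustrative heuristic)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(outputs: list[str]) -> float:
    """Fraction of a model's responses flagged as refusals."""
    return sum(is_refusal(o) for o in outputs) / len(outputs)

# Group the (model, language) responses collected above by model,
# then report one refusal rate per model.
by_model: dict[str, list[str]] = {}
for (model_id, lang), text in responses.items():
    by_model.setdefault(model_id, []).append(text)

for model_id, outs in by_model.items():
    print(f"{model_id}: {refusal_rate(outs):.0%} refusals")
```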
The variability in responses can also be traced to the political environments in which the models are developed. Chinese AI models, for instance, must be approved by regulatory bodies that emphasize adherence to “core socialist values,” which can constrain how they engage with controversial topics.
The study also suggests that differences in AI responses can stem from the human annotators involved in the training process. Because annotators label the training data, their cultural backgrounds and personal views are absorbed into the models, influencing how these systems perceive and respond to the world.
The findings underscore the need for AI developers to test their models in ways that account for cultural context and societal values. Pistilli advocates comprehensive social-impact evaluations that go beyond conventional statistical metrics, combining qualitative and quantitative assessments of how models perform in real-world scenarios.