Biases in Large Language Models: Origins, Inventory, and Discussion

Bibliographic Details
Published in: ACM Journal of Data and Information Quality, 2023-06, Vol. 15 (2), p. 1-21, Article 10
Main Authors: Navigli, Roberto; Conia, Simone; Ross, Björn
Format: Article
Language: English
Description
Summary: In this article, we introduce and discuss the pervasive issue of bias in the large language models that are currently at the core of mainstream approaches to Natural Language Processing (NLP). We first introduce data selection bias, that is, the bias caused by the choice of texts that make up a training corpus. Then, we survey the different types of social bias evidenced in the text generated by language models trained on such corpora, ranging from gender to age, from sexual orientation to ethnicity, and from religion to culture. We conclude with directions focused on measuring, reducing, and tackling the aforementioned types of bias.
ISSN: 1936-1955; 1936-1963
DOI: 10.1145/3597307