Word embeddings are biased. But whose bias are they reflecting?
Published in: | AI & society 2023-04, Vol.38 (2), p.975-982 |
---|---|
Format: | Article |
Language: | English |
Summary: | From curriculum vitae parsing to web search and recommendation systems, Word2Vec and other word embedding techniques have an increasing presence in everyday human interactions. Biases such as gender bias have been thoroughly researched and shown to be present in word embeddings. Most of the research focuses on discovering and mitigating gender bias within the frame of the vector space itself. Nevertheless, whose bias is reflected in word embeddings has not yet been investigated. Besides discovering and mitigating gender bias, it is also important to examine whether a feminine-centric or a masculine-centric view is represented in the biases of word embeddings. This way, we will not only gain more insight into the origins of the aforementioned biases, but also present a novel approach to investigating biases in Natural Language Processing systems. Based on previous research in the social sciences and gender studies, we hypothesize that masculine-centric, otherwise known as androcentric, biases are dominant in word embeddings. To test this hypothesis, we used the largest publicly available English word association test data set. We compared the distances of the responses of male and female participants to cue words in a word embedding vector space. We found that the word embedding is biased towards a masculine-centric viewpoint, predominantly reflecting the worldviews of the male participants in the word association test data set. By conducting this research, we aimed to unravel another layer of bias to be considered when examining fairness in algorithms. |
---|---|
ISSN: | 0951-5666, 1435-5655 |
DOI: | 10.1007/s00146-022-01443-w |
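
The comparison described in the summary (how close each participant group's responses lie to a cue word in the embedding space) might be sketched roughly as follows. This is only an illustration under stated assumptions: the summary does not say which distance measure or preprocessing the authors used, so cosine similarity is assumed here, and the vectors, token names, and helper functions are all hypothetical placeholders rather than data or code from the article.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(cue, responses, embedding):
    """Average similarity between a cue and those responses present in the embedding."""
    sims = [
        cosine_similarity(embedding[cue], embedding[r])
        for r in responses
        if r in embedding  # skip out-of-vocabulary responses
    ]
    return sum(sims) / len(sims) if sims else float("nan")


def similarity_gap(cue, group_a, group_b, embedding):
    """Positive when group A's responses lie closer to the cue than group B's."""
    return (
        mean_similarity(cue, group_a[cue], embedding)
        - mean_similarity(cue, group_b[cue], embedding)
    )


# Invented 2-d vectors with placeholder tokens (assumption: not real embedding data).
toy_embedding = {
    "cue": [1.0, 0.0],
    "a1": [0.9, 0.1],
    "a2": [0.8, 0.2],
    "b1": [0.2, 0.8],
    "b2": [0.1, 0.9],
}

# Hypothetical word association responses per participant group, keyed by cue word.
group_a_responses = {"cue": ["a1", "a2"]}
group_b_responses = {"cue": ["b1", "b2"]}

gap = similarity_gap("cue", group_a_responses, group_b_responses, toy_embedding)
print(f"similarity gap (A minus B) for 'cue': {gap:.3f}")
```

In a real analysis, `toy_embedding` would be replaced by vectors from a trained model such as Word2Vec, and the per-cue gaps would be aggregated over many cue words before drawing any conclusion about which group's associations the embedding reflects more closely.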