An analysis of unconscious gender bias in academic texts by means of a decision algorithm

Bibliographic Details
Published in: PloS one, 2021-09-30, Vol. 16 (9), p. e0257903
Main Authors: Orgeira-Crespo, Pedro; Míguez-Álvarez, Carla; Cuevas-Alonso, Miguel; Rivo-López, Elena
Editor: Zhang, Jie
Format: Article
Language: English
Publisher: Public Library of Science (San Francisco)
ISSN/EISSN: 1932-6203
DOI: 10.1371/journal.pone.0257903
PMID: 34591923
Rights: © 2021 Orgeira-Crespo et al. Open access under the Creative Commons Attribution License (CC BY 4.0).

Description

Inclusive language focuses on choosing vocabulary that avoids exclusion or discrimination, especially with regard to gender. Finding gender bias in written documents has traditionally been a manual and time-consuming task. Consequently, studying the use of non-inclusive language in a document, and the impact of different document properties (such as author gender or date of presentation) on how many non-inclusive instances are found, is difficult or even impossible for large datasets. This research analyzes gender bias in academic texts through a study corpus of more than 12,000 million words obtained from more than one hundred thousand doctoral theses from Spanish universities. For this purpose, an automated algorithm was developed to evaluate the characteristics of each document and to look for interactions between age, year of publication, gender, and the field of knowledge in which the doctoral thesis is framed. The algorithm identified information patterns using a convolutional neural network (CNN) applied to a vector representation of the sentences.

The results showed that bias increased with the age of the authors, with men more likely to use non-inclusive terms (up to an index of 23.12); awareness of inclusiveness was greater in women than in men across all age ranges (with an average of 14.99), and this awareness grew the younger the candidate (falling to 13.07). By field of knowledge, the humanities were the most biased (20.97), setting aside the subgroup of Linguistics, which showed the least bias at all levels (9.90); science and engineering also showed little bias (13.46). These results support the assumption that the bias in academic texts (doctoral theses) is unconscious: otherwise, it would not depend on field, age, and gender, and it would occur in every field in the same proportion. The innovation of this research lies mainly in the ability to detect, within a textual document in Spanish, whether the use of language can be considered non-inclusive, based on a CNN trained in the context of doctoral theses. A significant number of documents was used, covering all accessible doctoral theses from Spanish universities of the last 40 years; a dataset of this size is only manageable with data mining systems. The training makes it possible to identify non-inclusive terms in context effectively and to compile them into a novel dictionary of non-inclusive terms.
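
The abstract describes the classifier only at a high level: sentences are turned into vector representations and fed to a CNN that decides whether their language is non-inclusive. Below is a minimal sketch of such a sentence classifier in PyTorch; the vocabulary size, filter widths, and binary label are illustrative assumptions, not the authors' actual architecture or configuration.

import torch
import torch.nn as nn

class SentenceCNN(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=128,
                 num_filters=64, kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1-D convolution per window width, sliding over token positions.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), 1)

    def forward(self, token_ids):                       # (batch, seq_len)
        x = self.embedding(token_ids).permute(0, 2, 1)  # channels first for Conv1d
        # Convolve, then max-pool each feature map over the whole sentence.
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))  # one logit per sentence

# Toy usage: a batch of two padded sentences, 12 token ids each.
model = SentenceCNN()
logits = model(torch.randint(1, 20000, (2, 12)))
probs = torch.sigmoid(logits)   # estimated probability the sentence is non-inclusive
print(probs.shape)              # torch.Size([2, 1])

Max-pooling over each feature map makes the decision insensitive to where in the sentence a flagged pattern occurs, which suits detecting non-inclusive terms in context.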
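
The abstract also states that the trained model's decisions are compiled into a novel dictionary of non-inclusive terms, without specifying how. One plausible reduction, sketched below purely as an assumption rather than the paper's method, is to count, per term, how often the sentences containing it are flagged, and keep the terms whose flag rate exceeds a threshold.

from collections import Counter

def build_dictionary(sentences, flags, threshold=0.8, min_count=5):
    # sentences: list of token lists; flags: parallel 0/1 model decisions.
    flagged, total = Counter(), Counter()
    for tokens, flag in zip(sentences, flags):
        for token in set(tokens):
            total[token] += 1
            flagged[token] += flag
    # Keep frequent terms that mostly occur in flagged sentences.
    return {t for t in total
            if total[t] >= min_count and flagged[t] / total[t] >= threshold}

# Toy usage: generic-masculine "los alumnos" flagged, collective "el alumnado" not.
sents = [["los", "alumnos"], ["el", "alumnado"]] * 5
flags = [1, 0] * 5
print(build_dictionary(sents, flags, min_count=2))   # {'los', 'alumnos'}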

Subjects:
Age
Algorithms
Analysis
Artificial intelligence
Artificial neural networks
Bias
Biology and Life Sciences
Computer and Information Sciences
Context
Data mining
Datasets
Demographic aspects
Discrimination
Dissertations & theses
Educational aspects
Gender
Gender equality
Hate speech
Human bias
Language
Linguistic research
Linguistics
Men
Neural networks
Physical Sciences
Research and Analysis Methods
Sentences
Sex discrimination
Social Sciences
Subgroups
Texts
Theses