An analysis of unconscious gender bias in academic texts by means of a decision algorithm

Bibliographic Details
Published in: PloS one, 2021-09-30, Vol. 16 (9), p. e0257903
Main Authors: Orgeira-Crespo, Pedro; Míguez-Álvarez, Carla; Cuevas-Alonso, Miguel; Rivo-López, Elena
Editor: Zhang, Jie
Format: Article
Language: English
Publisher: Public Library of Science (San Francisco)
ISSN/EISSN: 1932-6203
DOI: 10.1371/journal.pone.0257903
PMID: 34591923
Rights: © 2021 Orgeira-Crespo et al. Open access under the Creative Commons Attribution License (CC BY 4.0).

Description

Inclusive language focuses on choosing vocabulary that avoids exclusion or discrimination, especially with regard to gender. Finding gender bias in written documents has traditionally been a manual and time-consuming task. Consequently, studying the use of non-inclusive language in a document, and the impact of different document properties (such as author gender or date of presentation) on how many non-inclusive instances are found, is difficult or even impossible for large datasets. This research analyzes gender bias in academic texts through a study corpus of more than 12,000 million words obtained from more than one hundred thousand doctoral theses from Spanish universities. For this purpose, an automated algorithm was developed to evaluate the characteristics of each document and to look for interactions between age, year of publication, gender, and the field of knowledge in which the doctoral thesis is framed. The algorithm identified information patterns using a convolutional neural network (CNN) applied to a vector representation of the sentences.

The results showed that bias increased with the age of the authors, with men more likely to use non-inclusive terms (up to an index of 23.12); awareness of inclusiveness was greater in women than in men across all age ranges (with an average of 14.99), and this awareness grew the younger the candidate (falling to 13.07). By field of knowledge, the humanities were the most biased (20.97), setting aside the subgroup of Linguistics, which showed the least bias at all levels (9.90); science and engineering also showed little bias (13.46). These results support the assumption that the bias in academic texts (doctoral theses) is unconscious: otherwise, it would not depend on field, age, and gender, and it would occur in every field in the same proportion. The innovation of this research lies mainly in the ability to detect, within a textual document in Spanish, whether the use of language can be considered non-inclusive, based on a CNN trained in the context of doctoral theses. A significant number of documents was used, covering all accessible doctoral theses from Spanish universities of the last 40 years; a dataset of this size is only manageable with data mining systems. The training makes it possible to identify non-inclusive terms in context effectively and to compile them into a novel dictionary of non-inclusive terms.
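
The abstract describes the classifier only at a high level: sentences are turned into vector representations and fed to a CNN that decides whether their language is non-inclusive. Below is a minimal sketch of such a sentence classifier in PyTorch; the vocabulary size, filter widths, and binary label are illustrative assumptions, not the authors' actual architecture or configuration.

import torch
import torch.nn as nn

class SentenceCNN(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=128,
                 num_filters=64, kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1-D convolution per window width, sliding over token positions.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), 1)

    def forward(self, token_ids):                       # (batch, seq_len)
        x = self.embedding(token_ids).permute(0, 2, 1)  # channels first for Conv1d
        # Convolve, then max-pool each feature map over the whole sentence.
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))  # one logit per sentence

# Toy usage: a batch of two padded sentences, 12 token ids each.
model = SentenceCNN()
logits = model(torch.randint(1, 20000, (2, 12)))
probs = torch.sigmoid(logits)   # estimated probability the sentence is non-inclusive
print(probs.shape)              # torch.Size([2, 1])

Max-pooling over each feature map makes the decision insensitive to where in the sentence a flagged pattern occurs, which suits detecting non-inclusive terms in context.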
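
The abstract also states that the trained model's decisions are compiled into a novel dictionary of non-inclusive terms, without specifying how. One plausible reduction, sketched below purely as an assumption rather than the paper's method, is to count, per term, how often the sentences containing it are flagged, and keep the terms whose flag rate exceeds a threshold.

from collections import Counter

def build_dictionary(sentences, flags, threshold=0.8, min_count=5):
    # sentences: list of token lists; flags: parallel 0/1 model decisions.
    flagged, total = Counter(), Counter()
    for tokens, flag in zip(sentences, flags):
        for token in set(tokens):
            total[token] += 1
            flagged[token] += flag
    # Keep frequent terms that mostly occur in flagged sentences.
    return {t for t in total
            if total[t] >= min_count and flagged[t] / total[t] >= threshold}

# Toy usage: generic-masculine "los alumnos" flagged, collective "el alumnado" not.
sents = [["los", "alumnos"], ["el", "alumnado"]] * 5
flags = [1, 0] * 5
print(build_dictionary(sents, flags, min_count=2))   # {'los', 'alumnos'}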

Subjects:
Age
Algorithms
Analysis
Artificial intelligence
Artificial neural networks
Bias
Biology and Life Sciences
Computer and Information Sciences
Context
Data mining
Datasets
Demographic aspects
Discrimination
Dissertations & theses
Educational aspects
Gender
Gender equality
Hate speech
Human bias
Language
Linguistic research
Linguistics
Men
Neural networks
Physical Sciences
Research and Analysis Methods
Sentences
Sex discrimination
Social Sciences
Subgroups
Texts
Theses