An analysis of unconscious gender bias in academic texts by means of a decision algorithm
Published in: PLoS ONE, 2021-09-30, Vol. 16 (9), p. e0257903
Main Authors: Orgeira-Crespo, Pedro; Míguez-Álvarez, Carla; Cuevas-Alonso, Miguel; Rivo-López, Elena
Contributor (Editor): Zhang, Jie
Format: Article
Language: English
Subjects: Age; Algorithms; Analysis; Artificial intelligence; Artificial neural networks; Bias; Biology and Life Sciences; Computer and Information Sciences; Context; Data mining; Datasets; Demographic aspects; Discrimination; Dissertations & theses; Educational aspects; Gender; Gender equality; Hate speech; Human bias; Language; Linguistic research; Linguistics; Men; Neural networks; Physical Sciences; Research and Analysis Methods; Sentences; Sex discrimination; Social Sciences; Subgroups; Texts; Theses
Publisher: San Francisco: Public Library of Science
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0257903
PMID: 34591923
Online Access: https://doi.org/10.1371/journal.pone.0257903
Abstract: Inclusive language focuses on using vocabulary that avoids exclusion or discrimination, especially with regard to gender. Finding gender bias in written documents is normally a manual, time-consuming process. Consequently, studying the use of non-inclusive language in a document, and the impact of different document properties (such as author gender or date of presentation) on how many non-inclusive instances are found, is difficult or even impossible for big datasets. This research analyzes gender bias in academic texts using a study corpus of more than 12,000 million words drawn from more than one hundred thousand doctoral theses from Spanish universities. For this purpose, an automated algorithm was developed to evaluate the characteristics of each document and look for interactions between age, year of publication, gender, and the field of knowledge in which the doctoral thesis is framed. The algorithm identified information patterns using a CNN (convolutional neural network) applied to a vector representation of the sentences. The results showed greater bias as the age of the authors increased, with men being more likely to use non-inclusive terms (up to an index of 23.12); there is greater awareness of inclusiveness among women than among men in all age ranges (with an average of 14.99), and this awareness grows the younger the candidate is (falling to 13.07). In terms of field of knowledge, the humanities are the most biased (20.97), setting aside the subgroup of Linguistics, which has the least bias at all levels (9.90), and the field of science and engineering, which also shows little bias (13.46). Those results support the assumption that the bias in academic texts (doctoral theses) is unconscious: otherwise, it would not depend on field, age, and gender, and would occur in every field in the same proportion. The innovation provided by this research lies mainly in the ability to detect, within a textual document in Spanish, whether the use of language can be considered non-inclusive, based on a CNN trained in the context of the doctoral theses. A significant number of documents was used: all accessible doctoral theses from Spanish universities from the last 40 years. This dataset is only manageable by data-mining systems, and the training allows the terms to be identified in context effectively and compiled into a novel dictionary of non-inclusive terms.
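The abstract describes the detection pipeline only at a high level: sentences are turned into vector representations and a CNN classifies each one as inclusive or non-inclusive. Below is a minimal sketch of that kind of sentence-level CNN classifier in Python with Keras; the vocabulary size, sentence length, and all layer hyperparameters are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of a sentence-level CNN classifier of the kind the abstract
# describes: token ids -> embedding vectors -> 1-D convolution -> binary label
# (0 = inclusive, 1 = non-inclusive). Hyperparameters are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 50_000  # assumed vocabulary size for the Spanish corpus
MAX_LEN = 60         # assumed maximum sentence length in tokens
EMBED_DIM = 128      # assumed embedding dimension


def build_model() -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        # Vector representation of each sentence: token ids -> embeddings.
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        # Convolutional filters detect local n-gram patterns in the sentence.
        layers.Conv1D(filters=128, kernel_size=5, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        # Probability that the sentence uses non-inclusive language.
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


if __name__ == "__main__":
    build_model().summary()
```

The single convolution-plus-global-max-pooling stack is the simplest CNN text classifier; a closer reproduction would need the paper's actual sentence embeddings, filter configuration, and the labelled dictionary of non-inclusive Spanish terms described in the abstract.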