Text-Mining to Identify Gene Sets Involved in Biocorrosion by Sulfate-Reducing Bacteria: A Semi-Automated Workflow

A significant amount of literature is available on biocorrosion, which makes manual extraction of crucial information such as genes and proteins a laborious task. Despite the fast growth of biology related corrosion studies, there is a limited number of gene collections relating to the corrosion pro...

Bibliographic Details
Published in: Microorganisms (Basel) 2023-01, Vol.11 (1), p.119
Main Authors: Thakur, Payal, Alaba, Mathew O, Rauniyar, Shailabh, Singh, Ram Nageena, Saxena, Priya, Bomgni, Alain, Gnimpieba, Etienne Z, Lushbough, Carol, Goh, Kian Mau, Sani, Rajesh Kumar
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c508t-4740839f27aaf409e4c251ff7577a143fdf23734036a4c8d67786379d79670b83
cites cdi_FETCH-LOGICAL-c508t-4740839f27aaf409e4c251ff7577a143fdf23734036a4c8d67786379d79670b83
container_end_page
container_issue 1
container_start_page 119
container_title Microorganisms (Basel)
container_volume 11
creator Thakur, Payal
Alaba, Mathew O
Rauniyar, Shailabh
Singh, Ram Nageena
Saxena, Priya
Bomgni, Alain
Gnimpieba, Etienne Z
Lushbough, Carol
Goh, Kian Mau
Sani, Rajesh Kumar
description A significant amount of literature is available on biocorrosion, which makes manual extraction of crucial information such as genes and proteins a laborious task. Despite the fast growth of biology-related corrosion studies, there is a limited number of gene collections relating to the corrosion process (biocorrosion). Text mining offers a potential solution by automatically extracting the essential information from unstructured text. We present a text mining workflow that extracts biocorrosion-associated genes/proteins in sulfate-reducing bacteria (SRB) from literature databases (e.g., PubMed and PMC). This semi-automatic workflow is built with the Named Entity Recognition (NER) method and Convolutional Neural Network (CNN) model. With PubMed and PMCID as inputs, the workflow identified 227 genes belonging to several species. To validate their functions, Gene Ontology (GO) enrichment and biological network analysis were performed using UniprotKB and STRING-DB, respectively. The GO analysis showed that metal ion binding, sulfur binding, and electron transport were among the principal molecular functions. Furthermore, the biological network analysis generated three interlinked clusters containing genes involved in metal ion binding, cellular respiration, and electron transfer, which suggests the involvement of the extracted gene set in biocorrosion. Finally, the dataset was validated through manual curation, yielding a similar set of genes as our workflow; among these, and , and and were identified as the metal ion binding and sulfur metabolism genes, respectively. The identified genes were mapped to the pangenome of 63 SRB genomes, which yielded the distribution of these genes across the 63 SRB based on amino acid sequence similarity; the genes were further categorized as core and accessory gene families. SRB's role in biocorrosion involves the transfer of electrons from the metal surface via a hydrogen medium to the sulfate reduction pathway. 
Therefore, genes encoding hydrogenases and cytochromes might be participating in removing hydrogen from the metals through electron transfer. Moreover, the production of corrosive sulfide from the sulfur metabolism indirectly contributes to the localized pitting of the metals. After the corroboration of text mining results with SRB biocorrosion mechanisms, we suggest that the text mining framework could be utilized for gene/protein extraction and could significantly reduce the manual curation time.
doi_str_mv 10.3390/microorganisms11010119
format article
fullrecord <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_7a6b3a624fb6473a98b6884d44be28ac</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_7a6b3a624fb6473a98b6884d44be28ac</doaj_id><sourcerecordid>2767272528</sourcerecordid><originalsourceid>FETCH-LOGICAL-c508t-4740839f27aaf409e4c251ff7577a143fdf23734036a4c8d67786379d79670b83</originalsourceid><addsrcrecordid>eNptkl1rFDEUhgdRbKn9CyXgjTdj8zVJxgthW2pdqAi24mXI5GPNOpPUJLO6_75Zt5ZWTC5ySN73IefwNs0Jgm8J6eHp5HWKMa1U8HnKCMG6Uf-sOcSQsxYzyJ8_qg-a45zXsK4eEdGhl80BYYxzitBhk27s79J-8sGHFSgRLI0NxbstuLTBgmtbMliGTRw31gAfwJmPOqYUs48BDFtwPY9OFdt-sWbWO8SZ0sUmr96BRXVPvl3MJU5VYsC3mH64Mf561bxwasz2-P48ar5-uLg5_9hefb5cni-uWt1BUVrKKRSkd5gr5SjsLdW4Q87xjnOFKHHGYcIJhYQpqoWpHQlGeG94zzgcBDlqlnuuiWotb5OfVNrKqLz8c1HnJ1UqXo9WcsUGohimbmCUE9WLgQlBDaWDxULpynq_Z93Ow2SNrkNKanwCffoS_He5ihvZC8Yp7ivgzT0gxZ-zzUVOPms7jirYOGeJORMYI0Zplb7-R7qOcwp1VDsVxxx3eNcd26tqFHJO1j18BkG5S4n8f0qq8eRxKw-2v5kgd5nivHc</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2767272528</pqid></control><display><type>article</type><title>Text-Mining to Identify Gene Sets Involved in Biocorrosion by Sulfate-Reducing Bacteria: A Semi-Automated Workflow</title><source>Open Access: PubMed Central</source><source>ProQuest Publicly Available Content database</source><creator>Thakur, Payal ; Alaba, Mathew O ; Rauniyar, Shailabh ; Singh, Ram Nageena ; Saxena, Priya ; Bomgni, Alain ; Gnimpieba, Etienne Z ; Lushbough, Carol ; Goh, Kian Mau ; Sani, Rajesh Kumar</creator><creatorcontrib>Thakur, Payal ; Alaba, Mathew O ; Rauniyar, Shailabh ; Singh, Ram Nageena ; Saxena, Priya ; Bomgni, Alain ; Gnimpieba, Etienne Z ; Lushbough, Carol ; Goh, Kian Mau ; Sani, Rajesh Kumar</creatorcontrib><description>A significant amount of literature is available on biocorrosion, which makes manual extraction of crucial information such as genes and 
proteins a laborious task. Despite the fast growth of biology related corrosion studies, there is a limited number of gene collections relating to the corrosion process (biocorrosion). Text mining offers a potential solution by automatically extracting the essential information from unstructured text. We present a text mining workflow that extracts biocorrosion associated genes/proteins in sulfate-reducing bacteria (SRB) from literature databases (e.g., PubMed and PMC). This semi-automatic workflow is built with the Named Entity Recognition (NER) method and Convolutional Neural Network (CNN) model. With PubMed and PMCID as inputs, the workflow identified 227 genes belonging to several species. To validate their functions, Gene Ontology (GO) enrichment and biological network analysis was performed using UniprotKB and STRING-DB, respectively. The GO analysis showed that metal ion binding, sulfur binding, and electron transport were among the principal molecular functions. Furthermore, the biological network analysis generated three interlinked clusters containing genes involved in metal ion binding, cellular respiration, and electron transfer, which suggests the involvement of the extracted gene set in biocorrosion. Finally, the dataset was validated through manual curation, yielding a similar set of genes as our workflow; among these, and , and and were identified as the metal ion binding and sulfur metabolism genes, respectively. The identified genes were mapped with the pangenome of 63 SRB genomes that yielded the distribution of these genes across 63 SRB based on the amino acid sequence similarity and were further categorized as core and accessory gene families. SRB's role in biocorrosion involves the transfer of electrons from the metal surface via a hydrogen medium to the sulfate reduction pathway. Therefore, genes encoding hydrogenases and cytochromes might be participating in removing hydrogen from the metals through electron transfer. 
Moreover, the production of corrosive sulfide from the sulfur metabolism indirectly contributes to the localized pitting of the metals. After the corroboration of text mining results with SRB biocorrosion mechanisms, we suggest that the text mining framework could be utilized for genes/proteins extraction and significantly reduce the manual curation time.</description><identifier>ISSN: 2076-2607</identifier><identifier>EISSN: 2076-2607</identifier><identifier>DOI: 10.3390/microorganisms11010119</identifier><identifier>PMID: 36677411</identifier><language>eng</language><publisher>Switzerland: MDPI AG</publisher><subject>Amino acid sequence ; Amino acids ; Artificial neural networks ; Automation ; Bacteria ; Bacterial corrosion ; Binding ; biocorrosion ; Biofilms ; Cell cycle ; Corrosion ; Corrosion potential ; Corrosion rate ; Cytochromes ; Data mining ; Electron transfer ; Electron transport ; Gene families ; Genes ; Genomes ; Hydrogen ; Hydrogenase ; Metabolism ; Metabolites ; metal ion ; Metal ions ; Metal surfaces ; Metals ; Microbial corrosion ; Network analysis ; Neural networks ; Proteins ; Sulfate reduction ; Sulfate-reducing bacteria ; Sulfates ; Sulfur ; sulfur metabolism ; text mining ; Unstructured data ; Workflow</subject><ispartof>Microorganisms (Basel), 2023-01, Vol.11 (1), p.119</ispartof><rights>2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2023 by the authors. 
2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c508t-4740839f27aaf409e4c251ff7577a143fdf23734036a4c8d67786379d79670b83</citedby><cites>FETCH-LOGICAL-c508t-4740839f27aaf409e4c251ff7577a143fdf23734036a4c8d67786379d79670b83</cites><orcidid>0000-0002-3831-6487 ; 0000-0002-3377-7321 ; 0000-0002-2839-8722 ; 0000-0002-5338-084X ; 0000-0002-5493-252X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2767272528/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2767272528?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25753,27924,27925,37012,37013,44590,53791,53793,75126</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/36677411$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Thakur, Payal</creatorcontrib><creatorcontrib>Alaba, Mathew O</creatorcontrib><creatorcontrib>Rauniyar, Shailabh</creatorcontrib><creatorcontrib>Singh, Ram Nageena</creatorcontrib><creatorcontrib>Saxena, Priya</creatorcontrib><creatorcontrib>Bomgni, Alain</creatorcontrib><creatorcontrib>Gnimpieba, Etienne Z</creatorcontrib><creatorcontrib>Lushbough, Carol</creatorcontrib><creatorcontrib>Goh, Kian Mau</creatorcontrib><creatorcontrib>Sani, Rajesh Kumar</creatorcontrib><title>Text-Mining to Identify Gene Sets Involved in Biocorrosion by Sulfate-Reducing Bacteria: A Semi-Automated Workflow</title><title>Microorganisms (Basel)</title><addtitle>Microorganisms</addtitle><description>A significant amount of literature is available on biocorrosion, which makes manual extraction of crucial information such as genes and proteins a laborious task. 
Despite the fast growth of biology related corrosion studies, there is a limited number of gene collections relating to the corrosion process (biocorrosion). Text mining offers a potential solution by automatically extracting the essential information from unstructured text. We present a text mining workflow that extracts biocorrosion associated genes/proteins in sulfate-reducing bacteria (SRB) from literature databases (e.g., PubMed and PMC). This semi-automatic workflow is built with the Named Entity Recognition (NER) method and Convolutional Neural Network (CNN) model. With PubMed and PMCID as inputs, the workflow identified 227 genes belonging to several species. To validate their functions, Gene Ontology (GO) enrichment and biological network analysis was performed using UniprotKB and STRING-DB, respectively. The GO analysis showed that metal ion binding, sulfur binding, and electron transport were among the principal molecular functions. Furthermore, the biological network analysis generated three interlinked clusters containing genes involved in metal ion binding, cellular respiration, and electron transfer, which suggests the involvement of the extracted gene set in biocorrosion. Finally, the dataset was validated through manual curation, yielding a similar set of genes as our workflow; among these, and , and and were identified as the metal ion binding and sulfur metabolism genes, respectively. The identified genes were mapped with the pangenome of 63 SRB genomes that yielded the distribution of these genes across 63 SRB based on the amino acid sequence similarity and were further categorized as core and accessory gene families. SRB's role in biocorrosion involves the transfer of electrons from the metal surface via a hydrogen medium to the sulfate reduction pathway. Therefore, genes encoding hydrogenases and cytochromes might be participating in removing hydrogen from the metals through electron transfer. 
Moreover, the production of corrosive sulfide from the sulfur metabolism indirectly contributes to the localized pitting of the metals. After the corroboration of text mining results with SRB biocorrosion mechanisms, we suggest that the text mining framework could be utilized for genes/proteins extraction and significantly reduce the manual curation time.</description><subject>Amino acid sequence</subject><subject>Amino acids</subject><subject>Artificial neural networks</subject><subject>Automation</subject><subject>Bacteria</subject><subject>Bacterial corrosion</subject><subject>Binding</subject><subject>biocorrosion</subject><subject>Biofilms</subject><subject>Cell cycle</subject><subject>Corrosion</subject><subject>Corrosion potential</subject><subject>Corrosion rate</subject><subject>Cytochromes</subject><subject>Data mining</subject><subject>Electron transfer</subject><subject>Electron transport</subject><subject>Gene families</subject><subject>Genes</subject><subject>Genomes</subject><subject>Hydrogen</subject><subject>Hydrogenase</subject><subject>Metabolism</subject><subject>Metabolites</subject><subject>metal ion</subject><subject>Metal ions</subject><subject>Metal surfaces</subject><subject>Metals</subject><subject>Microbial corrosion</subject><subject>Network analysis</subject><subject>Neural networks</subject><subject>Proteins</subject><subject>Sulfate reduction</subject><subject>Sulfate-reducing bacteria</subject><subject>Sulfates</subject><subject>Sulfur</subject><subject>sulfur metabolism</subject><subject>text mining</subject><subject>Unstructured 
data</subject><subject>Workflow</subject><issn>2076-2607</issn><issn>2076-2607</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNptkl1rFDEUhgdRbKn9CyXgjTdj8zVJxgthW2pdqAi24mXI5GPNOpPUJLO6_75Zt5ZWTC5ySN73IefwNs0Jgm8J6eHp5HWKMa1U8HnKCMG6Uf-sOcSQsxYzyJ8_qg-a45zXsK4eEdGhl80BYYxzitBhk27s79J-8sGHFSgRLI0NxbstuLTBgmtbMliGTRw31gAfwJmPOqYUs48BDFtwPY9OFdt-sWbWO8SZ0sUmr96BRXVPvl3MJU5VYsC3mH64Mf561bxwasz2-P48ar5-uLg5_9hefb5cni-uWt1BUVrKKRSkd5gr5SjsLdW4Q87xjnOFKHHGYcIJhYQpqoWpHQlGeG94zzgcBDlqlnuuiWotb5OfVNrKqLz8c1HnJ1UqXo9WcsUGohimbmCUE9WLgQlBDaWDxULpynq_Z93Ow2SNrkNKanwCffoS_He5ihvZC8Yp7ivgzT0gxZ-zzUVOPms7jirYOGeJORMYI0Zplb7-R7qOcwp1VDsVxxx3eNcd26tqFHJO1j18BkG5S4n8f0qq8eRxKw-2v5kgd5nivHc</recordid><startdate>20230103</startdate><enddate>20230103</enddate><creator>Thakur, Payal</creator><creator>Alaba, Mathew O</creator><creator>Rauniyar, Shailabh</creator><creator>Singh, Ram Nageena</creator><creator>Saxena, Priya</creator><creator>Bomgni, Alain</creator><creator>Gnimpieba, Etienne Z</creator><creator>Lushbough, Carol</creator><creator>Goh, Kian Mau</creator><creator>Sani, Rajesh Kumar</creator><general>MDPI 
AG</general><general>MDPI</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7T7</scope><scope>8FD</scope><scope>8FE</scope><scope>8FH</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>LK8</scope><scope>M7P</scope><scope>P64</scope><scope>PATMY</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PYCSY</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-3831-6487</orcidid><orcidid>https://orcid.org/0000-0002-3377-7321</orcidid><orcidid>https://orcid.org/0000-0002-2839-8722</orcidid><orcidid>https://orcid.org/0000-0002-5338-084X</orcidid><orcidid>https://orcid.org/0000-0002-5493-252X</orcidid></search><sort><creationdate>20230103</creationdate><title>Text-Mining to Identify Gene Sets Involved in Biocorrosion by Sulfate-Reducing Bacteria: A Semi-Automated Workflow</title><author>Thakur, Payal ; Alaba, Mathew O ; Rauniyar, Shailabh ; Singh, Ram Nageena ; Saxena, Priya ; Bomgni, Alain ; Gnimpieba, Etienne Z ; Lushbough, Carol ; Goh, Kian Mau ; Sani, Rajesh Kumar</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c508t-4740839f27aaf409e4c251ff7577a143fdf23734036a4c8d67786379d79670b83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Amino acid sequence</topic><topic>Amino acids</topic><topic>Artificial neural networks</topic><topic>Automation</topic><topic>Bacteria</topic><topic>Bacterial corrosion</topic><topic>Binding</topic><topic>biocorrosion</topic><topic>Biofilms</topic><topic>Cell cycle</topic><topic>Corrosion</topic><topic>Corrosion potential</topic><topic>Corrosion rate</topic><topic>Cytochromes</topic><topic>Data 
mining</topic><topic>Electron transfer</topic><topic>Electron transport</topic><topic>Gene families</topic><topic>Genes</topic><topic>Genomes</topic><topic>Hydrogen</topic><topic>Hydrogenase</topic><topic>Metabolism</topic><topic>Metabolites</topic><topic>metal ion</topic><topic>Metal ions</topic><topic>Metal surfaces</topic><topic>Metals</topic><topic>Microbial corrosion</topic><topic>Network analysis</topic><topic>Neural networks</topic><topic>Proteins</topic><topic>Sulfate reduction</topic><topic>Sulfate-reducing bacteria</topic><topic>Sulfates</topic><topic>Sulfur</topic><topic>sulfur metabolism</topic><topic>text mining</topic><topic>Unstructured data</topic><topic>Workflow</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Thakur, Payal</creatorcontrib><creatorcontrib>Alaba, Mathew O</creatorcontrib><creatorcontrib>Rauniyar, Shailabh</creatorcontrib><creatorcontrib>Singh, Ram Nageena</creatorcontrib><creatorcontrib>Saxena, Priya</creatorcontrib><creatorcontrib>Bomgni, Alain</creatorcontrib><creatorcontrib>Gnimpieba, Etienne Z</creatorcontrib><creatorcontrib>Lushbough, Carol</creatorcontrib><creatorcontrib>Goh, Kian Mau</creatorcontrib><creatorcontrib>Sani, Rajesh Kumar</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Industrial and Applied Microbiology Abstracts (Microbiology A)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Agricultural &amp; Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>Natural Science Collection</collection><collection>Environmental Sciences 
and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Engineering Research Database</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Biological Science Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Environmental Science Database</collection><collection>ProQuest Publicly Available Content database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Environmental Science Collection</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>Directory of Open Access Journals</collection><jtitle>Microorganisms (Basel)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Thakur, Payal</au><au>Alaba, Mathew O</au><au>Rauniyar, Shailabh</au><au>Singh, Ram Nageena</au><au>Saxena, Priya</au><au>Bomgni, Alain</au><au>Gnimpieba, Etienne Z</au><au>Lushbough, Carol</au><au>Goh, Kian Mau</au><au>Sani, Rajesh Kumar</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Text-Mining to Identify Gene Sets Involved in Biocorrosion by Sulfate-Reducing Bacteria: A Semi-Automated Workflow</atitle><jtitle>Microorganisms (Basel)</jtitle><addtitle>Microorganisms</addtitle><date>2023-01-03</date><risdate>2023</risdate><volume>11</volume><issue>1</issue><spage>119</spage><pages>119-</pages><issn>2076-2607</issn><eissn>2076-2607</eissn><abstract>A significant amount of literature is available on biocorrosion, which makes manual extraction of crucial information such as genes and proteins a 
laborious task. Despite the fast growth of biology related corrosion studies, there is a limited number of gene collections relating to the corrosion process (biocorrosion). Text mining offers a potential solution by automatically extracting the essential information from unstructured text. We present a text mining workflow that extracts biocorrosion associated genes/proteins in sulfate-reducing bacteria (SRB) from literature databases (e.g., PubMed and PMC). This semi-automatic workflow is built with the Named Entity Recognition (NER) method and Convolutional Neural Network (CNN) model. With PubMed and PMCID as inputs, the workflow identified 227 genes belonging to several species. To validate their functions, Gene Ontology (GO) enrichment and biological network analysis was performed using UniprotKB and STRING-DB, respectively. The GO analysis showed that metal ion binding, sulfur binding, and electron transport were among the principal molecular functions. Furthermore, the biological network analysis generated three interlinked clusters containing genes involved in metal ion binding, cellular respiration, and electron transfer, which suggests the involvement of the extracted gene set in biocorrosion. Finally, the dataset was validated through manual curation, yielding a similar set of genes as our workflow; among these, and , and and were identified as the metal ion binding and sulfur metabolism genes, respectively. The identified genes were mapped with the pangenome of 63 SRB genomes that yielded the distribution of these genes across 63 SRB based on the amino acid sequence similarity and were further categorized as core and accessory gene families. SRB's role in biocorrosion involves the transfer of electrons from the metal surface via a hydrogen medium to the sulfate reduction pathway. Therefore, genes encoding hydrogenases and cytochromes might be participating in removing hydrogen from the metals through electron transfer. 
Moreover, the production of corrosive sulfide from the sulfur metabolism indirectly contributes to the localized pitting of the metals. After the corroboration of text mining results with SRB biocorrosion mechanisms, we suggest that the text mining framework could be utilized for genes/proteins extraction and significantly reduce the manual curation time.</abstract><cop>Switzerland</cop><pub>MDPI AG</pub><pmid>36677411</pmid><doi>10.3390/microorganisms11010119</doi><orcidid>https://orcid.org/0000-0002-3831-6487</orcidid><orcidid>https://orcid.org/0000-0002-3377-7321</orcidid><orcidid>https://orcid.org/0000-0002-2839-8722</orcidid><orcidid>https://orcid.org/0000-0002-5338-084X</orcidid><orcidid>https://orcid.org/0000-0002-5493-252X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2076-2607
ispartof Microorganisms (Basel), 2023-01, Vol.11 (1), p.119
issn 2076-2607
2076-2607
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_7a6b3a624fb6473a98b6884d44be28ac
source Open Access: PubMed Central; ProQuest Publicly Available Content database
subjects Amino acid sequence
Amino acids
Artificial neural networks
Automation
Bacteria
Bacterial corrosion
Binding
biocorrosion
Biofilms
Cell cycle
Corrosion
Corrosion potential
Corrosion rate
Cytochromes
Data mining
Electron transfer
Electron transport
Gene families
Genes
Genomes
Hydrogen
Hydrogenase
Metabolism
Metabolites
metal ion
Metal ions
Metal surfaces
Metals
Microbial corrosion
Network analysis
Neural networks
Proteins
Sulfate reduction
Sulfate-reducing bacteria
Sulfates
Sulfur
sulfur metabolism
text mining
Unstructured data
Workflow
title Text-Mining to Identify Gene Sets Involved in Biocorrosion by Sulfate-Reducing Bacteria: A Semi-Automated Workflow
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T16%3A07%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Text-Mining%20to%20Identify%20Gene%20Sets%20Involved%20in%20Biocorrosion%20by%20Sulfate-Reducing%20Bacteria:%20A%20Semi-Automated%20Workflow&rft.jtitle=Microorganisms%20(Basel)&rft.au=Thakur,%20Payal&rft.date=2023-01-03&rft.volume=11&rft.issue=1&rft.spage=119&rft.pages=119-&rft.issn=2076-2607&rft.eissn=2076-2607&rft_id=info:doi/10.3390/microorganisms11010119&rft_dat=%3Cproquest_doaj_%3E2767272528%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c508t-4740839f27aaf409e4c251ff7577a143fdf23734036a4c8d67786379d79670b83%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2767272528&rft_id=info:pmid/36677411&rfr_iscdi=true