Identification of Species by Combining Molecular and Morphological Data Using Convolutional Neural Networks

Bibliographic Details
Published in: Systematic biology 2022-04, Vol.71 (3), p.690-705
Main Authors: Yang, Bing, Zhang, Zhenxin, Yang, Cai-Qing, Wang, Ying, Orr, Michael C, Wang, Hongbin, Zhang, Ai-Bing
Format: Article
Language:English
container_end_page 705
container_issue 3
container_start_page 690
container_title Systematic biology
container_volume 71
creator Yang, Bing
Zhang, Zhenxin
Yang, Cai-Qing
Wang, Ying
Orr, Michael C
Wang, Hongbin
Zhang, Ai-Bing
description Integrative taxonomy, which combines evidence from behavior, niche preference, distribution, morphological analysis, and DNA barcoding, is central to modern taxonomy and systematic biology. However, decades of use demonstrate that these methods can face challenges when used in isolation: morphological methods risk misidentifications due to phenotypic plasticity, and DNA barcoding risks incorrect identifications because of introgression, incomplete lineage sorting, and horizontal gene transfer. Although researchers have advocated the use of integrative taxonomy, few detailed algorithms have been proposed. Here, we develop a convolutional neural network method (morphology-molecule network [MMNet]) that integrates morphological and molecular data for species identification. MMNet outperformed four currently available alternative methods when tested with 10 independent data sets representing varying genetic diversity from different taxa. High accuracies were achieved for all groups, including beetles (98.1% of 123 species), butterflies (98.8% of 24 species), fishes (96.3% of 214 species), and moths (96.4% of 150 species). Further, MMNet achieved high accuracy (>98%) on four data sets that include closely related species from the same genus. The average accuracy on two modest subgenomic (single nucleotide polymorphism) data sets, each comprising eight putative subspecies, is 90%. Additional tests show that the success rate of species identification under this method depends most strongly on the amount of training data and is robust to sequence length and image size. Analyses of the contribution of different data types (image vs. gene) indicate that both morphological and genetic data are important to the model, with genetic data contributing slightly more.
The approaches developed here provide a foundation for the future integration of multimodal information (such as image, audio, video, 3D scanning, and biosensor data) into integrative taxonomy, so that organisms can be characterized more comprehensively as a basis for improved investigation, monitoring, and conservation of biodiversity. [Convolutional neural network; deep learning; integrative taxonomy; single nucleotide polymorphism; species identification.]
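
The record does not say how MMNet encodes its inputs or merges the two data types. As a hedged illustration only, the sketch below shows one common way to prepare DNA sequences for a convolutional network (one-hot encoding over A, C, G, T) and a simple feature-level fusion that concatenates an image embedding with a sequence embedding so a downstream classifier sees both. The names `one_hot_dna` and `fuse_features`, the all-zero treatment of ambiguous bases, and concatenation as the fusion step are all assumptions, not the authors' MMNet architecture.

```python
from typing import List, Sequence

# Assumed alphabet; ambiguous bases (e.g. N) and gaps map to all-zero rows.
# This handling is a hypothetical choice, not taken from the MMNet paper.
BASES = "ACGT"


def one_hot_dna(seq: str) -> List[List[int]]:
    """Hypothetical helper: return a len(seq) x 4 one-hot matrix (case-insensitive)."""
    matrix = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)  # -1 for anything outside A, C, G, T
        if idx >= 0:
            row[idx] = 1
        matrix.append(row)
    return matrix


def fuse_features(image_vec: Sequence[float], gene_vec: Sequence[float]) -> List[float]:
    """Hypothetical feature-level fusion: concatenate an image embedding and a
    sequence embedding into one vector for a downstream classifier."""
    return list(image_vec) + list(gene_vec)
```

Concatenation is the simplest fusion choice and keeps both modalities visible to the classifier; the record only states that both image and gene data matter to the model, not which fusion strategy MMNet uses.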
doi_str_mv 10.1093/sysbio/syab076
format article
contributor Burbrink, Frank
date 2022-04-19
pmid 34524452
publisher England: Oxford University Press
rights The Author(s) 2021. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please email: journals.permissions@oup.com
identifier ISSN: 1063-5157
ispartof Systematic biology, 2022-04, Vol.71 (3), p.690-705
issn 1063-5157
1076-836X
language eng
subjects Animals
Biodiversity
Butterflies - genetics
DNA - genetics
DNA Barcoding, Taxonomic - methods
Neural Networks, Computer
Phylogeny
title Identification of Species by Combining Molecular and Morphological Data Using Convolutional Neural Networks