Combining phenotypic and genomic data to improve prediction of binary traits

Plant breeders want to develop cultivars that outperform existing genotypes. Some characteristics (here 'main traits') of these cultivars are categorical and difficult to measure directly. It is important to predict the main trait of newly developed genotypes accurately. In addition to mar...

Bibliographic Details
Published in: Journal of applied statistics 2024-06, Vol.51 (8), p.1497-1523
Main Authors: Jarquin, D., Roy, A., Clarke, B., Ghosal, S.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c497t-5c4e9d7f222699f64c7a972fea5b4eb5c831f729d241bf6b7b96f0b5065c8c3
cites cdi_FETCH-LOGICAL-c497t-5c4e9d7f222699f64c7a972fea5b4eb5c831f729d241bf6b7b96f0b5065c8c3
container_end_page 1523
container_issue 8
container_start_page 1497
container_title Journal of applied statistics
container_volume 51
creator Jarquin, D.
Roy, A.
Clarke, B.
Ghosal, S.
description Plant breeders want to develop cultivars that outperform existing genotypes. Some characteristics (here 'main traits') of these cultivars are categorical and difficult to measure directly. It is important to predict the main trait of newly developed genotypes accurately. In addition to marker data, breeding programs often have information on secondary traits (or 'phenotypes') that are easy to measure. Our goal is to improve prediction of main traits with interpretable relations by combining the two data types using variable selection techniques. However, the genomic characteristics can overwhelm the set of secondary traits, so a standard technique may fail to select any phenotypic variables. We develop a new statistical technique that ensures appropriate representation from both the secondary traits and the genotypic variables for optimal prediction. When two data types (markers and secondary traits) are available, we achieve improved prediction of a binary trait by two steps that are designed to ensure that a significant intrinsic effect of a phenotype is incorporated in the relation before accounting for extra effects of genotypes. First, we sparsely regress the secondary traits on the markers and replace the secondary traits by their residuals to obtain the effects of phenotypic variables as adjusted by the genotypic variables. Then, we develop a sparse logistic classifier using the markers and residuals so that the adjusted phenotypes may be selected first to avoid being overwhelmed by the genotypic variables due to their numerical advantage. This classifier uses forward selection aided by a penalty term and can be computed effectively by a technique called the one-pass method. It compares favorably with other classifiers on simulated and real data.
doi_str_mv 10.1080/02664763.2023.2208773
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_3067911792</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3065770611</sourcerecordid><originalsourceid>FETCH-LOGICAL-c497t-5c4e9d7f222699f64c7a972fea5b4eb5c831f729d241bf6b7b96f0b5065c8c3</originalsourceid><addsrcrecordid>eNp9kU-PFCEQxYnRuOPqR9CQePHSawHd0JzUTHbVZBIPeic0DbNsuqEFZs18e-nM7EY9eOFP6levXuUh9JrAFYEe3gPlvBWcXVGg9aDQC8GeoA1hHBroGH2KNivTrNAFepHzHQD0pGPP0QXre856oBu028Z58MGHPV5ubYjluHiDdRjxvv7m-h510bhE7OclxXuLl2RHb4qPAUeHa69OR1yS9iW_RM-cnrJ9db4v0feb6x_bL83u2-ev20-7xrRSlKYzrZWjcJRSLqXjrRFaCuqs7obWDp3pGXGCypG2ZHB8EIPkDoYOeC0Zdok-nFSXwzDb0dhQp09qSX6uVlTUXv1dCf5W7eO9IoTwFpisCu_OCin-PNhc1OyzsdOkg42HrBhwIQkRklb07T_oXTykULdbqU4I4IRUqjtRJsWck3WPbgioNS71EJda41LnuGrfmz9Xeex6yKcCH0-ADy6mWf-KaRpV0ccpJpd0ML76-P-M34L-pQg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3065770611</pqid></control><display><type>article</type><title>Combining phenotypic and genomic data to improve prediction of binary traits</title><source>Taylor and Francis Science and Technology Collection</source><creator>Jarquin, D. ; Roy, A. ; Clarke, B. ; Ghosal, S.</creator><creatorcontrib>Jarquin, D. ; Roy, A. ; Clarke, B. ; Ghosal, S.</creatorcontrib><description>Plant breeders want to develop cultivars that outperform existing genotypes. Some characteristics (here 'main traits') of these cultivars are categorical and difficult to measure directly. It is important to predict the main trait of newly developed genotypes accurately. In addition to marker data, breeding programs often have information on secondary traits (or 'phenotypes') that are easy to measure. Our goal is to improve prediction of main traits with interpretable relations by combining the two data types using variable selection techniques. 
However, the genomic characteristics can overwhelm the set of secondary traits, so a standard technique may fail to select any phenotypic variables. We develop a new statistical technique that ensures appropriate representation from both the secondary traits and the genotypic variables for optimal prediction. When two data types (markers and secondary traits) are available, we achieve improved prediction of a binary trait by two steps that are designed to ensure that a significant intrinsic effect of a phenotype is incorporated in the relation before accounting for extra effects of genotypes. First, we sparsely regress the secondary traits on the markers and replace the secondary traits by their residuals to obtain the effects of phenotypic variables as adjusted by the genotypic variables. Then, we develop a sparse logistic classifier using the markers and residuals so that the adjusted phenotypes may be selected first to avoid being overwhelmed by the genotypic variables due to their numerical advantage. This classifier uses forward selection aided by a penalty term and can be computed effectively by a technique called the one-pass method. 
It compares favorably with other classifiers on simulated and real data.</description><identifier>ISSN: 0266-4763</identifier><identifier>EISSN: 1360-0532</identifier><identifier>DOI: 10.1080/02664763.2023.2208773</identifier><identifier>PMID: 38863802</identifier><language>eng</language><publisher>England: Taylor &amp; Francis</publisher><subject>Classification ; Classifiers ; genotype ; multitype data ; phenotype ; sparsity ; Statistical methods ; Variables</subject><ispartof>Journal of applied statistics, 2024-06, Vol.51 (8), p.1497-1523</ispartof><rights>2023 Informa UK Limited, trading as Taylor &amp; Francis Group 2023</rights><rights>2023 Informa UK Limited, trading as Taylor &amp; Francis Group.</rights><rights>2023 Informa UK Limited, trading as Taylor &amp; Francis Group</rights><rights>2023 Informa UK Limited, trading as Taylor &amp; Francis Group 2023 Taylor &amp; Francis</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c497t-5c4e9d7f222699f64c7a972fea5b4eb5c831f729d241bf6b7b96f0b5065c8c3</citedby><cites>FETCH-LOGICAL-c497t-5c4e9d7f222699f64c7a972fea5b4eb5c831f729d241bf6b7b96f0b5065c8c3</cites><orcidid>0000-0002-1710-9761 ; 0000-0002-5098-2060</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,885,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38863802$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Jarquin, D.</creatorcontrib><creatorcontrib>Roy, A.</creatorcontrib><creatorcontrib>Clarke, B.</creatorcontrib><creatorcontrib>Ghosal, S.</creatorcontrib><title>Combining phenotypic and genomic data to improve prediction of binary traits</title><title>Journal of applied statistics</title><addtitle>J Appl Stat</addtitle><description>Plant 
breeders want to develop cultivars that outperform existing genotypes. Some characteristics (here 'main traits') of these cultivars are categorical and difficult to measure directly. It is important to predict the main trait of newly developed genotypes accurately. In addition to marker data, breeding programs often have information on secondary traits (or 'phenotypes') that are easy to measure. Our goal is to improve prediction of main traits with interpretable relations by combining the two data types using variable selection techniques. However, the genomic characteristics can overwhelm the set of secondary traits, so a standard technique may fail to select any phenotypic variables. We develop a new statistical technique that ensures appropriate representation from both the secondary traits and the genotypic variables for optimal prediction. When two data types (markers and secondary traits) are available, we achieve improved prediction of a binary trait by two steps that are designed to ensure that a significant intrinsic effect of a phenotype is incorporated in the relation before accounting for extra effects of genotypes. First, we sparsely regress the secondary traits on the markers and replace the secondary traits by their residuals to obtain the effects of phenotypic variables as adjusted by the genotypic variables. Then, we develop a sparse logistic classifier using the markers and residuals so that the adjusted phenotypes may be selected first to avoid being overwhelmed by the genotypic variables due to their numerical advantage. This classifier uses forward selection aided by a penalty term and can be computed effectively by a technique called the one-pass method. 
It compares favorably with other classifiers on simulated and real data.</description><subject>Classification</subject><subject>Classifiers</subject><subject>genotype</subject><subject>multitype data</subject><subject>phenotype</subject><subject>sparsity</subject><subject>Statistical methods</subject><subject>Variables</subject><issn>0266-4763</issn><issn>1360-0532</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kU-PFCEQxYnRuOPqR9CQePHSawHd0JzUTHbVZBIPeic0DbNsuqEFZs18e-nM7EY9eOFP6levXuUh9JrAFYEe3gPlvBWcXVGg9aDQC8GeoA1hHBroGH2KNivTrNAFepHzHQD0pGPP0QXre856oBu028Z58MGHPV5ubYjluHiDdRjxvv7m-h510bhE7OclxXuLl2RHb4qPAUeHa69OR1yS9iW_RM-cnrJ9db4v0feb6x_bL83u2-ev20-7xrRSlKYzrZWjcJRSLqXjrRFaCuqs7obWDp3pGXGCypG2ZHB8EIPkDoYOeC0Zdok-nFSXwzDb0dhQp09qSX6uVlTUXv1dCf5W7eO9IoTwFpisCu_OCin-PNhc1OyzsdOkg42HrBhwIQkRklb07T_oXTykULdbqU4I4IRUqjtRJsWck3WPbgioNS71EJda41LnuGrfmz9Xeex6yKcCH0-ADy6mWf-KaRpV0ccpJpd0ML76-P-M34L-pQg</recordid><startdate>20240610</startdate><enddate>20240610</enddate><creator>Jarquin, D.</creator><creator>Roy, A.</creator><creator>Clarke, B.</creator><creator>Ghosal, S.</creator><general>Taylor &amp; Francis</general><general>Taylor &amp; Francis Ltd</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>H8D</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-1710-9761</orcidid><orcidid>https://orcid.org/0000-0002-5098-2060</orcidid></search><sort><creationdate>20240610</creationdate><title>Combining phenotypic and genomic data to improve prediction of binary traits</title><author>Jarquin, D. ; Roy, A. ; Clarke, B. 
; Ghosal, S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c497t-5c4e9d7f222699f64c7a972fea5b4eb5c831f729d241bf6b7b96f0b5065c8c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Classification</topic><topic>Classifiers</topic><topic>genotype</topic><topic>multitype data</topic><topic>phenotype</topic><topic>sparsity</topic><topic>Statistical methods</topic><topic>Variables</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jarquin, D.</creatorcontrib><creatorcontrib>Roy, A.</creatorcontrib><creatorcontrib>Clarke, B.</creatorcontrib><creatorcontrib>Ghosal, S.</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Aerospace Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of applied statistics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jarquin, D.</au><au>Roy, A.</au><au>Clarke, B.</au><au>Ghosal, S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Combining phenotypic and genomic data to improve prediction of binary traits</atitle><jtitle>Journal of applied statistics</jtitle><addtitle>J Appl 
Stat</addtitle><date>2024-06-10</date><risdate>2024</risdate><volume>51</volume><issue>8</issue><spage>1497</spage><epage>1523</epage><pages>1497-1523</pages><issn>0266-4763</issn><eissn>1360-0532</eissn><abstract>Plant breeders want to develop cultivars that outperform existing genotypes. Some characteristics (here 'main traits') of these cultivars are categorical and difficult to measure directly. It is important to predict the main trait of newly developed genotypes accurately. In addition to marker data, breeding programs often have information on secondary traits (or 'phenotypes') that are easy to measure. Our goal is to improve prediction of main traits with interpretable relations by combining the two data types using variable selection techniques. However, the genomic characteristics can overwhelm the set of secondary traits, so a standard technique may fail to select any phenotypic variables. We develop a new statistical technique that ensures appropriate representation from both the secondary traits and the genotypic variables for optimal prediction. When two data types (markers and secondary traits) are available, we achieve improved prediction of a binary trait by two steps that are designed to ensure that a significant intrinsic effect of a phenotype is incorporated in the relation before accounting for extra effects of genotypes. First, we sparsely regress the secondary traits on the markers and replace the secondary traits by their residuals to obtain the effects of phenotypic variables as adjusted by the genotypic variables. Then, we develop a sparse logistic classifier using the markers and residuals so that the adjusted phenotypes may be selected first to avoid being overwhelmed by the genotypic variables due to their numerical advantage. This classifier uses forward selection aided by a penalty term and can be computed effectively by a technique called the one-pass method. 
It compares favorably with other classifiers on simulated and real data.</abstract><cop>England</cop><pub>Taylor &amp; Francis</pub><pmid>38863802</pmid><doi>10.1080/02664763.2023.2208773</doi><tpages>27</tpages><orcidid>https://orcid.org/0000-0002-1710-9761</orcidid><orcidid>https://orcid.org/0000-0002-5098-2060</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0266-4763
ispartof Journal of applied statistics, 2024-06, Vol.51 (8), p.1497-1523
issn 0266-4763
1360-0532
language eng
recordid cdi_proquest_miscellaneous_3067911792
source Taylor and Francis Science and Technology Collection
subjects Classification
Classifiers
genotype
multitype data
phenotype
sparsity
Statistical methods
Variables
title Combining phenotypic and genomic data to improve prediction of binary traits
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T07%3A13%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Combining%20phenotypic%20and%20genomic%20data%20to%20improve%20prediction%20of%20binary%20traits&rft.jtitle=Journal%20of%20applied%20statistics&rft.au=Jarquin,%20D.&rft.date=2024-06-10&rft.volume=51&rft.issue=8&rft.spage=1497&rft.epage=1523&rft.pages=1497-1523&rft.issn=0266-4763&rft.eissn=1360-0532&rft_id=info:doi/10.1080/02664763.2023.2208773&rft_dat=%3Cproquest_cross%3E3065770611%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c497t-5c4e9d7f222699f64c7a972fea5b4eb5c831f729d241bf6b7b96f0b5065c8c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3065770611&rft_id=info:pmid/38863802&rfr_iscdi=true
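
The two-step procedure summarized in the description field can be illustrated with a small sketch. This is not the authors' implementation: the abstract describes forward selection aided by a penalty term and computed by a "one-pass method", whereas the sketch below substitutes ordinary L1-penalized estimators (a lasso for step one, an L1-penalized logistic regression for step two) as stand-ins. The function names, the simulated data, the penalty values, and the choice to penalize the adjusted phenotypes more lightly than the markers are all assumptions made for illustration.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by L1-penalized estimators."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent: min (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # running residual y - Xb
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # covariance of column j with the partial residual (column j excluded)
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
    return b, r


def residualize(traits, markers, lam):
    """Step 1: replace each secondary trait by its residual from a sparse fit on markers."""
    n = len(markers)
    cols = []
    for k in range(len(traits[0])):
        col = [traits[i][k] for i in range(n)]
        mean = sum(col) / n
        _, resid = lasso_cd(markers, [v - mean for v in col], lam)
        cols.append(resid)
    return [[c[i] for c in cols] for i in range(n)]  # back to row-major layout


def sparse_logistic(Z, y, lams, step=0.1, n_iter=500):
    """Step 2 stand-in: L1-penalized logistic regression by proximal gradient.
    lams[j] is the penalty on column j; the intercept is not penalized."""
    n, p = len(Z), len(Z[0])
    w, b0 = [0.0] * p, 0.0
    for _ in range(n_iter):
        err = []
        for i in range(n):
            eta = b0 + sum(Z[i][j] * w[j] for j in range(p))
            eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
            err.append(1.0 / (1.0 + math.exp(-eta)) - y[i])
        b0 -= step * sum(err) / n
        for j in range(p):
            g = sum(err[i] * Z[i][j] for i in range(n)) / n
            w[j] = soft_threshold(w[j] - step * g, step * lams[j])
    return w, b0


# Simulated toy data (hypothetical): 8 markers coded -1/0/1 and 2 secondary traits.
random.seed(0)
n, n_markers = 60, 8
markers = [[random.choice([-1.0, 0.0, 1.0]) for _ in range(n_markers)] for _ in range(n)]
noise = [random.gauss(0.0, 1.0) for _ in range(n)]
traits = [[0.8 * markers[i][0] + noise[i], random.gauss(0.0, 1.0)] for i in range(n)]
labels = []
for i in range(n):
    eta = 2.0 * noise[i] + markers[i][1]
    labels.append(1.0 if random.random() < 1.0 / (1.0 + math.exp(-eta)) else 0.0)

adjusted = residualize(traits, markers, lam=0.05)
design = [adjusted[i] + markers[i] for i in range(n)]  # residuals first, then markers
# Lighter penalty on adjusted phenotypes so they are not crowded out by the
# more numerous markers (an illustrative assumption, not the paper's rule).
penalties = [0.01] * len(traits[0]) + [0.05] * n_markers
weights, intercept = sparse_logistic(design, labels, penalties)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("selected columns:", selected)
```

In this layout, columns 0 and 1 of `design` are the adjusted secondary traits and the remaining columns are markers, so inspecting `selected` shows which data type contributed to the fitted classifier. The differential penalty is only one simple way to keep phenotypic variables from being overwhelmed; the published method handles this through its ordering of selection steps.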