Disentangling genetic feature selection and aggregation in transcriptome-wide association studies

Abstract The success of transcriptome-wide association studies (TWAS) has led to substantial research toward improving the predictive accuracy of its core component of genetically regulated expression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultan...

Bibliographic Details
Published in: Genetics (Austin), 2022-02, Vol.220 (2)
Main Authors: Cao, Chen; Kossinna, Pathum; Kwok, Devin; Li, Qing; He, Jingni; Su, Liya; Guo, Xingyi; Zhang, Qingrun; Long, Quan
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c427t-b2cbff77666a75aa36a9b4d83ae7f60064ae7f064f54b993e6568202e5ff62463
cites cdi_FETCH-LOGICAL-c427t-b2cbff77666a75aa36a9b4d83ae7f60064ae7f064f54b993e6568202e5ff62463
container_end_page
container_issue 2
container_start_page
container_title Genetics (Austin)
container_volume 220
creator Cao, Chen
Kossinna, Pathum
Kwok, Devin
Li, Qing
He, Jingni
Su, Liya
Guo, Xingyi
Zhang, Qingrun
Long, Quan
description Abstract The success of transcriptome-wide association studies (TWAS) has led to substantial research toward improving the predictive accuracy of its core component of genetically regulated expression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expressions) and the linear combination of genotypes (as the predicted expressions) for association tests. From the perspective of machine learning (considering SNPs as features), these are actually two separable steps—feature selection and feature aggregation—which can be independently conducted. In this study, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adapting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.
doi_str_mv 10.1093/genetics/iyab216
format article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9208638</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/genetics/iyab216</oup_id><sourcerecordid>2638560154</sourcerecordid><originalsourceid>FETCH-LOGICAL-c427t-b2cbff77666a75aa36a9b4d83ae7f60064ae7f064f54b993e6568202e5ff62463</originalsourceid><addsrcrecordid>eNqFkb1PwzAQxS0EoqWwM6FIjCjUiR0nWZBQ-ZQqscBsXZJzMGrtYjsg_nsMbRFMTHeWf_fu2Y-Q44yeZ7Rm0x4NBt36qf6AJs_EDhlnNWdpLli2-6sfkQPvXyiloi6qfTJivOJ1VZRjAlfaowlg-oU2fbIRTBRCGBwmHhfYBm1NAqZLoO8d9vB91iYJDoxvnV4Fu8T0XXeYgPe21WvCh6HT6A_JnoKFx6NNnZCnm-vH2V06f7i9n13O05bnZUibvG2UKkshBJQFABNQN7yrGGCpRHTOv5pYVMGbumYoClHlNMdCKZFzwSbkYq27Gpoldm18lYOFXDm9BPchLWj598boZ9nbN1nntBKsigKnGwFnXwf0Qb7YwZnoWcY_rApBs4JHiq6p1lnvHaqfDRmVX6HIbShyE0ocOfnt7Gdgm0IEztaAHVb_y30C84CeBw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2638560154</pqid></control><display><type>article</type><title>Disentangling genetic feature selection and aggregation in transcriptome-wide association studies</title><source>Freely Accessible Science Journals - check A-Z of ejournals</source><source>Oxford Journals Online</source><source>Alma/SFX Local Collection</source><creator>Cao, Chen ; Kossinna, Pathum ; Kwok, Devin ; Li, Qing ; He, Jingni ; Su, Liya ; Guo, Xingyi ; Zhang, Qingrun ; Long, Quan</creator><contributor>Li, Y</contributor><creatorcontrib>Cao, Chen ; Kossinna, Pathum ; Kwok, Devin ; Li, Qing ; He, Jingni ; Su, Liya ; Guo, Xingyi ; Zhang, Qingrun ; Long, Quan ; Li, Y</creatorcontrib><description>Abstract The success of transcriptome-wide association studies (TWAS) has led to substantial research toward improving the predictive accuracy of its core component of genetically regulated expression (GReX). 
GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expressions) and the linear combination of genotypes (as the predicted expressions) for association tests. From the perspective of machine learning (considering SNPs as features), these are actually two separable steps—feature selection and feature aggregation—which can be independently conducted. In this study, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adapting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.</description><identifier>ISSN: 1943-2631</identifier><identifier>ISSN: 0016-6731</identifier><identifier>EISSN: 1943-2631</identifier><identifier>DOI: 10.1093/genetics/iyab216</identifier><identifier>PMID: 34849857</identifier><language>eng</language><publisher>United States: Oxford University Press</publisher><subject>Adaptability ; Agglomeration ; Data analysis ; Feature selection ; Genetics ; Genotypes ; Investigation ; Kernel functions ; Machine learning ; Phenotypes ; Prediction models ; Single-nucleotide polymorphism ; Statistical power ; Transcriptomes</subject><ispartof>Genetics (Austin), 2022-02, Vol.220 (2)</ispartof><rights>The Author(s) 2021. 
Published by Oxford University Press on behalf of Genetics Society of America. All rights reserved. For permissions, please email: journals.permissions@oup.com 2021</rights><rights>The Author(s) 2021. Published by Oxford University Press on behalf of Genetics Society of America. All rights reserved. For permissions, please email: journals.permissions@oup.com.</rights><rights>The Author(s) 2021. Published by Oxford University Press on behalf of Genetics Society of America. All rights reserved. For permissions, please email: journals.permissions@oup.com</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c427t-b2cbff77666a75aa36a9b4d83ae7f60064ae7f064f54b993e6568202e5ff62463</citedby><cites>FETCH-LOGICAL-c427t-b2cbff77666a75aa36a9b4d83ae7f60064ae7f064f54b993e6568202e5ff62463</cites><orcidid>0000-0003-3795-1873 ; 0000-0001-5269-1294 ; 0000-0001-5343-808X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,776,780,881,27903,27904</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34849857$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Li, Y</contributor><creatorcontrib>Cao, Chen</creatorcontrib><creatorcontrib>Kossinna, Pathum</creatorcontrib><creatorcontrib>Kwok, Devin</creatorcontrib><creatorcontrib>Li, Qing</creatorcontrib><creatorcontrib>He, Jingni</creatorcontrib><creatorcontrib>Su, Liya</creatorcontrib><creatorcontrib>Guo, Xingyi</creatorcontrib><creatorcontrib>Zhang, Qingrun</creatorcontrib><creatorcontrib>Long, Quan</creatorcontrib><title>Disentangling genetic feature selection and aggregation in transcriptome-wide association studies</title><title>Genetics (Austin)</title><addtitle>Genetics</addtitle><description>Abstract The success of transcriptome-wide association studies 
(TWAS) has led to substantial research toward improving the predictive accuracy of its core component of genetically regulated expression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expressions) and the linear combination of genotypes (as the predicted expressions) for association tests. From the perspective of machine learning (considering SNPs as features), these are actually two separable steps—feature selection and feature aggregation—which can be independently conducted. In this study, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adapting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. 
More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.</description><subject>Adaptability</subject><subject>Agglomeration</subject><subject>Data analysis</subject><subject>Feature selection</subject><subject>Genetics</subject><subject>Genotypes</subject><subject>Investigation</subject><subject>Kernel functions</subject><subject>Machine learning</subject><subject>Phenotypes</subject><subject>Prediction models</subject><subject>Single-nucleotide polymorphism</subject><subject>Statistical power</subject><subject>Transcriptomes</subject><issn>1943-2631</issn><issn>0016-6731</issn><issn>1943-2631</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNqFkb1PwzAQxS0EoqWwM6FIjCjUiR0nWZBQ-ZQqscBsXZJzMGrtYjsg_nsMbRFMTHeWf_fu2Y-Q44yeZ7Rm0x4NBt36qf6AJs_EDhlnNWdpLli2-6sfkQPvXyiloi6qfTJivOJ1VZRjAlfaowlg-oU2fbIRTBRCGBwmHhfYBm1NAqZLoO8d9vB91iYJDoxvnV4Fu8T0XXeYgPe21WvCh6HT6A_JnoKFx6NNnZCnm-vH2V06f7i9n13O05bnZUibvG2UKkshBJQFABNQN7yrGGCpRHTOv5pYVMGbumYoClHlNMdCKZFzwSbkYq27Gpoldm18lYOFXDm9BPchLWj598boZ9nbN1nntBKsigKnGwFnXwf0Qb7YwZnoWcY_rApBs4JHiq6p1lnvHaqfDRmVX6HIbShyE0ocOfnt7Gdgm0IEztaAHVb_y30C84CeBw</recordid><startdate>20220204</startdate><enddate>20220204</enddate><creator>Cao, Chen</creator><creator>Kossinna, Pathum</creator><creator>Kwok, Devin</creator><creator>Li, Qing</creator><creator>He, Jingni</creator><creator>Su, Liya</creator><creator>Guo, Xingyi</creator><creator>Zhang, Qingrun</creator><creator>Long, Quan</creator><general>Oxford University Press</general><general>Genetics Society of 
America</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>4T-</scope><scope>4U-</scope><scope>7QP</scope><scope>7SS</scope><scope>7TK</scope><scope>7TM</scope><scope>8FD</scope><scope>FR3</scope><scope>K9.</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-3795-1873</orcidid><orcidid>https://orcid.org/0000-0001-5269-1294</orcidid><orcidid>https://orcid.org/0000-0001-5343-808X</orcidid></search><sort><creationdate>20220204</creationdate><title>Disentangling genetic feature selection and aggregation in transcriptome-wide association studies</title><author>Cao, Chen ; Kossinna, Pathum ; Kwok, Devin ; Li, Qing ; He, Jingni ; Su, Liya ; Guo, Xingyi ; Zhang, Qingrun ; Long, Quan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c427t-b2cbff77666a75aa36a9b4d83ae7f60064ae7f064f54b993e6568202e5ff62463</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Adaptability</topic><topic>Agglomeration</topic><topic>Data analysis</topic><topic>Feature selection</topic><topic>Genetics</topic><topic>Genotypes</topic><topic>Investigation</topic><topic>Kernel functions</topic><topic>Machine learning</topic><topic>Phenotypes</topic><topic>Prediction models</topic><topic>Single-nucleotide polymorphism</topic><topic>Statistical power</topic><topic>Transcriptomes</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cao, Chen</creatorcontrib><creatorcontrib>Kossinna, Pathum</creatorcontrib><creatorcontrib>Kwok, Devin</creatorcontrib><creatorcontrib>Li, Qing</creatorcontrib><creatorcontrib>He, Jingni</creatorcontrib><creatorcontrib>Su, Liya</creatorcontrib><creatorcontrib>Guo, Xingyi</creatorcontrib><creatorcontrib>Zhang, Qingrun</creatorcontrib><creatorcontrib>Long, 
Quan</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Docstoc</collection><collection>University Readers</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Genetics (Austin)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cao, Chen</au><au>Kossinna, Pathum</au><au>Kwok, Devin</au><au>Li, Qing</au><au>He, Jingni</au><au>Su, Liya</au><au>Guo, Xingyi</au><au>Zhang, Qingrun</au><au>Long, Quan</au><au>Li, Y</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Disentangling genetic feature selection and aggregation in transcriptome-wide association studies</atitle><jtitle>Genetics (Austin)</jtitle><addtitle>Genetics</addtitle><date>2022-02-04</date><risdate>2022</risdate><volume>220</volume><issue>2</issue><issn>1943-2631</issn><issn>0016-6731</issn><eissn>1943-2631</eissn><abstract>Abstract The success of transcriptome-wide association studies (TWAS) has led to substantial research toward improving the predictive accuracy of its core component of genetically regulated expression (GReX). 
GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expressions) and the linear combination of genotypes (as the predicted expressions) for association tests. From the perspective of machine learning (considering SNPs as features), these are actually two separable steps—feature selection and feature aggregation—which can be independently conducted. In this study, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adapting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.</abstract><cop>United States</cop><pub>Oxford University Press</pub><pmid>34849857</pmid><doi>10.1093/genetics/iyab216</doi><orcidid>https://orcid.org/0000-0003-3795-1873</orcidid><orcidid>https://orcid.org/0000-0001-5269-1294</orcidid><orcidid>https://orcid.org/0000-0001-5343-808X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1943-2631
ispartof Genetics (Austin), 2022-02, Vol.220 (2)
issn 1943-2631
0016-6731
1943-2631
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9208638
source Freely Accessible Science Journals - check A-Z of ejournals; Oxford Journals Online; Alma/SFX Local Collection
subjects Adaptability
Agglomeration
Data analysis
Feature selection
Genetics
Genotypes
Investigation
Kernel functions
Machine learning
Phenotypes
Prediction models
Single-nucleotide polymorphism
Statistical power
Transcriptomes
title Disentangling genetic feature selection and aggregation in transcriptome-wide association studies
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T04%3A29%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Disentangling%20genetic%20feature%20selection%20and%20aggregation%20in%20transcriptome-wide%20association%20studies&rft.jtitle=Genetics%20(Austin)&rft.au=Cao,%20Chen&rft.date=2022-02-04&rft.volume=220&rft.issue=2&rft.issn=1943-2631&rft.eissn=1943-2631&rft_id=info:doi/10.1093/genetics/iyab216&rft_dat=%3Cproquest_pubme%3E2638560154%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c427t-b2cbff77666a75aa36a9b4d83ae7f60064ae7f064f54b993e6568202e5ff62463%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2638560154&rft_id=info:pmid/34849857&rft_oup_id=10.1093/genetics/iyab216&rfr_iscdi=true
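
The abstract in this record describes TWAS as two separable steps: feature selection (e.g., a simple marker test) and feature aggregation (e.g., kernel machines). The following is a minimal sketch of that idea, not the authors' implementation. It assumes genotypes are given as per-sample rows of allele dosages, uses squared marginal correlation as the marker test, and uses a linear kernel for aggregation. The permutation p-value is a simplification substituted here for the analytic tests normally used with kernel machines, and all function names are hypothetical.

```python
import random


def _centered(values):
    """Return the values with their mean subtracted."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def marker_test_scores(genotypes, expression):
    """Step 1a: squared marginal correlation of each SNP with expression."""
    e = _centered(expression)
    e_ss = sum(x * x for x in e)
    scores = []
    for j in range(len(genotypes[0])):
        g = _centered([row[j] for row in genotypes])
        g_ss = sum(x * x for x in g)
        if g_ss == 0 or e_ss == 0:
            scores.append(0.0)  # monomorphic SNP or constant expression
            continue
        cov = sum(a * b for a, b in zip(g, e))
        scores.append(cov * cov / (g_ss * e_ss))
    return scores


def select_snps(genotypes, expression, top_k):
    """Step 1b: keep the indices of the top_k highest-scoring SNPs."""
    scores = marker_test_scores(genotypes, expression)
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return sorted(ranked[:top_k])


def kernel_statistic(genotypes, phenotype, snps):
    """Step 2: Q = y' K y with the linear kernel K = G G' over selected SNPs.

    For a linear kernel this equals the sum over SNPs of (g_j . y)^2,
    where g_j and y are mean-centered.
    """
    y = _centered(phenotype)
    q = 0.0
    for j in snps:
        g = _centered([row[j] for row in genotypes])
        s = sum(a * b for a, b in zip(g, y))
        q += s * s
    return q


def kernel_test_pvalue(genotypes, phenotype, snps, n_perm=199, seed=0):
    """Permutation p-value for Q (an assumption of this sketch, see lead-in)."""
    rng = random.Random(seed)
    observed = kernel_statistic(genotypes, phenotype, snps)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if kernel_statistic(genotypes, shuffled, snps) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data only: a reference panel (genotype + expression) for selection
    # and a separate cohort (genotype + phenotype) for aggregation.
    ref_geno = [[0, 1, 2], [1, 0, 2], [2, 1, 0], [0, 2, 1], [1, 2, 0], [2, 0, 1]]
    ref_expr = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
    gwas_geno = [[2, 0, 1], [0, 1, 2], [1, 2, 0], [2, 1, 0], [0, 2, 1], [1, 0, 2]]
    gwas_pheno = [1.9, 0.2, 1.1, 2.1, -0.1, 0.8]

    chosen = select_snps(ref_geno, ref_expr, top_k=1)
    print("selected SNP indices:", chosen)
    print("p-value:", kernel_test_pvalue(gwas_geno, gwas_pheno, chosen))
```

Because the two steps share only the list of selected SNP indices, either function can be replaced independently, for example with a different selection rule or a non-linear kernel, which is the flexibility the abstract argues the single GReX form does not offer.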