Disentangling genetic feature selection and aggregation in transcriptome-wide association studies
Abstract The success of transcriptome-wide association studies (TWAS) has led to substantial research toward improving the predictive accuracy of its core component of genetically regulated expression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultan...
Published in: | Genetics (Austin) 2022-02, Vol.220 (2) |
---|---|
Main Authors: | Cao, Chen; Kossinna, Pathum; Kwok, Devin; Li, Qing; He, Jingni; Su, Liya; Guo, Xingyi; Zhang, Qingrun; Long, Quan |
Format: | Article |
Language: | English |
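Subjects: | Adaptability; Agglomeration; Data analysis; Feature selection; Genetics; Genotypes; Investigation; Kernel functions; Machine learning; Phenotypes; Prediction models; Single-nucleotide polymorphism; Statistical power; Transcriptomes |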
cited_by | cdi_FETCH-LOGICAL-c427t-b2cbff77666a75aa36a9b4d83ae7f60064ae7f064f54b993e6568202e5ff62463 |
---|---|
cites | cdi_FETCH-LOGICAL-c427t-b2cbff77666a75aa36a9b4d83ae7f60064ae7f064f54b993e6568202e5ff62463 |
container_end_page | |
container_issue | 2 |
container_start_page | |
container_title | Genetics (Austin) |
container_volume | 220 |
creator | Cao, Chen; Kossinna, Pathum; Kwok, Devin; Li, Qing; He, Jingni; Su, Liya; Guo, Xingyi; Zhang, Qingrun; Long, Quan |
description | Abstract
The success of transcriptome-wide association studies (TWAS) has led to substantial research toward improving the predictive accuracy of its core component of genetically regulated expression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expressions) and the linear combination of genotypes (as the predicted expressions) for association tests. From the perspective of machine learning (considering SNPs as features), these are actually two separable steps—feature selection and feature aggregation—which can be independently conducted. In this study, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adapting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols. |
doi_str_mv | 10.1093/genetics/iyab216 |
format | article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9208638</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/genetics/iyab216</oup_id><sourcerecordid>2638560154</sourcerecordid><originalsourceid>FETCH-LOGICAL-c427t-b2cbff77666a75aa36a9b4d83ae7f60064ae7f064f54b993e6568202e5ff62463</originalsourceid><addsrcrecordid>eNqFkb1PwzAQxS0EoqWwM6FIjCjUiR0nWZBQ-ZQqscBsXZJzMGrtYjsg_nsMbRFMTHeWf_fu2Y-Q44yeZ7Rm0x4NBt36qf6AJs_EDhlnNWdpLli2-6sfkQPvXyiloi6qfTJivOJ1VZRjAlfaowlg-oU2fbIRTBRCGBwmHhfYBm1NAqZLoO8d9vB91iYJDoxvnV4Fu8T0XXeYgPe21WvCh6HT6A_JnoKFx6NNnZCnm-vH2V06f7i9n13O05bnZUibvG2UKkshBJQFABNQN7yrGGCpRHTOv5pYVMGbumYoClHlNMdCKZFzwSbkYq27Gpoldm18lYOFXDm9BPchLWj598boZ9nbN1nntBKsigKnGwFnXwf0Qb7YwZnoWcY_rApBs4JHiq6p1lnvHaqfDRmVX6HIbShyE0ocOfnt7Gdgm0IEztaAHVb_y30C84CeBw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2638560154</pqid></control><display><type>article</type><title>Disentangling genetic feature selection and aggregation in transcriptome-wide association studies</title><source>Freely Accessible Science Journals - check A-Z of ejournals</source><source>Oxford Journals Online</source><source>Alma/SFX Local Collection</source><creator>Cao, Chen ; Kossinna, Pathum ; Kwok, Devin ; Li, Qing ; He, Jingni ; Su, Liya ; Guo, Xingyi ; Zhang, Qingrun ; Long, Quan</creator><contributor>Li, Y</contributor><creatorcontrib>Cao, Chen ; Kossinna, Pathum ; Kwok, Devin ; Li, Qing ; He, Jingni ; Su, Liya ; Guo, Xingyi ; Zhang, Qingrun ; Long, Quan ; Li, Y</creatorcontrib><description>Abstract
The success of transcriptome-wide association studies (TWAS) has led to substantial research toward improving the predictive accuracy of its core component of genetically regulated expression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expressions) and the linear combination of genotypes (as the predicted expressions) for association tests. From the perspective of machine learning (considering SNPs as features), these are actually two separable steps—feature selection and feature aggregation—which can be independently conducted. In this study, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adapting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. 
More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.</description><identifier>ISSN: 1943-2631</identifier><identifier>ISSN: 0016-6731</identifier><identifier>EISSN: 1943-2631</identifier><identifier>DOI: 10.1093/genetics/iyab216</identifier><identifier>PMID: 34849857</identifier><language>eng</language><publisher>United States: Oxford University Press</publisher><subject>Adaptability ; Agglomeration ; Data analysis ; Feature selection ; Genetics ; Genotypes ; Investigation ; Kernel functions ; Machine learning ; Phenotypes ; Prediction models ; Single-nucleotide polymorphism ; Statistical power ; Transcriptomes</subject><ispartof>Genetics (Austin), 2022-02, Vol.220 (2)</ispartof><rights>The Author(s) 2021. Published by Oxford University Press on behalf of Genetics Society of America. All rights reserved. For permissions, please email: journals.permissions@oup.com 2021</rights><rights>The Author(s) 2021. Published by Oxford University Press on behalf of Genetics Society of America. All rights reserved. For permissions, please email: journals.permissions@oup.com.</rights><rights>The Author(s) 2021. Published by Oxford University Press on behalf of Genetics Society of America. All rights reserved. 
For permissions, please email: journals.permissions@oup.com</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c427t-b2cbff77666a75aa36a9b4d83ae7f60064ae7f064f54b993e6568202e5ff62463</citedby><cites>FETCH-LOGICAL-c427t-b2cbff77666a75aa36a9b4d83ae7f60064ae7f064f54b993e6568202e5ff62463</cites><orcidid>0000-0003-3795-1873 ; 0000-0001-5269-1294 ; 0000-0001-5343-808X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,776,780,881,27903,27904</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34849857$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Li, Y</contributor><creatorcontrib>Cao, Chen</creatorcontrib><creatorcontrib>Kossinna, Pathum</creatorcontrib><creatorcontrib>Kwok, Devin</creatorcontrib><creatorcontrib>Li, Qing</creatorcontrib><creatorcontrib>He, Jingni</creatorcontrib><creatorcontrib>Su, Liya</creatorcontrib><creatorcontrib>Guo, Xingyi</creatorcontrib><creatorcontrib>Zhang, Qingrun</creatorcontrib><creatorcontrib>Long, Quan</creatorcontrib><title>Disentangling genetic feature selection and aggregation in transcriptome-wide association studies</title><title>Genetics (Austin)</title><addtitle>Genetics</addtitle><description>Abstract
The success of transcriptome-wide association studies (TWAS) has led to substantial research toward improving the predictive accuracy of its core component of genetically regulated expression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expressions) and the linear combination of genotypes (as the predicted expressions) for association tests. From the perspective of machine learning (considering SNPs as features), these are actually two separable steps—feature selection and feature aggregation—which can be independently conducted. In this study, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adapting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. 
More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.</description><subject>Adaptability</subject><subject>Agglomeration</subject><subject>Data analysis</subject><subject>Feature selection</subject><subject>Genetics</subject><subject>Genotypes</subject><subject>Investigation</subject><subject>Kernel functions</subject><subject>Machine learning</subject><subject>Phenotypes</subject><subject>Prediction models</subject><subject>Single-nucleotide polymorphism</subject><subject>Statistical power</subject><subject>Transcriptomes</subject><issn>1943-2631</issn><issn>0016-6731</issn><issn>1943-2631</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNqFkb1PwzAQxS0EoqWwM6FIjCjUiR0nWZBQ-ZQqscBsXZJzMGrtYjsg_nsMbRFMTHeWf_fu2Y-Q44yeZ7Rm0x4NBt36qf6AJs_EDhlnNWdpLli2-6sfkQPvXyiloi6qfTJivOJ1VZRjAlfaowlg-oU2fbIRTBRCGBwmHhfYBm1NAqZLoO8d9vB91iYJDoxvnV4Fu8T0XXeYgPe21WvCh6HT6A_JnoKFx6NNnZCnm-vH2V06f7i9n13O05bnZUibvG2UKkshBJQFABNQN7yrGGCpRHTOv5pYVMGbumYoClHlNMdCKZFzwSbkYq27Gpoldm18lYOFXDm9BPchLWj598boZ9nbN1nntBKsigKnGwFnXwf0Qb7YwZnoWcY_rApBs4JHiq6p1lnvHaqfDRmVX6HIbShyE0ocOfnt7Gdgm0IEztaAHVb_y30C84CeBw</recordid><startdate>20220204</startdate><enddate>20220204</enddate><creator>Cao, Chen</creator><creator>Kossinna, Pathum</creator><creator>Kwok, Devin</creator><creator>Li, Qing</creator><creator>He, Jingni</creator><creator>Su, Liya</creator><creator>Guo, Xingyi</creator><creator>Zhang, Qingrun</creator><creator>Long, Quan</creator><general>Oxford University Press</general><general>Genetics Society of 
America</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>4T-</scope><scope>4U-</scope><scope>7QP</scope><scope>7SS</scope><scope>7TK</scope><scope>7TM</scope><scope>8FD</scope><scope>FR3</scope><scope>K9.</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-3795-1873</orcidid><orcidid>https://orcid.org/0000-0001-5269-1294</orcidid><orcidid>https://orcid.org/0000-0001-5343-808X</orcidid></search><sort><creationdate>20220204</creationdate><title>Disentangling genetic feature selection and aggregation in transcriptome-wide association studies</title><author>Cao, Chen ; Kossinna, Pathum ; Kwok, Devin ; Li, Qing ; He, Jingni ; Su, Liya ; Guo, Xingyi ; Zhang, Qingrun ; Long, Quan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c427t-b2cbff77666a75aa36a9b4d83ae7f60064ae7f064f54b993e6568202e5ff62463</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Adaptability</topic><topic>Agglomeration</topic><topic>Data analysis</topic><topic>Feature selection</topic><topic>Genetics</topic><topic>Genotypes</topic><topic>Investigation</topic><topic>Kernel functions</topic><topic>Machine learning</topic><topic>Phenotypes</topic><topic>Prediction models</topic><topic>Single-nucleotide polymorphism</topic><topic>Statistical power</topic><topic>Transcriptomes</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cao, Chen</creatorcontrib><creatorcontrib>Kossinna, Pathum</creatorcontrib><creatorcontrib>Kwok, Devin</creatorcontrib><creatorcontrib>Li, Qing</creatorcontrib><creatorcontrib>He, Jingni</creatorcontrib><creatorcontrib>Su, Liya</creatorcontrib><creatorcontrib>Guo, Xingyi</creatorcontrib><creatorcontrib>Zhang, Qingrun</creatorcontrib><creatorcontrib>Long, 
Quan</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Docstoc</collection><collection>University Readers</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Genetics (Austin)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cao, Chen</au><au>Kossinna, Pathum</au><au>Kwok, Devin</au><au>Li, Qing</au><au>He, Jingni</au><au>Su, Liya</au><au>Guo, Xingyi</au><au>Zhang, Qingrun</au><au>Long, Quan</au><au>Li, Y</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Disentangling genetic feature selection and aggregation in transcriptome-wide association studies</atitle><jtitle>Genetics (Austin)</jtitle><addtitle>Genetics</addtitle><date>2022-02-04</date><risdate>2022</risdate><volume>220</volume><issue>2</issue><issn>1943-2631</issn><issn>0016-6731</issn><eissn>1943-2631</eissn><abstract>Abstract
The success of transcriptome-wide association studies (TWAS) has led to substantial research toward improving the predictive accuracy of its core component of genetically regulated expression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expressions) and the linear combination of genotypes (as the predicted expressions) for association tests. From the perspective of machine learning (considering SNPs as features), these are actually two separable steps—feature selection and feature aggregation—which can be independently conducted. In this study, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adapting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.</abstract><cop>United States</cop><pub>Oxford University Press</pub><pmid>34849857</pmid><doi>10.1093/genetics/iyab216</doi><orcidid>https://orcid.org/0000-0003-3795-1873</orcidid><orcidid>https://orcid.org/0000-0001-5269-1294</orcidid><orcidid>https://orcid.org/0000-0001-5343-808X</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1943-2631 |
ispartof | Genetics (Austin), 2022-02, Vol.220 (2) |
issn | ISSN: 1943-2631; ISSN: 0016-6731; EISSN: 1943-2631 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9208638 |
source | Freely Accessible Science Journals - check A-Z of ejournals; Oxford Journals Online; Alma/SFX Local Collection |
subjects | Adaptability; Agglomeration; Data analysis; Feature selection; Genetics; Genotypes; Investigation; Kernel functions; Machine learning; Phenotypes; Prediction models; Single-nucleotide polymorphism; Statistical power; Transcriptomes |
title | Disentangling genetic feature selection and aggregation in transcriptome-wide association studies |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T04%3A29%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Disentangling%20genetic%20feature%20selection%20and%20aggregation%20in%20transcriptome-wide%20association%20studies&rft.jtitle=Genetics%20(Austin)&rft.au=Cao,%20Chen&rft.date=2022-02-04&rft.volume=220&rft.issue=2&rft.issn=1943-2631&rft.eissn=1943-2631&rft_id=info:doi/10.1093/genetics/iyab216&rft_dat=%3Cproquest_pubme%3E2638560154%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c427t-b2cbff77666a75aa36a9b4d83ae7f60064ae7f064f54b993e6568202e5ff62463%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2638560154&rft_id=info:pmid/34849857&rft_oup_id=10.1093/genetics/iyab216&rfr_iscdi=true |
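
The abstract describes TWAS as two separable steps: feature selection (e.g., a simple marker test) and feature aggregation (e.g., kernel machines). The sketch below illustrates that two-step idea on simulated data. It is not the authors' implementation: the function names `select_snps` and `kernel_score_test`, the `top_k` cutoff, the linear kernel, the permutation p-value, and all simulation settings are assumptions made for illustration only.

```python
import numpy as np

def select_snps(G_ref, expr, top_k=5):
    """Feature selection (hypothetical helper): a marginal marker test that
    ranks SNPs by squared correlation with expression in a reference panel."""
    Gc = G_ref - G_ref.mean(axis=0)
    ec = expr - expr.mean()
    denom = np.sqrt((Gc ** 2).sum(axis=0) * (ec ** 2).sum())
    denom[denom == 0] = np.inf          # monomorphic SNPs get correlation 0
    r = (Gc.T @ ec) / denom
    return np.argsort(-r ** 2)[:top_k]  # indices of the top_k SNPs

def kernel_score_test(G, y, n_perm=199, seed=0):
    """Feature aggregation (hypothetical helper): score statistic
    Q = r' K r with a linear kernel K, with a permutation p-value."""
    rng = np.random.default_rng(seed)
    Gc = G - G.mean(axis=0)
    K = Gc @ Gc.T                       # linear kernel over selected SNPs
    r = y - y.mean()                    # residuals under the null (intercept only)
    q_obs = r @ K @ r
    q_null = np.empty(n_perm)
    for i in range(n_perm):
        rp = rng.permutation(r)
        q_null[i] = rp @ K @ rp
    p = (1 + np.sum(q_null >= q_obs)) / (n_perm + 1)
    return q_obs, p

# Simulated example (all settings are illustrative assumptions).
rng = np.random.default_rng(1)
n_ref, n_gwas, m = 300, 500, 50
G_ref = rng.binomial(2, 0.3, size=(n_ref, m)).astype(float)
G_gwas = rng.binomial(2, 0.3, size=(n_gwas, m)).astype(float)
beta = np.zeros(m)
beta[:3] = 1.0                          # three SNPs regulate expression
expr = G_ref @ beta + rng.normal(size=n_ref)
pheno = 0.5 * (G_gwas @ beta) + rng.normal(size=n_gwas)

selected = select_snps(G_ref, expr, top_k=5)             # step 1: selection
q_obs, p_value = kernel_score_test(G_gwas[:, selected], pheno)  # step 2: aggregation
print("selected SNPs:", sorted(selected.tolist()))
print("Q =", q_obs, "permutation p =", p_value)
```

Because the two steps only communicate through the list of selected SNP indices, either one can be swapped out independently, for example a penalized regression for selection or a different kernel for aggregation.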