Analyzing genome-wide association studies with an FDR controlling modification of the Bayesian information criterion
The prevailing method of analyzing GWAS data is still to test each marker individually, although from a statistical point of view it is quite obvious that in case of complex traits such single marker tests are not ideal. Recently several model selection approaches for GWAS have been suggested, most...
Published in: | arXiv.org 2014-03 |
---|---|
Main Authors: | Dolejsi, Erich; Bodenstorfer, Bernhard; Frommlet, Florian |
Format: | Article |
Language: | English |
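Subjects: | Algorithms; Asymptotic properties; Bayesian analysis; Computer simulation; Consortia; Criteria; Data analysis |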
Online Access: | Get full text |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Dolejsi, Erich; Bodenstorfer, Bernhard; Frommlet, Florian |
description | The prevailing method of analyzing GWAS data is still to test each marker individually, although from a statistical point of view it is quite obvious that in case of complex traits such single marker tests are not ideal. Recently several model selection approaches for GWAS have been suggested, most of them based on LASSO-type procedures. Here we will discuss an alternative model selection approach which is based on a modification of the Bayesian Information Criterion (mBIC2) which was previously shown to have certain asymptotic optimality properties in terms of minimizing the misclassification error. Heuristic search strategies are introduced which attempt to find the model which minimizes mBIC2, and which are efficient enough to allow the analysis of GWAS data. Our approach is implemented in a software package called MOSGWA. Its performance in case control GWAS is compared with the two algorithms HLASSO and GWASelect, as well as with single marker tests, where we performed a simulation study based on real SNP data from the POPRES sample. Our results show that MOSGWA performs slightly better than HLASSO, whereas according to our simulations GWASelect does not control the type I error when used to automatically determine the number of important SNPs. We also reanalyze the GWAS data from the Wellcome Trust Case-Control Consortium (WTCCC) and compare the findings of the different procedures. |
doi_str_mv | 10.48550/arxiv.1403.6623 |
format | article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2083037676</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2083037676</sourcerecordid><originalsourceid>FETCH-LOGICAL-a516-d7b28587fac7cbe863274c163ca48cafb7174573592bf2569b007f2c3144cefd3</originalsourceid><addsrcrecordid>eNotj99LQjEcxUcQJOZ7j4Oer2377pePZpmBEITvsru76eS61TYz--sz7OkcOJ9z4CB0R8mYayHIg8nf4WtMOYGxlAyu0IAB0EZzxm7QqJQdIYRJxYSAAarTaPrTT4gbvHEx7V1zDJ3DppRkg6khRVzqoQuu4GOoW2winj-9Y5tizanv_3r71AUf7AVOHtetw4_m5Eo4wyH6lPeXzOZQXT67W3TtTV_c6F-HaDV_Xs0WzfLt5XU2XTZGUNl0qmVaaOWNVbZ1WgJT3FIJ1nBtjW8VVVwoEBPWeibkpCVEeWaBcm6d72CI7i-zHzl9Hlyp61065PPfsmZEAwEllYRfrmxenA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2083037676</pqid></control><display><type>article</type><title>Analyzing genome-wide association studies with an FDR controlling modification of the Bayesian information criterion</title><source>ProQuest Publicly Available Content database</source><creator>Dolejsi, Erich ; Bodenstorfer, Bernhard ; Frommlet, Florian</creator><creatorcontrib>Dolejsi, Erich ; Bodenstorfer, Bernhard ; Frommlet, Florian</creatorcontrib><description>The prevailing method of analyzing GWAS data is still to test each marker individually, although from a statistical point of view it is quite obvious that in case of complex traits such single marker tests are not ideal. Recently several model selection approaches for GWAS have been suggested, most of them based on LASSO-type procedures. Here we will discuss an alternative model selection approach which is based on a modification of the Bayesian Information Criterion (mBIC2) which was previously shown to have certain asymptotic optimality properties in terms of minimizing the misclassification error. 
Heuristic search strategies are introduced which attempt to find the model which minimizes mBIC2, and which are efficient enough to allow the analysis of GWAS data. Our approach is implemented in a software package called MOSGWA. Its performance in case control GWAS is compared with the two algorithms HLASSO and GWASelect, as well as with single marker tests, where we performed a simulation study based on real SNP data from the POPRES sample. Our results show that MOSGWA performs slightly better than HLASSO, whereas according to our simulations GWASelect does not control the type I error when used to automatically determine the number of important SNPs. We also reanalyze the GWAS data from the Wellcome Trust Case-Control Consortium (WTCCC) and compare the findings of the different procedures.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.1403.6623</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Asymptotic properties ; Bayesian analysis ; Computer simulation ; Consortia ; Criteria ; Data analysis</subject><ispartof>arXiv.org, 2014-03</ispartof><rights>2014. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2083037676?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>780,784,25753,27925,37012,44590</link.rule.ids></links><search><creatorcontrib>Dolejsi, Erich</creatorcontrib><creatorcontrib>Bodenstorfer, Bernhard</creatorcontrib><creatorcontrib>Frommlet, Florian</creatorcontrib><title>Analyzing genome-wide association studies with an FDR controlling modification of the Bayesian information criterion</title><title>arXiv.org</title><description>The prevailing method of analyzing GWAS data is still to test each marker individually, although from a statistical point of view it is quite obvious that in case of complex traits such single marker tests are not ideal. Recently several model selection approaches for GWAS have been suggested, most of them based on LASSO-type procedures. Here we will discuss an alternative model selection approach which is based on a modification of the Bayesian Information Criterion (mBIC2) which was previously shown to have certain asymptotic optimality properties in terms of minimizing the misclassification error. Heuristic search strategies are introduced which attempt to find the model which minimizes mBIC2, and which are efficient enough to allow the analysis of GWAS data. Our approach is implemented in a software package called MOSGWA. Its performance in case control GWAS is compared with the two algorithms HLASSO and GWASelect, as well as with single marker tests, where we performed a simulation study based on real SNP data from the POPRES sample. 
Our results show that MOSGWA performs slightly better than HLASSO, whereas according to our simulations GWASelect does not control the type I error when used to automatically determine the number of important SNPs. We also reanalyze the GWAS data from the Wellcome Trust Case-Control Consortium (WTCCC) and compare the findings of the different procedures.</description><subject>Algorithms</subject><subject>Asymptotic properties</subject><subject>Bayesian analysis</subject><subject>Computer simulation</subject><subject>Consortia</subject><subject>Criteria</subject><subject>Data analysis</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNotj99LQjEcxUcQJOZ7j4Oer2377pePZpmBEITvsru76eS61TYz--sz7OkcOJ9z4CB0R8mYayHIg8nf4WtMOYGxlAyu0IAB0EZzxm7QqJQdIYRJxYSAAarTaPrTT4gbvHEx7V1zDJ3DppRkg6khRVzqoQuu4GOoW2winj-9Y5tizanv_3r71AUf7AVOHtetw4_m5Eo4wyH6lPeXzOZQXT67W3TtTV_c6F-HaDV_Xs0WzfLt5XU2XTZGUNl0qmVaaOWNVbZ1WgJT3FIJ1nBtjW8VVVwoEBPWeibkpCVEeWaBcm6d72CI7i-zHzl9Hlyp61065PPfsmZEAwEllYRfrmxenA</recordid><startdate>20140326</startdate><enddate>20140326</enddate><creator>Dolejsi, Erich</creator><creator>Bodenstorfer, Bernhard</creator><creator>Frommlet, Florian</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20140326</creationdate><title>Analyzing genome-wide association studies with an FDR controlling modification of the Bayesian information criterion</title><author>Dolejsi, Erich ; Bodenstorfer, Bernhard ; Frommlet, 
Florian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a516-d7b28587fac7cbe863274c163ca48cafb7174573592bf2569b007f2c3144cefd3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Algorithms</topic><topic>Asymptotic properties</topic><topic>Bayesian analysis</topic><topic>Computer simulation</topic><topic>Consortia</topic><topic>Criteria</topic><topic>Data analysis</topic><toplevel>online_resources</toplevel><creatorcontrib>Dolejsi, Erich</creatorcontrib><creatorcontrib>Bodenstorfer, Bernhard</creatorcontrib><creatorcontrib>Frommlet, Florian</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>ProQuest Publicly Available Content database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dolejsi, Erich</au><au>Bodenstorfer, Bernhard</au><au>Frommlet, 
Florian</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Analyzing genome-wide association studies with an FDR controlling modification of the Bayesian information criterion</atitle><jtitle>arXiv.org</jtitle><date>2014-03-26</date><risdate>2014</risdate><eissn>2331-8422</eissn><abstract>The prevailing method of analyzing GWAS data is still to test each marker individually, although from a statistical point of view it is quite obvious that in case of complex traits such single marker tests are not ideal. Recently several model selection approaches for GWAS have been suggested, most of them based on LASSO-type procedures. Here we will discuss an alternative model selection approach which is based on a modification of the Bayesian Information Criterion (mBIC2) which was previously shown to have certain asymptotic optimality properties in terms of minimizing the misclassification error. Heuristic search strategies are introduced which attempt to find the model which minimizes mBIC2, and which are efficient enough to allow the analysis of GWAS data. Our approach is implemented in a software package called MOSGWA. Its performance in case control GWAS is compared with the two algorithms HLASSO and GWASelect, as well as with single marker tests, where we performed a simulation study based on real SNP data from the POPRES sample. Our results show that MOSGWA performs slightly better than HLASSO, whereas according to our simulations GWASelect does not control the type I error when used to automatically determine the number of important SNPs. We also reanalyze the GWAS data from the Wellcome Trust Case-Control Consortium (WTCCC) and compare the findings of the different procedures.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.1403.6623</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2014-03 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2083037676 |
source | ProQuest Publicly Available Content database |
subjects | Algorithms; Asymptotic properties; Bayesian analysis; Computer simulation; Consortia; Criteria; Data analysis |
title | Analyzing genome-wide association studies with an FDR controlling modification of the Bayesian information criterion |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T04%3A30%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Analyzing%20genome-wide%20association%20studies%20with%20an%20FDR%20controlling%20modification%20of%20the%20Bayesian%20information%20criterion&rft.jtitle=arXiv.org&rft.au=Dolejsi,%20Erich&rft.date=2014-03-26&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.1403.6623&rft_dat=%3Cproquest%3E2083037676%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a516-d7b28587fac7cbe863274c163ca48cafb7174573592bf2569b007f2c3144cefd3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2083037676&rft_id=info:pmid/&rfr_iscdi=true |