Priors, population sizes, and power in genome-wide hypothesis tests
Genome-wide tests, including genome-wide association studies (GWAS) of germ-line genetic variants, driver tests of cancer somatic mutations, and transcriptome-wide association tests of RNAseq data, carry a high multiple testing burden. This burden can be overcome by enrolling larger cohorts or allev...
Published in: | BMC bioinformatics 2023-04, Vol.24 (1), p.170-170, Article 170 |
---|---|
Main Authors: | Cai, Jitong; Zhan, Jianan; Arking, Dan E; Bader, Joel S |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c549t-288f1ee6cc92ca3581295af37b0d4ed9650663683fc2f3ac9fe963c756db59e03 |
container_end_page | 170 |
container_issue | 1 |
container_start_page | 170 |
container_title | BMC bioinformatics |
container_volume | 24 |
creator | Cai, Jitong ; Zhan, Jianan ; Arking, Dan E ; Bader, Joel S |
description | Genome-wide tests, including genome-wide association studies (GWAS) of germ-line genetic variants, driver tests of cancer somatic mutations, and transcriptome-wide association tests of RNAseq data, carry a high multiple testing burden. This burden can be overcome by enrolling larger cohorts or alleviated by using prior biological knowledge to favor some hypotheses over others. Here we compare these two methods in terms of their abilities to boost the power of hypothesis testing.
We provide a quantitative estimate for progress in cohort sizes and present a theoretical analysis of the power of oracular hard priors: priors that select a subset of hypotheses for testing, with an oracular guarantee that all true positives are within the tested subset. This theory demonstrates that for GWAS, strong priors that limit testing to 100-1000 genes provide less power than typical annual 20-40% increases in cohort sizes. Furthermore, non-oracular priors that exclude even a small fraction of true positives from the tested set can perform worse than not using a prior at all.
Our results provide a theoretical explanation for the continued dominance of simple, unbiased univariate hypothesis tests for GWAS: if a statistical question can be answered by larger cohort sizes, it should be answered by larger cohort sizes rather than by more complicated biased methods involving priors. We suggest that priors are better suited for non-statistical aspects of biology, such as pathway structure and causality, that are not yet easily captured by standard hypothesis tests. |
doi_str_mv | 10.1186/s12859-023-05261-9 |
format | article |
fullrecord | <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_8803d157232c4fce8094456077bf7473</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A747170544</galeid><doaj_id>oai_doaj_org_article_8803d157232c4fce8094456077bf7473</doaj_id><sourcerecordid>A747170544</sourcerecordid><originalsourceid>FETCH-LOGICAL-c549t-288f1ee6cc92ca3581295af37b0d4ed9650663683fc2f3ac9fe963c756db59e03</originalsourceid><addsrcrecordid>eNptkktv1DAUhSMEoqXwB1igSGxAIsWP-LVC1YjCSJVAPNaWx7nOeJTEwU4o5dfX0ymlQcgLW9ffPdc-OkXxHKNTjCV_mzCRTFWI0AoxwnGlHhTHuBa4Ihixh_fOR8WTlHYIYSERe1wcUYERxgQdF6vP0YeY3pRjGOfOTD4MZfK_IVfM0OTqJcTSD2ULQ-ihuvQNlNurMUxbSD6VE6QpPS0eOdMleHa7nxTfz99_W32sLj59WK_OLirLajVVREqHAbi1ilhDmcREMeOo2KCmhkZxhjinXFJniaPGKgeKUysYbzZMAaInxfqg2wSz02P0vYlXOhivbwohttrEydsOtJSINpgJQomtnQWJVF0zjoTYOFELmrXeHbTGedNDY2GYoukWosubwW91G37qbBytOVFZ4dWtQgw_5uyD7n2y0HVmgDAnTSTiSlGC98Ne_oPuwhyH7FWmMGN1jTn7S7Um_8APLuTBdi-qz_KbsUAZzNTpf6i8Gui9DQM4n-uLhteLhsxM8GtqzZySXn_9smTJgbUxpBTB3RmCkd5nTh8yp3Pm9E3m9N6IF_etvGv5EzJ6DThXzfE</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2815544165</pqid></control><display><type>article</type><title>Priors, population sizes, and power in genome-wide hypothesis tests</title><source>Publicly Available Content Database</source><source>PubMed Central</source><creator>Cai, Jitong ; Zhan, Jianan ; Arking, Dan E ; Bader, Joel S</creator><creatorcontrib>Cai, Jitong ; Zhan, Jianan ; Arking, Dan E ; Bader, Joel S</creatorcontrib><description>Genome-wide tests, including genome-wide association studies (GWAS) of germ-line genetic variants, driver tests of cancer somatic mutations, and transcriptome-wide association tests of RNAseq data, carry a high multiple testing burden. This burden can be overcome by enrolling larger cohorts or alleviated by using prior biological knowledge to favor some hypotheses over others. 
Here we compare these two methods in terms of their abilities to boost the power of hypothesis testing.
We provide a quantitative estimate for progress in cohort sizes and present a theoretical analysis of the power of oracular hard priors: priors that select a subset of hypotheses for testing, with an oracular guarantee that all true positives are within the tested subset. This theory demonstrates that for GWAS, strong priors that limit testing to 100-1000 genes provide less power than typical annual 20-40% increases in cohort sizes. Furthermore, non-oracular priors that exclude even a small fraction of true positives from the tested set can perform worse than not using a prior at all.
Our results provide a theoretical explanation for the continued dominance of simple, unbiased univariate hypothesis tests for GWAS: if a statistical question can be answered by larger cohort sizes, it should be answered by larger cohort sizes rather than by more complicated biased methods involving priors. We suggest that priors are better suited for non-statistical aspects of biology, such as pathway structure and causality, that are not yet easily captured by standard hypothesis tests.</description><identifier>ISSN: 1471-2105</identifier><identifier>EISSN: 1471-2105</identifier><identifier>DOI: 10.1186/s12859-023-05261-9</identifier><identifier>PMID: 37101120</identifier><language>eng</language><publisher>England: BioMed Central Ltd</publisher><subject>Analysis ; Breast cancer ; Gene mutations ; Genetic diversity ; Genetic variance ; Genome-wide association studies ; Genome-wide association studies (GWAS) ; Genome-Wide Association Study ; Genomes ; Genomics ; Growth models ; Health aspects ; Humans ; Hypotheses ; Hypothesis testing (Psychology) ; Methods ; Multiple hypothesis testing ; Polymorphism, Single Nucleotide ; Population Density ; Population genetics ; Proteomics ; RNA sequencing ; Statistical genetics ; Statistics ; Theoretical analysis ; Transcriptome ; Transcriptomes</subject><ispartof>BMC bioinformatics, 2023-04, Vol.24 (1), p.170-170, Article 170</ispartof><rights>2023. The Author(s).</rights><rights>COPYRIGHT 2023 BioMed Central Ltd.</rights><rights>2023. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>The Author(s) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c549t-288f1ee6cc92ca3581295af37b0d4ed9650663683fc2f3ac9fe963c756db59e03</cites><orcidid>0000-0002-6020-4625</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10134629/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2815544165?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25753,27924,27925,37012,37013,44590,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37101120$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Cai, Jitong</creatorcontrib><creatorcontrib>Zhan, Jianan</creatorcontrib><creatorcontrib>Arking, Dan E</creatorcontrib><creatorcontrib>Bader, Joel S</creatorcontrib><title>Priors, population sizes, and power in genome-wide hypothesis tests</title><title>BMC bioinformatics</title><addtitle>BMC Bioinformatics</addtitle><description>Genome-wide tests, including genome-wide association studies (GWAS) of germ-line genetic variants, driver tests of cancer somatic mutations, and transcriptome-wide association tests of RNAseq data, carry a high multiple testing burden. This burden can be overcome by enrolling larger cohorts or alleviated by using prior biological knowledge to favor some hypotheses over others. Here we compare these two methods in terms of their abilities to boost the power of hypothesis testing.
We provide a quantitative estimate for progress in cohort sizes and present a theoretical analysis of the power of oracular hard priors: priors that select a subset of hypotheses for testing, with an oracular guarantee that all true positives are within the tested subset. This theory demonstrates that for GWAS, strong priors that limit testing to 100-1000 genes provide less power than typical annual 20-40% increases in cohort sizes. Furthermore, non-oracular priors that exclude even a small fraction of true positives from the tested set can perform worse than not using a prior at all.
Our results provide a theoretical explanation for the continued dominance of simple, unbiased univariate hypothesis tests for GWAS: if a statistical question can be answered by larger cohort sizes, it should be answered by larger cohort sizes rather than by more complicated biased methods involving priors. We suggest that priors are better suited for non-statistical aspects of biology, such as pathway structure and causality, that are not yet easily captured by standard hypothesis tests.</description><subject>Analysis</subject><subject>Breast cancer</subject><subject>Gene mutations</subject><subject>Genetic diversity</subject><subject>Genetic variance</subject><subject>Genome-wide association studies</subject><subject>Genome-wide association studies (GWAS)</subject><subject>Genome-Wide Association Study</subject><subject>Genomes</subject><subject>Genomics</subject><subject>Growth models</subject><subject>Health aspects</subject><subject>Humans</subject><subject>Hypotheses</subject><subject>Hypothesis testing (Psychology)</subject><subject>Methods</subject><subject>Multiple hypothesis testing</subject><subject>Polymorphism, Single Nucleotide</subject><subject>Population Density</subject><subject>Population genetics</subject><subject>Proteomics</subject><subject>RNA sequencing</subject><subject>Statistical genetics</subject><subject>Statistics</subject><subject>Theoretical 
analysis</subject><subject>Transcriptome</subject><subject>Transcriptomes</subject><issn>1471-2105</issn><issn>1471-2105</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNptkktv1DAUhSMEoqXwB1igSGxAIsWP-LVC1YjCSJVAPNaWx7nOeJTEwU4o5dfX0ymlQcgLW9ffPdc-OkXxHKNTjCV_mzCRTFWI0AoxwnGlHhTHuBa4Ihixh_fOR8WTlHYIYSERe1wcUYERxgQdF6vP0YeY3pRjGOfOTD4MZfK_IVfM0OTqJcTSD2ULQ-ihuvQNlNurMUxbSD6VE6QpPS0eOdMleHa7nxTfz99_W32sLj59WK_OLirLajVVREqHAbi1ilhDmcREMeOo2KCmhkZxhjinXFJniaPGKgeKUysYbzZMAaInxfqg2wSz02P0vYlXOhivbwohttrEydsOtJSINpgJQomtnQWJVF0zjoTYOFELmrXeHbTGedNDY2GYoukWosubwW91G37qbBytOVFZ4dWtQgw_5uyD7n2y0HVmgDAnTSTiSlGC98Ne_oPuwhyH7FWmMGN1jTn7S7Um_8APLuTBdi-qz_KbsUAZzNTpf6i8Gui9DQM4n-uLhteLhsxM8GtqzZySXn_9smTJgbUxpBTB3RmCkd5nTh8yp3Pm9E3m9N6IF_etvGv5EzJ6DThXzfE</recordid><startdate>20230426</startdate><enddate>20230426</enddate><creator>Cai, Jitong</creator><creator>Zhan, Jianan</creator><creator>Arking, Dan E</creator><creator>Bader, Joel S</creator><general>BioMed Central Ltd</general><general>BioMed 
Central</general><general>BMC</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7SC</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L7M</scope><scope>LK8</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-6020-4625</orcidid></search><sort><creationdate>20230426</creationdate><title>Priors, population sizes, and power in genome-wide hypothesis tests</title><author>Cai, Jitong ; Zhan, Jianan ; Arking, Dan E ; Bader, Joel S</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c549t-288f1ee6cc92ca3581295af37b0d4ed9650663683fc2f3ac9fe963c756db59e03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Analysis</topic><topic>Breast cancer</topic><topic>Gene mutations</topic><topic>Genetic diversity</topic><topic>Genetic variance</topic><topic>Genome-wide association studies</topic><topic>Genome-wide association studies (GWAS)</topic><topic>Genome-Wide 
Association Study</topic><topic>Genomes</topic><topic>Genomics</topic><topic>Growth models</topic><topic>Health aspects</topic><topic>Humans</topic><topic>Hypotheses</topic><topic>Hypothesis testing (Psychology)</topic><topic>Methods</topic><topic>Multiple hypothesis testing</topic><topic>Polymorphism, Single Nucleotide</topic><topic>Population Density</topic><topic>Population genetics</topic><topic>Proteomics</topic><topic>RNA sequencing</topic><topic>Statistical genetics</topic><topic>Statistics</topic><topic>Theoretical analysis</topic><topic>Transcriptome</topic><topic>Transcriptomes</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cai, Jitong</creatorcontrib><creatorcontrib>Zhan, Jianan</creatorcontrib><creatorcontrib>Arking, Dan E</creatorcontrib><creatorcontrib>Bader, Joel S</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Science (Gale in Context)</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase 
pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central</collection><collection>Advanced Technologies & Aerospace Database (1962 - current)</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection (Proquest) (PQ_SDU_P3)</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ProQuest Biological Science Collection</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>PML(ProQuest Medical Library)</collection><collection>Biological Science Database</collection><collection>ProQuest Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One 
Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>BMC bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cai, Jitong</au><au>Zhan, Jianan</au><au>Arking, Dan E</au><au>Bader, Joel S</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Priors, population sizes, and power in genome-wide hypothesis tests</atitle><jtitle>BMC bioinformatics</jtitle><addtitle>BMC Bioinformatics</addtitle><date>2023-04-26</date><risdate>2023</risdate><volume>24</volume><issue>1</issue><spage>170</spage><epage>170</epage><pages>170-170</pages><artnum>170</artnum><issn>1471-2105</issn><eissn>1471-2105</eissn><abstract>Genome-wide tests, including genome-wide association studies (GWAS) of germ-line genetic variants, driver tests of cancer somatic mutations, and transcriptome-wide association tests of RNAseq data, carry a high multiple testing burden. This burden can be overcome by enrolling larger cohorts or alleviated by using prior biological knowledge to favor some hypotheses over others. Here we compare these two methods in terms of their abilities to boost the power of hypothesis testing.
We provide a quantitative estimate for progress in cohort sizes and present a theoretical analysis of the power of oracular hard priors: priors that select a subset of hypotheses for testing, with an oracular guarantee that all true positives are within the tested subset. This theory demonstrates that for GWAS, strong priors that limit testing to 100-1000 genes provide less power than typical annual 20-40% increases in cohort sizes. Furthermore, non-oracular priors that exclude even a small fraction of true positives from the tested set can perform worse than not using a prior at all.
Our results provide a theoretical explanation for the continued dominance of simple, unbiased univariate hypothesis tests for GWAS: if a statistical question can be answered by larger cohort sizes, it should be answered by larger cohort sizes rather than by more complicated biased methods involving priors. We suggest that priors are better suited for non-statistical aspects of biology, such as pathway structure and causality, that are not yet easily captured by standard hypothesis tests.</abstract><cop>England</cop><pub>BioMed Central Ltd</pub><pmid>37101120</pmid><doi>10.1186/s12859-023-05261-9</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-6020-4625</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1471-2105 |
ispartof | BMC bioinformatics, 2023-04, Vol.24 (1), p.170-170, Article 170 |
issn | 1471-2105 1471-2105 |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_8803d157232c4fce8094456077bf7473 |
source | Publicly Available Content Database; PubMed Central |
subjects | Analysis ; Breast cancer ; Gene mutations ; Genetic diversity ; Genetic variance ; Genome-wide association studies ; Genome-wide association studies (GWAS) ; Genome-Wide Association Study ; Genomes ; Genomics ; Growth models ; Health aspects ; Humans ; Hypotheses ; Hypothesis testing (Psychology) ; Methods ; Multiple hypothesis testing ; Polymorphism, Single Nucleotide ; Population Density ; Population genetics ; Proteomics ; RNA sequencing ; Statistical genetics ; Statistics ; Theoretical analysis ; Transcriptome ; Transcriptomes |
title | Priors, population sizes, and power in genome-wide hypothesis tests |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T16%3A00%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Priors,%20population%20sizes,%20and%20power%20in%20genome-wide%20hypothesis%20tests&rft.jtitle=BMC%20bioinformatics&rft.au=Cai,%20Jitong&rft.date=2023-04-26&rft.volume=24&rft.issue=1&rft.spage=170&rft.epage=170&rft.pages=170-170&rft.artnum=170&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.1186/s12859-023-05261-9&rft_dat=%3Cgale_doaj_%3EA747170544%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c549t-288f1ee6cc92ca3581295af37b0d4ed9650663683fc2f3ac9fe963c756db59e03%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2815544165&rft_id=info:pmid/37101120&rft_galeid=A747170544&rfr_iscdi=true |
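
The trade-off described in the abstract (shrinking the tested set with a hard prior versus enlarging the cohort) can be explored with a rough sketch. This is not the authors' analysis and is not meant to reproduce their numbers. It assumes a two-sided z-test with a Bonferroni-corrected threshold and a non-centrality parameter that grows with the square root of cohort size. The function names, the 20,000-test baseline, and the starting non-centrality of 4.0 are all hypothetical choices made for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power(ncp, n_tests, alpha=0.05):
    """Approximate power of a two-sided z-test under Bonferroni correction.

    ncp     -- non-centrality parameter (expected z-score of a true positive)
    n_tests -- number of hypotheses tested (sets the multiple-testing burden)
    alpha   -- family-wise error rate before correction
    """
    # Critical value for a per-tail error of alpha / (2 * n_tests).
    z_crit = -_STD_NORMAL.inv_cdf(alpha / (2.0 * n_tests))
    # Probability that the observed z-score lands beyond either threshold.
    return _STD_NORMAL.cdf(ncp - z_crit) + _STD_NORMAL.cdf(-ncp - z_crit)


def ncp_for_cohort(base_ncp, growth):
    """Scale the non-centrality parameter for a cohort enlarged by `growth`.

    Assumes the expected z-score is proportional to sqrt(cohort size),
    so growth=0.3 means a 30% larger cohort.
    """
    return base_ncp * (1.0 + growth) ** 0.5


if __name__ == "__main__":
    base_ncp = 4.0       # hypothetical effect at the current cohort size
    all_tests = 20_000   # hypothetical number of gene-level tests

    print("no prior, current cohort:", power(base_ncp, all_tests))
    # Oracular hard prior: fewer tests, true positive assumed retained.
    for subset in (1_000, 100):
        print(f"hard prior, {subset} tests:", power(base_ncp, subset))
    # No prior, but a larger cohort.
    for growth in (0.2, 0.4):
        print(f"no prior, +{growth:.0%} cohort:",
              power(ncp_for_cohort(base_ncp, growth), all_tests))
```

Under these assumptions, both levers raise power: the prior lowers the critical value, while the larger cohort raises the expected test statistic. The sketch models only the oracular case, where the true positive stays in the tested subset. A non-oracular prior that excludes it would have zero power for that hypothesis.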