Priors, population sizes, and power in genome-wide hypothesis tests

Genome-wide tests, including genome-wide association studies (GWAS) of germ-line genetic variants, driver tests of cancer somatic mutations, and transcriptome-wide association tests of RNAseq data, carry a high multiple testing burden. This burden can be overcome by enrolling larger cohorts or allev...

Bibliographic Details
Published in: BMC bioinformatics 2023-04, Vol.24 (1), p.170-170, Article 170
Main Authors: Cai, Jitong; Zhan, Jianan; Arking, Dan E; Bader, Joel S
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c549t-288f1ee6cc92ca3581295af37b0d4ed9650663683fc2f3ac9fe963c756db59e03
container_end_page 170
container_issue 1
container_start_page 170
container_title BMC bioinformatics
container_volume 24
creator Cai, Jitong
Zhan, Jianan
Arking, Dan E
Bader, Joel S
description Genome-wide tests, including genome-wide association studies (GWAS) of germ-line genetic variants, driver tests of cancer somatic mutations, and transcriptome-wide association tests of RNAseq data, carry a high multiple testing burden. This burden can be overcome by enrolling larger cohorts or alleviated by using prior biological knowledge to favor some hypotheses over others. Here we compare these two methods in terms of their abilities to boost the power of hypothesis testing. We provide a quantitative estimate for progress in cohort sizes and present a theoretical analysis of the power of oracular hard priors: priors that select a subset of hypotheses for testing, with an oracular guarantee that all true positives are within the tested subset. This theory demonstrates that for GWAS, strong priors that limit testing to 100-1000 genes provide less power than typical annual 20-40% increases in cohort sizes. Furthermore, non-oracular priors that exclude even a small fraction of true positives from the tested set can perform worse than not using a prior at all. Our results provide a theoretical explanation for the continued dominance of simple, unbiased univariate hypothesis tests for GWAS: if a statistical question can be answered by larger cohort sizes, it should be answered by larger cohort sizes rather than by more complicated biased methods involving priors. We suggest that priors are better suited for non-statistical aspects of biology, such as pathway structure and causality, that are not yet easily captured by standard hypothesis tests.
doi_str_mv 10.1186/s12859-023-05261-9
format article
eissn 1471-2105
pmid 37101120
publisher England: BioMed Central Ltd
date 2023-04-26
genre article
ristype JOUR
tpages 1
rights 2023. The Author(s).
COPYRIGHT 2023 BioMed Central Ltd.
2023. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
The Author(s) 2023
lds50 peer_reviewed
oa free_for_read
orcidid https://orcid.org/0000-0002-6020-4625
galeid A747170544
pqid 2815544165
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10134629/pdf/
linktohtml https://www.proquest.com/docview/2815544165?pq-origsite=primo
backlink https://www.ncbi.nlm.nih.gov/pubmed/37101120 (View this record in MEDLINE/PubMed)
fulltext fulltext
identifier ISSN: 1471-2105
ispartof BMC bioinformatics, 2023-04, Vol.24 (1), p.170-170, Article 170
issn 1471-2105
1471-2105
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_8803d157232c4fce8094456077bf7473
source Publicly Available Content Database; PubMed Central
subjects Analysis
Breast cancer
Gene mutations
Genetic diversity
Genetic variance
Genome-wide association studies
Genome-wide association studies (GWAS)
Genome-Wide Association Study
Genomes
Genomics
Growth models
Health aspects
Humans
Hypotheses
Hypothesis testing (Psychology)
Methods
Multiple hypothesis testing
Polymorphism, Single Nucleotide
Population Density
Population genetics
Proteomics
RNA sequencing
Statistical genetics
Statistics
Theoretical analysis
Transcriptome
Transcriptomes
title Priors, population sizes, and power in genome-wide hypothesis tests
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T16%3A00%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Priors,%20population%20sizes,%20and%20power%20in%20genome-wide%20hypothesis%20tests&rft.jtitle=BMC%20bioinformatics&rft.au=Cai,%20Jitong&rft.date=2023-04-26&rft.volume=24&rft.issue=1&rft.spage=170&rft.epage=170&rft.pages=170-170&rft.artnum=170&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.1186/s12859-023-05261-9&rft_dat=%3Cgale_doaj_%3EA747170544%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c549t-288f1ee6cc92ca3581295af37b0d4ed9650663683fc2f3ac9fe963c756db59e03%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2815544165&rft_id=info:pmid/37101120&rft_galeid=A747170544&rfr_iscdi=true