The limitations of simple gene set enrichment analysis assuming gene independence
Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently a simplified approach using a one-sample t-test score to assess enrichment and ignoring gene-gene correlations was p...
Published in: | Statistical methods in medical research 2016-02, Vol.25 (1), p.472-487 |
---|---|
Main Authors: | Tamayo, Pablo; Steinhardt, George; Liberzon, Arthur; Mesirov, Jill P |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | cdi_FETCH-LOGICAL-c462t-a4578a329c86e665bf48eafd800a2ee317f791b9846e688627761201a6554bf93 |
---|---|
cites | cdi_FETCH-LOGICAL-c462t-a4578a329c86e665bf48eafd800a2ee317f791b9846e688627761201a6554bf93 |
container_end_page | 487 |
container_issue | 1 |
container_start_page | 472 |
container_title | Statistical methods in medical research |
container_volume | 25 |
creator | Tamayo, Pablo; Steinhardt, George; Liberzon, Arthur; Mesirov, Jill P |
description | Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently, a simplified approach that uses a one-sample t-test score to assess enrichment and ignores gene–gene correlations was proposed by Irizarry et al. (2009) as a serious contender. Its proponents criticize Gene Set Enrichment Analysis’s nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene–gene correlations cannot be ignored, because of the significant variance inflation they produce in the enrichment scores, and that they should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods. |
doi_str_mv | 10.1177/0962280212460441 |
format | article |
fullrecord | PMID: 23070592; EISSN: 1477-0334; abbreviated journal title: Stat Methods Med Res; publisher: SAGE Publications (London, England); rights: The Author(s) 2012; peer reviewed; open access (free to read); 16 pages; MEDLINE/PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/23070592 |
fulltext | fulltext |
identifier | ISSN: 0962-2802 |
ispartof | Statistical methods in medical research, 2016-02, Vol.25 (1), p.472-487 |
issn | 0962-2802 (ISSN); 1477-0334 (EISSN) |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3758419 |
source | Applied Social Sciences Index & Abstracts (ASSIA); SAGE:Jisc Collections:SAGE Journals Read and Publish 2023-2024:2025 extension (reading list) |
subjects | Binding sites; Biostatistics; Databases, Genetic - statistics & numerical data; Empirical analysis; Enrichment; Epistasis, Genetic; Gene Expression Profiling - statistics & numerical data; Genome, Human; Humans; Inflation; Knowledge Bases; Models, Statistical; Oligonucleotide Array Sequence Analysis - statistics & numerical data; Statistics, Nonparametric |
title | The limitations of simple gene set enrichment analysis assuming gene independence |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T12%3A06%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20limitations%20of%20simple%20gene%20set%20enrichment%20analysis%20assuming%20gene%20independence&rft.jtitle=Statistical%20methods%20in%20medical%20research&rft.au=Tamayo,%20Pablo&rft.date=2016-02-01&rft.volume=25&rft.issue=1&rft.spage=472&rft.epage=487&rft.pages=472-487&rft.issn=0962-2802&rft.eissn=1477-0334&rft_id=info:doi/10.1177/0962280212460441&rft_dat=%3Cproquest_pubme%3E1878495401%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c462t-a4578a329c86e665bf48eafd800a2ee317f791b9846e688627761201a6554bf93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1878495401&rft_id=info:pmid/23070592&rft_sage_id=10.1177_0962280212460441&rfr_iscdi=true |
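
The abstract's central claim, that gene–gene correlations inflate the variance of a simple enrichment score, can be illustrated with a small simulation. The sketch below is not the authors' code or benchmark. It assumes a set score of the form sqrt(n) times the mean of n unit-variance gene scores, and a single common pairwise correlation `rho` among the genes in a set. Both are simplifying assumptions made here for illustration, and the function names are hypothetical. Under these assumptions the score has variance 1 + (n - 1) * rho, not the variance of 1 that holds when genes are independent.

```python
import math
import random
import statistics


def variance_inflation_factor(n_genes: int, rho: float) -> float:
    """Variance of sqrt(n) * mean(z) for n unit-variance scores that share
    a common pairwise correlation rho (assumed equicorrelation model)."""
    return 1.0 + (n_genes - 1) * rho


def simulate_set_scores(n_genes: int, rho: float, n_trials: int, seed: int = 0) -> list:
    """Draw null set scores sqrt(n) * mean(z) with equicorrelated gene scores.

    Each gene score is sqrt(rho) * shared + sqrt(1 - rho) * noise, which gives
    unit variance per gene and correlation rho between any two genes.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        shared = rng.gauss(0.0, 1.0)  # factor common to every gene in the set
        z = [
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(n_genes)
        ]
        scores.append(math.sqrt(n_genes) * statistics.fmean(z))
    return scores


def false_positive_rate(scores: list, threshold: float = 1.96) -> float:
    """Fraction of null scores that a standard normal test would call significant."""
    return sum(abs(s) > threshold for s in scores) / len(scores)


if __name__ == "__main__":
    n_genes = 50
    for rho in (0.0, 0.1):
        scores = simulate_set_scores(n_genes, rho, n_trials=5000)
        print(
            f"rho={rho}: empirical variance={statistics.variance(scores):.2f}, "
            f"theoretical variance={variance_inflation_factor(n_genes, rho):.2f}, "
            f"false positive rate={false_positive_rate(scores):.3f}"
        )
```

Running the script compares the independent case (`rho=0.0`) with a modestly correlated one (`rho=0.1`). In this toy model, comparing the score against a standard normal reference in the correlated case rejects far more often than the nominal 5%, which is the kind of miscalibration the abstract attributes to ignoring gene–gene correlations.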