
Comparative analysis of methods for detecting interacting loci

Interactions among genetic loci are believed to play an important role in disease risk. While many methods have been proposed for detecting such interactions, their relative performance remains largely unclear, mainly because different data sources, detection performance criteria, and experimental p...

Bibliographic Details
Published in: BMC genomics 2011-07, Vol.12 (1), p.344-344, Article 344
Main Authors: Chen, Li; Yu, Guoqiang; Langefeld, Carl D; Miller, David J; Guy, Richard T; Raghuram, Jayaram; Yuan, Xiguo; Herrington, David M; Wang, Yue
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-b622t-f2dad3943b01bd931332395922ede9df021fc7e2f60686c24b9c920be9052f9a3
cites cdi_FETCH-LOGICAL-b622t-f2dad3943b01bd931332395922ede9df021fc7e2f60686c24b9c920be9052f9a3
container_end_page 344
container_issue 1
container_start_page 344
container_title BMC genomics
container_volume 12
creator Chen, Li
Yu, Guoqiang
Langefeld, Carl D
Miller, David J
Guy, Richard T
Raghuram, Jayaram
Yuan, Xiguo
Herrington, David M
Wang, Yue
description Interactions among genetic loci are believed to play an important role in disease risk. While many methods have been proposed for detecting such interactions, their relative performance remains largely unclear, mainly because different data sources, detection performance criteria, and experimental protocols were used in the papers introducing these methods and in subsequent studies. Moreover, there have been very few studies strictly focused on comparison of existing methods. Given the importance of detecting gene-gene and gene-environment interactions, a rigorous, comprehensive comparison of performance and limitations of available interaction detection methods is warranted. We report a comparison of eight representative methods, of which seven were specifically designed to detect interactions among single nucleotide polymorphisms (SNPs), with the last a popular main-effect testing method used as a baseline for performance evaluation. The selected methods, multifactor dimensionality reduction (MDR), full interaction model (FIM), information gain (IG), Bayesian epistasis association mapping (BEAM), SNP harvester (SH), maximum entropy conditional probability modeling (MECPM), logistic regression with an interaction term (LRIT), and logistic regression (LR), were compared on a large number of simulated data sets, each, consistent with complex disease models, embedding multiple sets of interacting SNPs, under different interaction models. The assessment criteria included several relevant detection power measures, family-wise type I error rate, and computational complexity. There are several important results from this study. First, while some SNPs in interactions with strong effects are successfully detected, most of the methods miss many interacting SNPs at an acceptable rate of false positives. In this study, the best-performing method was MECPM. Second, the statistical significance assessment criteria, used by some of the methods to control the type I error rate, are quite conservative, thereby limiting their power and making it difficult to fairly compare them. Third, as expected, power varies for different models and as a function of penetrance, minor allele frequency, linkage disequilibrium and marginal effects. Fourth, the analytical relationships between power and these factors are derived, aiding in the interpretation of the study results. Fifth, for these methods the magnitude of the main effect influences the power of the tests. Sixth, most methods can detect some ground-truth SNPs but have modest power to detect the whole set of interacting SNPs. This comparison study provides new insights into the strengths and limitations of current methods for detecting interacting loci. This study, along with freely available simulation tools we provide, should help support development of improved methods. The simulation tools are available at: http://code.google.com/p/simulation-tool-bmc-ms9169818735220977/downloads/list.
doi_str_mv 10.1186/1471-2164-12-344
format article
fullrecord <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_4fe98a96e72a4ffaacd5b8b8a8983d19</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A265233036</galeid><doaj_id>oai_doaj_org_article_4fe98a96e72a4ffaacd5b8b8a8983d19</doaj_id><sourcerecordid>A265233036</sourcerecordid><originalsourceid>FETCH-LOGICAL-b622t-f2dad3943b01bd931332395922ede9df021fc7e2f60686c24b9c920be9052f9a3</originalsourceid><addsrcrecordid>eNp1kktr3DAQgE1padK0956KoYfSg1NpZGutSyAsfSwECn2cxVgabRRsaytpQ_Pv663TJYYUHSRmvvnQaFQUrzk757yVH3i94hVwWVccKlHXT4rTY-jpg_NJ8SKlG8b4qoXmeXECfAUKVHNaXKzDsMOI2d9SiSP2d8mnMrhyoHwdbCpdiKWlTCb7cVv6MVPE-dwH418Wzxz2iV7d72fFz08ff6y_VFdfP2_Wl1dVJwFy5cCiFaoWHeOdVYILAUI1CoAsKesYcGdWBE4y2UoDdaeMAtaRYg04heKs2MxeG_BG76IfMN7pgF7_DYS41RizNz3p2pFqUUlaAdbOIRrbdG3XYqtaYbmaXBeza7fvBrKGxhyxX0iXmdFf62241YJLzngzCdazoPPhP4JlxoRBH2ahD7PQHPQ0qsny7v4aMfzaU8p68MlQ3-NIYZ902zaNZEweyLczucWpPz-6MFnNgdaXIBsQggk5UeePUNOyNHgTRnJ-ii8K3i8KJibT77zFfUp68_3bkmUza2JIKZI7dsuZPnzFx_p78_CZjwX__p74A_pQ1_4</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>885560064</pqid></control><display><type>article</type><title>Comparative analysis of methods for detecting interacting loci</title><source>Publicly Available Content Database</source><source>PubMed Central</source><creator>Chen, Li ; Yu, Guoqiang ; Langefeld, Carl D ; Miller, David J ; Guy, Richard T ; Raghuram, Jayaram ; Yuan, Xiguo ; Herrington, David M ; Wang, Yue</creator><creatorcontrib>Chen, Li ; Yu, Guoqiang ; Langefeld, Carl D ; Miller, David J ; Guy, Richard T ; Raghuram, Jayaram ; Yuan, Xiguo ; Herrington, David M ; Wang, Yue</creatorcontrib><description>Interactions among genetic loci are believed to play an important role in disease risk. 
While many methods have been proposed for detecting such interactions, their relative performance remains largely unclear, mainly because different data sources, detection performance criteria, and experimental protocols were used in the papers introducing these methods and in subsequent studies. Moreover, there have been very few studies strictly focused on comparison of existing methods. Given the importance of detecting gene-gene and gene-environment interactions, a rigorous, comprehensive comparison of performance and limitations of available interaction detection methods is warranted. We report a comparison of eight representative methods, of which seven were specifically designed to detect interactions among single nucleotide polymorphisms (SNPs), with the last a popular main-effect testing method used as a baseline for performance evaluation. The selected methods, multifactor dimensionality reduction (MDR), full interaction model (FIM), information gain (IG), Bayesian epistasis association mapping (BEAM), SNP harvester (SH), maximum entropy conditional probability modeling (MECPM), logistic regression with an interaction term (LRIT), and logistic regression (LR) were compared on a large number of simulated data sets, each, consistent with complex disease models, embedding multiple sets of interacting SNPs, under different interaction models. The assessment criteria included several relevant detection power measures, family-wise type I error rate, and computational complexity. There are several important results from this study. First, while some SNPs in interactions with strong effects are successfully detected, most of the methods miss many interacting SNPs at an acceptable rate of false positives. In this study, the best-performing method was MECPM. 
Second, the statistical significance assessment criteria, used by some of the methods to control the type I error rate, are quite conservative, thereby limiting their power and making it difficult to fairly compare them. Third, as expected, power varies for different models and as a function of penetrance, minor allele frequency, linkage disequilibrium and marginal effects. Fourth, the analytical relationships between power and these factors are derived, aiding in the interpretation of the study results. Fifth, for these methods the magnitude of the main effect influences the power of the tests. Sixth, most methods can detect some ground-truth SNPs but have modest power to detect the whole set of interacting SNPs. This comparison study provides new insights into the strengths and limitations of current methods for detecting interacting loci. This study, along with freely available simulation tools we provide, should help support development of improved methods. The simulation tools are available at: http://code.google.com/p/simulation-tool-bmc-ms9169818735220977/downloads/list.</description><identifier>ISSN: 1471-2164</identifier><identifier>EISSN: 1471-2164</identifier><identifier>DOI: 10.1186/1471-2164-12-344</identifier><identifier>PMID: 21729295</identifier><language>eng</language><publisher>England: BioMed Central Ltd</publisher><subject>Bayes Theorem ; Computational Biology - methods ; Diseases ; DNA sequencing ; Epistasis, Genetic - genetics ; Genetic aspects ; Genetic Loci - genetics ; Humans ; Logistic Models ; Methodology ; Methods ; Multifactor Dimensionality Reduction ; Nucleotide sequencing ; Physiological aspects ; Polymorphism, Single Nucleotide - genetics ; Probability ; Reproducibility of Results ; ROC Curve ; Single nucleotide polymorphisms ; United States</subject><ispartof>BMC genomics, 2011-07, Vol.12 (1), p.344-344, Article 344</ispartof><rights>COPYRIGHT 2011 BioMed Central Ltd.</rights><rights>Copyright ©2011 Chen et al; licensee BioMed 
Central Ltd. 2011 Chen et al; licensee BioMed Central Ltd.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-b622t-f2dad3943b01bd931332395922ede9df021fc7e2f60686c24b9c920be9052f9a3</citedby><cites>FETCH-LOGICAL-b622t-f2dad3943b01bd931332395922ede9df021fc7e2f60686c24b9c920be9052f9a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3161015/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3161015/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,27924,27925,37013,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/21729295$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Chen, Li</creatorcontrib><creatorcontrib>Yu, Guoqiang</creatorcontrib><creatorcontrib>Langefeld, Carl D</creatorcontrib><creatorcontrib>Miller, David J</creatorcontrib><creatorcontrib>Guy, Richard T</creatorcontrib><creatorcontrib>Raghuram, Jayaram</creatorcontrib><creatorcontrib>Yuan, Xiguo</creatorcontrib><creatorcontrib>Herrington, David M</creatorcontrib><creatorcontrib>Wang, Yue</creatorcontrib><title>Comparative analysis of methods for detecting interacting loci</title><title>BMC genomics</title><addtitle>BMC Genomics</addtitle><description>Interactions among genetic loci are believed to play an important role in disease risk. While many methods have been proposed for detecting such interactions, their relative performance remains largely unclear, mainly because different data sources, detection performance criteria, and experimental protocols were used in the papers introducing these methods and in subsequent studies. 
Moreover, there have been very few studies strictly focused on comparison of existing methods. Given the importance of detecting gene-gene and gene-environment interactions, a rigorous, comprehensive comparison of performance and limitations of available interaction detection methods is warranted. We report a comparison of eight representative methods, of which seven were specifically designed to detect interactions among single nucleotide polymorphisms (SNPs), with the last a popular main-effect testing method used as a baseline for performance evaluation. The selected methods, multifactor dimensionality reduction (MDR), full interaction model (FIM), information gain (IG), Bayesian epistasis association mapping (BEAM), SNP harvester (SH), maximum entropy conditional probability modeling (MECPM), logistic regression with an interaction term (LRIT), and logistic regression (LR) were compared on a large number of simulated data sets, each, consistent with complex disease models, embedding multiple sets of interacting SNPs, under different interaction models. The assessment criteria included several relevant detection power measures, family-wise type I error rate, and computational complexity. There are several important results from this study. First, while some SNPs in interactions with strong effects are successfully detected, most of the methods miss many interacting SNPs at an acceptable rate of false positives. In this study, the best-performing method was MECPM. Second, the statistical significance assessment criteria, used by some of the methods to control the type I error rate, are quite conservative, thereby limiting their power and making it difficult to fairly compare them. Third, as expected, power varies for different models and as a function of penetrance, minor allele frequency, linkage disequilibrium and marginal effects. 
Fourth, the analytical relationships between power and these factors are derived, aiding in the interpretation of the study results. Fifth, for these methods the magnitude of the main effect influences the power of the tests. Sixth, most methods can detect some ground-truth SNPs but have modest power to detect the whole set of interacting SNPs. This comparison study provides new insights into the strengths and limitations of current methods for detecting interacting loci. This study, along with freely available simulation tools we provide, should help support development of improved methods. The simulation tools are available at: http://code.google.com/p/simulation-tool-bmc-ms9169818735220977/downloads/list.</description><subject>Bayes Theorem</subject><subject>Computational Biology - methods</subject><subject>Diseases</subject><subject>DNA sequencing</subject><subject>Epistasis, Genetic - genetics</subject><subject>Genetic aspects</subject><subject>Genetic Loci - genetics</subject><subject>Humans</subject><subject>Logistic Models</subject><subject>Methodology</subject><subject>Methods</subject><subject>Multifactor Dimensionality Reduction</subject><subject>Nucleotide sequencing</subject><subject>Physiological aspects</subject><subject>Polymorphism, Single Nucleotide - genetics</subject><subject>Probability</subject><subject>Reproducibility of Results</subject><subject>ROC Curve</subject><subject>Single nucleotide polymorphisms</subject><subject>United 
States</subject><issn>1471-2164</issn><issn>1471-2164</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2011</creationdate><recordtype>article</recordtype><sourceid>DOA</sourceid><recordid>eNp1kktr3DAQgE1padK0956KoYfSg1NpZGutSyAsfSwECn2cxVgabRRsaytpQ_Pv663TJYYUHSRmvvnQaFQUrzk757yVH3i94hVwWVccKlHXT4rTY-jpg_NJ8SKlG8b4qoXmeXECfAUKVHNaXKzDsMOI2d9SiSP2d8mnMrhyoHwdbCpdiKWlTCb7cVv6MVPE-dwH418Wzxz2iV7d72fFz08ff6y_VFdfP2_Wl1dVJwFy5cCiFaoWHeOdVYILAUI1CoAsKesYcGdWBE4y2UoDdaeMAtaRYg04heKs2MxeG_BG76IfMN7pgF7_DYS41RizNz3p2pFqUUlaAdbOIRrbdG3XYqtaYbmaXBeza7fvBrKGxhyxX0iXmdFf62241YJLzngzCdazoPPhP4JlxoRBH2ahD7PQHPQ0qsny7v4aMfzaU8p68MlQ3-NIYZ902zaNZEweyLczucWpPz-6MFnNgdaXIBsQggk5UeePUNOyNHgTRnJ-ii8K3i8KJibT77zFfUp68_3bkmUza2JIKZI7dsuZPnzFx_p78_CZjwX__p74A_pQ1_4</recordid><startdate>20110705</startdate><enddate>20110705</enddate><creator>Chen, Li</creator><creator>Yu, Guoqiang</creator><creator>Langefeld, Carl D</creator><creator>Miller, David J</creator><creator>Guy, Richard T</creator><creator>Raghuram, Jayaram</creator><creator>Yuan, Xiguo</creator><creator>Herrington, David M</creator><creator>Wang, Yue</creator><general>BioMed Central Ltd</general><general>BioMed Central</general><general>BMC</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20110705</creationdate><title>Comparative analysis of methods for detecting interacting loci</title><author>Chen, Li ; Yu, Guoqiang ; Langefeld, Carl D ; Miller, David J ; Guy, Richard T ; Raghuram, Jayaram ; Yuan, Xiguo ; Herrington, David M ; Wang, 
Yue</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-b622t-f2dad3943b01bd931332395922ede9df021fc7e2f60686c24b9c920be9052f9a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Bayes Theorem</topic><topic>Computational Biology - methods</topic><topic>Diseases</topic><topic>DNA sequencing</topic><topic>Epistasis, Genetic - genetics</topic><topic>Genetic aspects</topic><topic>Genetic Loci - genetics</topic><topic>Humans</topic><topic>Logistic Models</topic><topic>Methodology</topic><topic>Methods</topic><topic>Multifactor Dimensionality Reduction</topic><topic>Nucleotide sequencing</topic><topic>Physiological aspects</topic><topic>Polymorphism, Single Nucleotide - genetics</topic><topic>Probability</topic><topic>Reproducibility of Results</topic><topic>ROC Curve</topic><topic>Single nucleotide polymorphisms</topic><topic>United States</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chen, Li</creatorcontrib><creatorcontrib>Yu, Guoqiang</creatorcontrib><creatorcontrib>Langefeld, Carl D</creatorcontrib><creatorcontrib>Miller, David J</creatorcontrib><creatorcontrib>Guy, Richard T</creatorcontrib><creatorcontrib>Raghuram, Jayaram</creatorcontrib><creatorcontrib>Yuan, Xiguo</creatorcontrib><creatorcontrib>Herrington, David M</creatorcontrib><creatorcontrib>Wang, Yue</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Science (Gale in Context)</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>Directory of Open Access Journals</collection><jtitle>BMC genomics</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chen, Li</au><au>Yu, Guoqiang</au><au>Langefeld, Carl D</au><au>Miller, David J</au><au>Guy, Richard T</au><au>Raghuram, Jayaram</au><au>Yuan, Xiguo</au><au>Herrington, David M</au><au>Wang, Yue</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Comparative analysis of methods for detecting interacting loci</atitle><jtitle>BMC genomics</jtitle><addtitle>BMC Genomics</addtitle><date>2011-07-05</date><risdate>2011</risdate><volume>12</volume><issue>1</issue><spage>344</spage><epage>344</epage><pages>344-344</pages><artnum>344</artnum><issn>1471-2164</issn><eissn>1471-2164</eissn><abstract>Interactions among genetic loci are believed to play an important role in disease risk. While many methods have been proposed for detecting such interactions, their relative performance remains largely unclear, mainly because different data sources, detection performance criteria, and experimental protocols were used in the papers introducing these methods and in subsequent studies. Moreover, there have been very few studies strictly focused on comparison of existing methods. Given the importance of detecting gene-gene and gene-environment interactions, a rigorous, comprehensive comparison of performance and limitations of available interaction detection methods is warranted. We report a comparison of eight representative methods, of which seven were specifically designed to detect interactions among single nucleotide polymorphisms (SNPs), with the last a popular main-effect testing method used as a baseline for performance evaluation. 
The selected methods, multifactor dimensionality reduction (MDR), full interaction model (FIM), information gain (IG), Bayesian epistasis association mapping (BEAM), SNP harvester (SH), maximum entropy conditional probability modeling (MECPM), logistic regression with an interaction term (LRIT), and logistic regression (LR) were compared on a large number of simulated data sets, each, consistent with complex disease models, embedding multiple sets of interacting SNPs, under different interaction models. The assessment criteria included several relevant detection power measures, family-wise type I error rate, and computational complexity. There are several important results from this study. First, while some SNPs in interactions with strong effects are successfully detected, most of the methods miss many interacting SNPs at an acceptable rate of false positives. In this study, the best-performing method was MECPM. Second, the statistical significance assessment criteria, used by some of the methods to control the type I error rate, are quite conservative, thereby limiting their power and making it difficult to fairly compare them. Third, as expected, power varies for different models and as a function of penetrance, minor allele frequency, linkage disequilibrium and marginal effects. Fourth, the analytical relationships between power and these factors are derived, aiding in the interpretation of the study results. Fifth, for these methods the magnitude of the main effect influences the power of the tests. Sixth, most methods can detect some ground-truth SNPs but have modest power to detect the whole set of interacting SNPs. This comparison study provides new insights into the strengths and limitations of current methods for detecting interacting loci. This study, along with freely available simulation tools we provide, should help support development of improved methods. 
The simulation tools are available at: http://code.google.com/p/simulation-tool-bmc-ms9169818735220977/downloads/list.</abstract><cop>England</cop><pub>BioMed Central Ltd</pub><pmid>21729295</pmid><doi>10.1186/1471-2164-12-344</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1471-2164
ispartof BMC genomics, 2011-07, Vol.12 (1), p.344-344, Article 344
issn 1471-2164
1471-2164
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_4fe98a96e72a4ffaacd5b8b8a8983d19
source Publicly Available Content Database; PubMed Central
subjects Bayes Theorem
Computational Biology - methods
Diseases
DNA sequencing
Epistasis, Genetic - genetics
Genetic aspects
Genetic Loci - genetics
Humans
Logistic Models
Methodology
Methods
Multifactor Dimensionality Reduction
Nucleotide sequencing
Physiological aspects
Polymorphism, Single Nucleotide - genetics
Probability
Reproducibility of Results
ROC Curve
Single nucleotide polymorphisms
United States
title Comparative analysis of methods for detecting interacting loci
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T14%3A32%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Comparative%20analysis%20of%20methods%20for%20detecting%20interacting%20loci&rft.jtitle=BMC%20genomics&rft.au=Chen,%20Li&rft.date=2011-07-05&rft.volume=12&rft.issue=1&rft.spage=344&rft.epage=344&rft.pages=344-344&rft.artnum=344&rft.issn=1471-2164&rft.eissn=1471-2164&rft_id=info:doi/10.1186/1471-2164-12-344&rft_dat=%3Cgale_doaj_%3EA265233036%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-b622t-f2dad3943b01bd931332395922ede9df021fc7e2f60686c24b9c920be9052f9a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=885560064&rft_id=info:pmid/21729295&rft_galeid=A265233036&rfr_iscdi=true