Estimating Bayesian Phylogenetic Information Content

Measuring the phylogenetic information content of data has a long history in systematics. Here we explore a Bayesian approach to information content estimation. The entropy of the posterior distribution compared with the entropy of the prior distribution provides a natural way to measure information...

Bibliographic Details
Published in: Systematic biology 2016-11, Vol.65 (6), p.1009-1023
Main Authors: Lewis, Paul O., Chen, Ming-Hui, Kuo, Lynn, Lewis, Louise A., Fučíková, Karolina, Neupane, Suman, Wang, Yu-Bo, Shi, Daoyuan
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c470t-277797c12d8f1d8a2586ec13ee8e8f86eb347d6eaedea739ef27b88ba71f2d9a3
cites cdi_FETCH-LOGICAL-c470t-277797c12d8f1d8a2586ec13ee8e8f86eb347d6eaedea739ef27b88ba71f2d9a3
container_end_page 1023
container_issue 6
container_start_page 1009
container_title Systematic biology
container_volume 65
creator Lewis, Paul O.
Chen, Ming-Hui
Kuo, Lynn
Lewis, Louise A.
Fučíková, Karolina
Neupane, Suman
Wang, Yu-Bo
Shi, Daoyuan
description Measuring the phylogenetic information content of data has a long history in systematics. Here we explore a Bayesian approach to information content estimation. The entropy of the posterior distribution compared with the entropy of the prior distribution provides a natural way to measure information content. If the data have no information relevant to ranking tree topologies beyond the information supplied by the prior, the posterior and prior will be identical. Information in data discourages consideration of some hypotheses allowed by the prior, resulting in a posterior distribution that is more concentrated (has lower entropy) than the prior. We focus on measuring information about tree topology using marginal posterior distributions of tree topologies. We show that both the accuracy and the computational efficiency of topological information content estimation improve with use of the conditional clade distribution, which also allows topological information content to be partitioned by clade. We explore two important applications of our method: providing a compelling definition of saturation and detecting conflict among data partitions that can negatively affect analyses of concatenated data.
doi_str_mv 10.1093/sysbio/syw042
format article
fullrecord <record><control><sourceid>jstor_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5066063</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><jstor_id>44028816</jstor_id><oup_id>10.1093/sysbio/syw042</oup_id><sourcerecordid>44028816</sourcerecordid><originalsourceid>FETCH-LOGICAL-c470t-277797c12d8f1d8a2586ec13ee8e8f86eb347d6eaedea739ef27b88ba71f2d9a3</originalsourceid><addsrcrecordid>eNqFkUtLw0AUhQdRrK-lS6Xgxk10HplHNoKW-gBBFwruhkly06akM3UmUfrvnRKtj42re-F-nHsOB6FDgs8Izth5WIa8dnG845RuoB2CpUgUEy-bq12whBMuB2g3hBnGhAhOttGASsI5xmoHpePQ1nPT1nYyvDJLCLWxw8fpsnETsNDWxfDOVs6vCGeHI2dbsO0-2qpME-Dgc-6h5-vx0-g2uX-4uRtd3idFKnGbUCllJgtCS1WRUhnKlYCCMAAFqop7zlJZCjBQgpEsg4rKXKncSFLRMjNsD130uosun0NZxNfeNHrho2O_1M7U-vfF1lM9cW-aYyFi9ihw-ing3WsHodXzOhTQNMaC64Imigohs0xlET35g85c522MFynGeZqlmEcq6anCuxA8VGszBOtVH7rvQ_d9RP74Z4I1_VXAt0PXLf7VOurRWWidX8NpiqlSRLAPumKhow</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1835549405</pqid></control><display><type>article</type><title>Estimating Bayesian Phylogenetic Information Content</title><source>JSTOR Archival Journals and Primary Sources Collection</source><source>Oxford Journals Online</source><creator>Lewis, Paul O. ; Chen, Ming-Hui ; Kuo, Lynn ; Lewis, Louise A. ; Fučíková, Karolina ; Neupane, Suman ; Wang, Yu-Bo ; Shi, Daoyuan</creator><creatorcontrib>Lewis, Paul O. ; Chen, Ming-Hui ; Kuo, Lynn ; Lewis, Louise A. ; Fučíková, Karolina ; Neupane, Suman ; Wang, Yu-Bo ; Shi, Daoyuan</creatorcontrib><description>Measuring the phylogenetic information content of data has a long history in systematics. Here we explore a Bayesian approach to information content estimation. The entropy of the posterior distribution compared with the entropy of the prior distribution provides a natural way to measure information content. 
If the data have no information relevant to ranking tree topologies beyond the information supplied by the prior, the posterior and prior will be identical. Information in data discourages consideration of some hypotheses allowed by the prior, resulting in a posterior distribution that is more concentrated (has lower entropy) than the prior. We focus on measuring information about tree topology using marginal posterior distributions of tree topologies. We show that both the accuracy and the computational efficiency of topological information content estimation improve with use of the conditional clade distribution, which also allows topological information content to be partitioned by clade. We explore two important applications of our method: providing a compelling definition of saturation and detecting conflict among data partitions that can negatively affect analyses of concatenated data.</description><identifier>ISSN: 1063-5157</identifier><identifier>EISSN: 1076-836X</identifier><identifier>DOI: 10.1093/sysbio/syw042</identifier><identifier>PMID: 27155008</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Bayes Theorem ; Bayesian analysis ; Classification - methods ; Conditional probabilities ; Data files ; Datasets ; Decision trees ; Entropy ; Estimating techniques ; Information content ; Models, Genetic ; Phylogenetics ; Phylogeny ; Regular ; Systematics ; Taxa ; Topology</subject><ispartof>Systematic biology, 2016-11, Vol.65 (6), p.1009-1023</ispartof><rights>Copyright © 2016 Society of Systematic Biologists</rights><rights>The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. 2016</rights><rights>The Author(s) 2016. 
Published by Oxford University Press, on behalf of the Society of Systematic Biologists.</rights><rights>Copyright Oxford University Press, UK Nov 2016</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c470t-277797c12d8f1d8a2586ec13ee8e8f86eb347d6eaedea739ef27b88ba71f2d9a3</citedby><cites>FETCH-LOGICAL-c470t-277797c12d8f1d8a2586ec13ee8e8f86eb347d6eaedea739ef27b88ba71f2d9a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/44028816$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/44028816$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>230,314,776,780,881,27901,27902,58213,58446</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/27155008$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Lewis, Paul O.</creatorcontrib><creatorcontrib>Chen, Ming-Hui</creatorcontrib><creatorcontrib>Kuo, Lynn</creatorcontrib><creatorcontrib>Lewis, Louise A.</creatorcontrib><creatorcontrib>Fučíková, Karolina</creatorcontrib><creatorcontrib>Neupane, Suman</creatorcontrib><creatorcontrib>Wang, Yu-Bo</creatorcontrib><creatorcontrib>Shi, Daoyuan</creatorcontrib><title>Estimating Bayesian Phylogenetic Information Content</title><title>Systematic biology</title><addtitle>Syst Biol</addtitle><description>Measuring the phylogenetic information content of data has a long history in systematics. Here we explore a Bayesian approach to information content estimation. The entropy of the posterior distribution compared with the entropy of the prior distribution provides a natural way to measure information content. 
If the data have no information relevant to ranking tree topologies beyond the information supplied by the prior, the posterior and prior will be identical. Information in data discourages consideration of some hypotheses allowed by the prior, resulting in a posterior distribution that is more concentrated (has lower entropy) than the prior. We focus on measuring information about tree topology using marginal posterior distributions of tree topologies. We show that both the accuracy and the computational efficiency of topological information content estimation improve with use of the conditional clade distribution, which also allows topological information content to be partitioned by clade. We explore two important applications of our method: providing a compelling definition of saturation and detecting conflict among data partitions that can negatively affect analyses of concatenated data.</description><subject>Bayes Theorem</subject><subject>Bayesian analysis</subject><subject>Classification - methods</subject><subject>Conditional probabilities</subject><subject>Data files</subject><subject>Datasets</subject><subject>Decision trees</subject><subject>Entropy</subject><subject>Estimating techniques</subject><subject>Information content</subject><subject>Models, 
Genetic</subject><subject>Phylogenetics</subject><subject>Phylogeny</subject><subject>Regular</subject><subject>Systematics</subject><subject>Taxa</subject><subject>Topology</subject><issn>1063-5157</issn><issn>1076-836X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>TOX</sourceid><recordid>eNqFkUtLw0AUhQdRrK-lS6Xgxk10HplHNoKW-gBBFwruhkly06akM3UmUfrvnRKtj42re-F-nHsOB6FDgs8Izth5WIa8dnG845RuoB2CpUgUEy-bq12whBMuB2g3hBnGhAhOttGASsI5xmoHpePQ1nPT1nYyvDJLCLWxw8fpsnETsNDWxfDOVs6vCGeHI2dbsO0-2qpME-Dgc-6h5-vx0-g2uX-4uRtd3idFKnGbUCllJgtCS1WRUhnKlYCCMAAFqop7zlJZCjBQgpEsg4rKXKncSFLRMjNsD130uosun0NZxNfeNHrho2O_1M7U-vfF1lM9cW-aYyFi9ihw-ing3WsHodXzOhTQNMaC64Imigohs0xlET35g85c522MFynGeZqlmEcq6anCuxA8VGszBOtVH7rvQ_d9RP74Z4I1_VXAt0PXLf7VOurRWWidX8NpiqlSRLAPumKhow</recordid><startdate>20161101</startdate><enddate>20161101</enddate><creator>Lewis, Paul O.</creator><creator>Chen, Ming-Hui</creator><creator>Kuo, Lynn</creator><creator>Lewis, Louise A.</creator><creator>Fučíková, Karolina</creator><creator>Neupane, Suman</creator><creator>Wang, Yu-Bo</creator><creator>Shi, Daoyuan</creator><general>Oxford University Press</general><scope>TOX</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>K9.</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20161101</creationdate><title>Estimating Bayesian Phylogenetic Information Content</title><author>Lewis, Paul O. ; Chen, Ming-Hui ; Kuo, Lynn ; Lewis, Louise A. 
; Fučíková, Karolina ; Neupane, Suman ; Wang, Yu-Bo ; Shi, Daoyuan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c470t-277797c12d8f1d8a2586ec13ee8e8f86eb347d6eaedea739ef27b88ba71f2d9a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Bayes Theorem</topic><topic>Bayesian analysis</topic><topic>Classification - methods</topic><topic>Conditional probabilities</topic><topic>Data files</topic><topic>Datasets</topic><topic>Decision trees</topic><topic>Entropy</topic><topic>Estimating techniques</topic><topic>Information content</topic><topic>Models, Genetic</topic><topic>Phylogenetics</topic><topic>Phylogeny</topic><topic>Regular</topic><topic>Systematics</topic><topic>Taxa</topic><topic>Topology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lewis, Paul O.</creatorcontrib><creatorcontrib>Chen, Ming-Hui</creatorcontrib><creatorcontrib>Kuo, Lynn</creatorcontrib><creatorcontrib>Lewis, Louise A.</creatorcontrib><creatorcontrib>Fučíková, Karolina</creatorcontrib><creatorcontrib>Neupane, Suman</creatorcontrib><creatorcontrib>Wang, Yu-Bo</creatorcontrib><creatorcontrib>Shi, Daoyuan</creatorcontrib><collection>Oxford Open</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Systematic biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lewis, Paul O.</au><au>Chen, Ming-Hui</au><au>Kuo, Lynn</au><au>Lewis, Louise A.</au><au>Fučíková, Karolina</au><au>Neupane, 
Suman</au><au>Wang, Yu-Bo</au><au>Shi, Daoyuan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Estimating Bayesian Phylogenetic Information Content</atitle><jtitle>Systematic biology</jtitle><addtitle>Syst Biol</addtitle><date>2016-11-01</date><risdate>2016</risdate><volume>65</volume><issue>6</issue><spage>1009</spage><epage>1023</epage><pages>1009-1023</pages><issn>1063-5157</issn><eissn>1076-836X</eissn><abstract>Measuring the phylogenetic information content of data has a long history in systematics. Here we explore a Bayesian approach to information content estimation. The entropy of the posterior distribution compared with the entropy of the prior distribution provides a natural way to measure information content. If the data have no information relevant to ranking tree topologies beyond the information supplied by the prior, the posterior and prior will be identical. Information in data discourages consideration of some hypotheses allowed by the prior, resulting in a posterior distribution that is more concentrated (has lower entropy) than the prior. We focus on measuring information about tree topology using marginal posterior distributions of tree topologies. We show that both the accuracy and the computational efficiency of topological information content estimation improve with use of the conditional clade distribution, which also allows topological information content to be partitioned by clade. We explore two important applications of our method: providing a compelling definition of saturation and detecting conflict among data partitions that can negatively affect analyses of concatenated data.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>27155008</pmid><doi>10.1093/sysbio/syw042</doi><tpages>15</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1063-5157
ispartof Systematic biology, 2016-11, Vol.65 (6), p.1009-1023
issn 1063-5157
1076-836X
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5066063
source JSTOR Archival Journals and Primary Sources Collection; Oxford Journals Online
subjects Bayes Theorem
Bayesian analysis
Classification - methods
Conditional probabilities
Data files
Datasets
Decision trees
Entropy
Estimating techniques
Information content
Models, Genetic
Phylogenetics
Phylogeny
Regular
Systematics
Taxa
Topology
title Estimating Bayesian Phylogenetic Information Content
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T02%3A49%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Estimating%20Bayesian%20Phylogenetic%20Information%20Content&rft.jtitle=Systematic%20biology&rft.au=Lewis,%20Paul%20O.&rft.date=2016-11-01&rft.volume=65&rft.issue=6&rft.spage=1009&rft.epage=1023&rft.pages=1009-1023&rft.issn=1063-5157&rft.eissn=1076-836X&rft_id=info:doi/10.1093/sysbio/syw042&rft_dat=%3Cjstor_pubme%3E44028816%3C/jstor_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c470t-277797c12d8f1d8a2586ec13ee8e8f86eb347d6eaedea739ef27b88ba71f2d9a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1835549405&rft_id=info:pmid/27155008&rft_jstor_id=44028816&rft_oup_id=10.1093/sysbio/syw042&rfr_iscdi=true
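
The description field of this record says that information content is measured by comparing the entropy of the posterior distribution of tree topologies with the entropy of the prior. A minimal sketch of that comparison follows. It is not the authors' implementation: it assumes a discrete uniform prior over unrooted binary topologies, estimates the posterior from raw sample frequencies (the naive estimator, not the conditional clade distribution that the abstract reports as more accurate), and uses hypothetical function names chosen for illustration.

```python
import math
from collections import Counter


def num_unrooted_topologies(n_taxa):
    """Number of unrooted binary tree topologies for n taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # product 3 * 5 * ... * (2n - 5)
        count *= k
    return count


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def topology_information(sampled_topologies, n_taxa):
    """Return (prior entropy, posterior entropy, information).

    Assumption: the prior is uniform over all unrooted binary topologies,
    so its entropy is log(number of topologies). The posterior entropy is
    estimated from the relative frequency of each sampled topology.
    """
    counts = Counter(sampled_topologies)
    total = sum(counts.values())
    h_post = entropy(c / total for c in counts.values())
    h_prior = math.log(num_unrooted_topologies(n_taxa))
    return h_prior, h_post, h_prior - h_post


# Hypothetical posterior sample for 4 taxa (3 possible topologies),
# with topologies represented as arbitrary hashable labels.
sample = ["((A,B),(C,D))"] * 90 + ["((A,C),(B,D))"] * 10
h_prior, h_post, info = topology_information(sample, n_taxa=4)
print(f"prior entropy {h_prior:.3f}, posterior entropy {h_post:.3f}, "
      f"information {info:.3f} nats ({100 * info / h_prior:.1f}% of maximum)")
```

When every sampled topology is the same, the posterior entropy is zero and the information equals the prior entropy. When the sample is spread evenly over all topologies, the posterior matches the prior and the information is zero, which corresponds to the "no information beyond the prior" case in the abstract.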