Clustering using objective functions and stochastic search
A new approach to clustering multivariate data, based on a multilevel linear mixed model, is proposed. A key feature of the model is that observations from the same cluster are correlated, because they share cluster-specific random effects. The inclusion of cluster-specific random effects allows par...
Published in: | Journal of the Royal Statistical Society. Series B, Statistical methodology, 2008-02, Vol.70 (1), p.119-139 |
---|---|
Main Authors: | Booth, James G. ; Casella, George ; Hobert, James P. |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | cdi_FETCH-LOGICAL-c5789-e22ae8004d7b29e3a51f69f66e213170b7d5f8ddd78273bf172ee6fb0ab5f85a3 |
---|---|
cites | cdi_FETCH-LOGICAL-c5789-e22ae8004d7b29e3a51f69f66e213170b7d5f8ddd78273bf172ee6fb0ab5f85a3 |
container_end_page | 139 |
container_issue | 1 |
container_start_page | 119 |
container_title | Journal of the Royal Statistical Society. Series B, Statistical methodology |
container_volume | 70 |
creator | Booth, James G. ; Casella, George ; Hobert, James P. |
description | A new approach to clustering multivariate data, based on a multilevel linear mixed model, is proposed. A key feature of the model is that observations from the same cluster are correlated, because they share cluster-specific random effects. The inclusion of cluster-specific random effects allows parsimonious departure from an assumed base model for cluster mean profiles. This departure is captured statistically via the posterior expectation, or best linear unbiased predictor. One of the parameters in the model is the true underlying partition of the data, and the posterior distribution of this parameter, which is known up to a normalizing constant, is used to cluster the data. The problem of finding partitions with high posterior probability is not amenable to deterministic methods such as the EM algorithm. Thus, we propose a stochastic search algorithm that is driven by a Markov chain that is a mixture of two Metropolis-Hastings algorithms: one that makes small-scale changes to individual objects and another that performs large-scale moves involving entire clusters. The methodology proposed is fundamentally different from the well-known finite mixture model approach to clustering, which does not explicitly include the partition as a parameter, and involves an independent and identically distributed structure. |
doi_str_mv | 10.1111/j.1467-9868.2007.00629.x |
format | article |
fullrecord | <record><control><sourceid>jstor_proqu</sourceid><recordid>TN_cdi_proquest_miscellaneous_36839843</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><jstor_id>20203814</jstor_id><sourcerecordid>20203814</sourcerecordid><originalsourceid>FETCH-LOGICAL-c5789-e22ae8004d7b29e3a51f69f66e213170b7d5f8ddd78273bf172ee6fb0ab5f85a3</originalsourceid><addsrcrecordid>eNqNUttu1DAQjRCVKIVPQIqQ4C3Bl8QXJB7oCgqoainl8jhyHId1yCaLnbS7f8-kqVaIJ0Yaz8hzzmh8PEmSUpJTtFdtTgshM62EyhkhMidEMJ3vHiTHh8JDzLnQmSwoe5Q8jrElaELy4-T1qpvi6ILvf6ZTnM-hap0d_Y1Lm6nHZOhjavo6jeNg1yaO3qbRmWDXT5KjxnTRPb2PJ8m39---rj5k55dnH1dvzzNbSqUzx5hxipCilhXTjpuSNkI3QjhGOZWkknXZqLqupWKSVw2VzDnRVMRUeF8afpK8XPpuw_B7cnGEjY_WdZ3p3TBF4EJxrQqOwOf_ANthCj3OBqiMZpyTEkFqAdkwxBhcA9vgNybsgRKYFYUWZuFgFm7mSbhTFHZI_bRQg9s6e-BVnWmHEGMFN8CNJHjs0ZGqMHh0ir6dI9VAuYb1uMFmL-6HNdGargmmtz4emiKbUaLnR71ZcLe-c_v_Hha-XF-fYob8Zwu_xQ8Mf_VnhCtaYD1b6h7XYHeom_ALcEFkCT8uzkBekSv2-fsFcP4HL4y6ug</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>200923305</pqid></control><display><type>article</type><title>Clustering using objective functions and stochastic search</title><source>International Bibliography of the Social Sciences (IBSS)</source><source>Business Source Ultimate【Trial: -2024/12/31】【Remote access available】</source><source>JSTOR Archival Journals and Primary Sources Collection</source><source>Alma/SFX Local Collection</source><creator>Booth, James G. ; Casella, George ; Hobert, James P.</creator><creatorcontrib>Booth, James G. ; Casella, George ; Hobert, James P.</creatorcontrib><description>A new approach to clustering multivariate data, based on a multilevel linear mixed model, is proposed. A key feature of the model is that observations from the same cluster are correlated, because they share cluster-specific random effects. 
The inclusion of cluster-specific random effects allows parsimonious departure from an assumed base model for cluster mean profiles. This departure is captured statistically via the posterior expectation, or best linear unbiased predictor. One of the parameters in the model is the true underlying partition of the data, and the posterior distribution of this parameter, which is known up to a normalizing constant, is used to cluster the data. The problem of finding partitions with high posterior probability is not amenable to deterministic methods such as the EM algorithm. Thus, we propose a stochastic search algorithm that is driven by a Markov chain that is a mixture of two Metropolis-Hastings algorithms-one that makes small scale changes to individual objects and another that performs large scale moves involving entire clusters. The methodology proposed is fundamentally different from the well-known finite mixture model approach to clustering, which does not explicitly include the partition as a parameter, and involves an independent and identically distributed structure.</description><identifier>ISSN: 1369-7412</identifier><identifier>EISSN: 1467-9868</identifier><identifier>DOI: 10.1111/j.1467-9868.2007.00629.x</identifier><language>eng</language><publisher>Oxford, UK: Blackwell Publishing Ltd</publisher><subject>Algorithms ; Bayesian method ; Bayesian model ; Best linear unbiased predictor ; Biostatistics ; Cell cycle ; Cluster analysis ; Distribution theory ; Exact sciences and technology ; General topics ; Hastings algorithm ; Linear mixed model ; Markov analysis ; Markov chain Monte Carlo methods ; Markov chains ; Mathematical vectors ; Mathematics ; Metropolis ; Microarray ; Modeling ; Monte Carlo simulation ; Multivariate analysis ; Objective functions ; Parametric inference ; Parametric models ; Probability and statistics ; Probability theory and stochastic processes ; Quadratic penalized splines ; Random walk ; Sciences and techniques of general use ; 
Set partition ; Statistical methods ; Statistics ; Studies ; Yeast cell cycle ; Yeasts</subject><ispartof>Journal of the Royal Statistical Society. Series B, Statistical methodology, 2008-02, Vol.70 (1), p.119-139</ispartof><rights>Copyright 2008 The Royal Statistical Society and Blackwell Publishing Ltd.</rights><rights>2008 INIST-CNRS</rights><rights>2008 Royal Statistical Society</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c5789-e22ae8004d7b29e3a51f69f66e213170b7d5f8ddd78273bf172ee6fb0ab5f85a3</citedby><cites>FETCH-LOGICAL-c5789-e22ae8004d7b29e3a51f69f66e213170b7d5f8ddd78273bf172ee6fb0ab5f85a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/20203814$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/20203814$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,33223,33224,58238,58471</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=20021093$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttp://econpapers.repec.org/article/blajorssb/v_3a70_3ay_3a2008_3ai_3a1_3ap_3a119-139.htm$$DView record in RePEc$$Hfree_for_read</backlink></links><search><creatorcontrib>Booth, James G.</creatorcontrib><creatorcontrib>Casella, George</creatorcontrib><creatorcontrib>Hobert, James P.</creatorcontrib><title>Clustering using objective functions and stochastic search</title><title>Journal of the Royal Statistical Society. Series B, Statistical methodology</title><description>A new approach to clustering multivariate data, based on a multilevel linear mixed model, is proposed. 
A key feature of the model is that observations from the same cluster are correlated, because they share cluster-specific random effects. The inclusion of cluster-specific random effects allows parsimonious departure from an assumed base model for cluster mean profiles. This departure is captured statistically via the posterior expectation, or best linear unbiased predictor. One of the parameters in the model is the true underlying partition of the data, and the posterior distribution of this parameter, which is known up to a normalizing constant, is used to cluster the data. The problem of finding partitions with high posterior probability is not amenable to deterministic methods such as the EM algorithm. Thus, we propose a stochastic search algorithm that is driven by a Markov chain that is a mixture of two Metropolis-Hastings algorithms-one that makes small scale changes to individual objects and another that performs large scale moves involving entire clusters. The methodology proposed is fundamentally different from the well-known finite mixture model approach to clustering, which does not explicitly include the partition as a parameter, and involves an independent and identically distributed structure.</description><subject>Algorithms</subject><subject>Bayesian method</subject><subject>Bayesian model</subject><subject>Best linear unbiased predictor</subject><subject>Biostatistics</subject><subject>Cell cycle</subject><subject>Cluster analysis</subject><subject>Distribution theory</subject><subject>Exact sciences and technology</subject><subject>General topics</subject><subject>Hastings algorithm</subject><subject>Linear mixed model</subject><subject>Markov analysis</subject><subject>Markov chain Monte Carlo methods</subject><subject>Markov chains</subject><subject>Mathematical vectors</subject><subject>Mathematics</subject><subject>Metropolis</subject><subject>Microarray</subject><subject>Modeling</subject><subject>Monte Carlo 
simulation</subject><subject>Multivariate analysis</subject><subject>Objective functions</subject><subject>Parametric inference</subject><subject>Parametric models</subject><subject>Probability and statistics</subject><subject>Probability theory and stochastic processes</subject><subject>Quadratic penalized splines</subject><subject>Random walk</subject><subject>Sciences and techniques of general use</subject><subject>Set partition</subject><subject>Statistical methods</subject><subject>Statistics</subject><subject>Studies</subject><subject>Yeast cell cycle</subject><subject>Yeasts</subject><issn>1369-7412</issn><issn>1467-9868</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><sourceid>8BJ</sourceid><recordid>eNqNUttu1DAQjRCVKIVPQIqQ4C3Bl8QXJB7oCgqoainl8jhyHId1yCaLnbS7f8-kqVaIJ0Yaz8hzzmh8PEmSUpJTtFdtTgshM62EyhkhMidEMJ3vHiTHh8JDzLnQmSwoe5Q8jrElaELy4-T1qpvi6ILvf6ZTnM-hap0d_Y1Lm6nHZOhjavo6jeNg1yaO3qbRmWDXT5KjxnTRPb2PJ8m39---rj5k55dnH1dvzzNbSqUzx5hxipCilhXTjpuSNkI3QjhGOZWkknXZqLqupWKSVw2VzDnRVMRUeF8afpK8XPpuw_B7cnGEjY_WdZ3p3TBF4EJxrQqOwOf_ANthCj3OBqiMZpyTEkFqAdkwxBhcA9vgNybsgRKYFYUWZuFgFm7mSbhTFHZI_bRQg9s6e-BVnWmHEGMFN8CNJHjs0ZGqMHh0ir6dI9VAuYb1uMFmL-6HNdGargmmtz4emiKbUaLnR71ZcLe-c_v_Hha-XF-fYob8Zwu_xQ8Mf_VnhCtaYD1b6h7XYHeom_ALcEFkCT8uzkBekSv2-fsFcP4HL4y6ug</recordid><startdate>200802</startdate><enddate>200802</enddate><creator>Booth, James G.</creator><creator>Casella, George</creator><creator>Hobert, James P.</creator><general>Blackwell Publishing Ltd</general><general>Blackwell Publishing</general><general>Blackwell</general><general>Royal Statistical Society</general><general>Oxford University 
Press</general><scope>BSCLL</scope><scope>IQODW</scope><scope>DKI</scope><scope>X2L</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8BJ</scope><scope>8FD</scope><scope>FQK</scope><scope>JBE</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>200802</creationdate><title>Clustering using objective functions and stochastic search</title><author>Booth, James G. ; Casella, George ; Hobert, James P.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c5789-e22ae8004d7b29e3a51f69f66e213170b7d5f8ddd78273bf172ee6fb0ab5f85a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Algorithms</topic><topic>Bayesian method</topic><topic>Bayesian model</topic><topic>Best linear unbiased predictor</topic><topic>Biostatistics</topic><topic>Cell cycle</topic><topic>Cluster analysis</topic><topic>Distribution theory</topic><topic>Exact sciences and technology</topic><topic>General topics</topic><topic>Hastings algorithm</topic><topic>Linear mixed model</topic><topic>Markov analysis</topic><topic>Markov chain Monte Carlo methods</topic><topic>Markov chains</topic><topic>Mathematical vectors</topic><topic>Mathematics</topic><topic>Metropolis</topic><topic>Microarray</topic><topic>Modeling</topic><topic>Monte Carlo simulation</topic><topic>Multivariate analysis</topic><topic>Objective functions</topic><topic>Parametric inference</topic><topic>Parametric models</topic><topic>Probability and statistics</topic><topic>Probability theory and stochastic processes</topic><topic>Quadratic penalized splines</topic><topic>Random walk</topic><topic>Sciences and techniques of general use</topic><topic>Set partition</topic><topic>Statistical methods</topic><topic>Statistics</topic><topic>Studies</topic><topic>Yeast cell 
cycle</topic><topic>Yeasts</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Booth, James G.</creatorcontrib><creatorcontrib>Casella, George</creatorcontrib><creatorcontrib>Hobert, James P.</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>RePEc IDEAS</collection><collection>RePEc</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>International Bibliography of the Social Sciences (IBSS)</collection><collection>Technology Research Database</collection><collection>International Bibliography of the Social Sciences</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of the Royal Statistical Society. Series B, Statistical methodology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Booth, James G.</au><au>Casella, George</au><au>Hobert, James P.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Clustering using objective functions and stochastic search</atitle><jtitle>Journal of the Royal Statistical Society. Series B, Statistical methodology</jtitle><date>2008-02</date><risdate>2008</risdate><volume>70</volume><issue>1</issue><spage>119</spage><epage>139</epage><pages>119-139</pages><issn>1369-7412</issn><eissn>1467-9868</eissn><abstract>A new approach to clustering multivariate data, based on a multilevel linear mixed model, is proposed. 
A key feature of the model is that observations from the same cluster are correlated, because they share cluster-specific random effects. The inclusion of cluster-specific random effects allows parsimonious departure from an assumed base model for cluster mean profiles. This departure is captured statistically via the posterior expectation, or best linear unbiased predictor. One of the parameters in the model is the true underlying partition of the data, and the posterior distribution of this parameter, which is known up to a normalizing constant, is used to cluster the data. The problem of finding partitions with high posterior probability is not amenable to deterministic methods such as the EM algorithm. Thus, we propose a stochastic search algorithm that is driven by a Markov chain that is a mixture of two Metropolis-Hastings algorithms-one that makes small scale changes to individual objects and another that performs large scale moves involving entire clusters. The methodology proposed is fundamentally different from the well-known finite mixture model approach to clustering, which does not explicitly include the partition as a parameter, and involves an independent and identically distributed structure.</abstract><cop>Oxford, UK</cop><pub>Blackwell Publishing Ltd</pub><doi>10.1111/j.1467-9868.2007.00629.x</doi><tpages>21</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1369-7412 |
ispartof | Journal of the Royal Statistical Society. Series B, Statistical methodology, 2008-02, Vol.70 (1), p.119-139 |
issn | 1369-7412 ; 1467-9868 |
language | eng |
recordid | cdi_proquest_miscellaneous_36839843 |
source | International Bibliography of the Social Sciences (IBSS); Business Source Ultimate; JSTOR Archival Journals and Primary Sources Collection; Alma/SFX Local Collection |
subjects | Algorithms ; Bayesian method ; Bayesian model ; Best linear unbiased predictor ; Biostatistics ; Cell cycle ; Cluster analysis ; Distribution theory ; Exact sciences and technology ; General topics ; Hastings algorithm ; Linear mixed model ; Markov analysis ; Markov chain Monte Carlo methods ; Markov chains ; Mathematical vectors ; Mathematics ; Metropolis ; Microarray ; Modeling ; Monte Carlo simulation ; Multivariate analysis ; Objective functions ; Parametric inference ; Parametric models ; Probability and statistics ; Probability theory and stochastic processes ; Quadratic penalized splines ; Random walk ; Sciences and techniques of general use ; Set partition ; Statistical methods ; Statistics ; Studies ; Yeast cell cycle ; Yeasts |
title | Clustering using objective functions and stochastic search |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T13%3A53%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustering%20using%20objective%20functions%20and%20stochastic%20search&rft.jtitle=Journal%20of%20the%20Royal%20Statistical%20Society.%20Series%20B,%20Statistical%20methodology&rft.au=Booth,%20James%20G.&rft.date=2008-02&rft.volume=70&rft.issue=1&rft.spage=119&rft.epage=139&rft.pages=119-139&rft.issn=1369-7412&rft.eissn=1467-9868&rft_id=info:doi/10.1111/j.1467-9868.2007.00629.x&rft_dat=%3Cjstor_proqu%3E20203814%3C/jstor_proqu%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c5789-e22ae8004d7b29e3a51f69f66e213170b7d5f8ddd78273bf172ee6fb0ab5f85a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=200923305&rft_id=info:pmid/&rft_jstor_id=20203814&rfr_iscdi=true |
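
The description field above outlines a stochastic search over partitions, driven by a Markov chain that mixes two Metropolis-Hastings samplers: one relocating individual objects, the other moving entire clusters. The Python sketch below illustrates that general idea only and is not the authors' implementation. The objective `log_objective` (within-cluster sum of squares plus a per-cluster penalty), the merge/split form of the large-scale move, and the names `p_large`, `penalty`, and `stochastic_search` are all assumptions introduced for illustration; the record does not specify the paper's posterior or its exact moves.

```python
import math
import random


def log_objective(partition, data, penalty=2.0):
    """Hypothetical stand-in for an unnormalized log posterior of a partition."""
    total = 0.0
    for cluster in partition:
        mean = sum(data[i] for i in cluster) / len(cluster)
        total += sum((data[i] - mean) ** 2 for i in cluster)
    return -0.5 * total - penalty * len(partition)


def propose_single_object(partition, rng):
    """Small-scale move: relocate one object (a symmetric proposal)."""
    n = sum(len(c) for c in partition)
    obj = rng.randrange(n)  # objects are assumed to be labelled 0..n-1
    src = next(k for k, c in enumerate(partition) if obj in c)
    dests = [k for k in range(len(partition)) if k != src]
    if len(partition[src]) > 1:
        dests.append(None)  # None means "open a new singleton cluster"
    if not dests:
        return partition, 0.0
    dest = rng.choice(dests)
    new = [list(c) for c in partition]
    new[src].remove(obj)
    if dest is None:
        new.append([obj])
    else:
        new[dest].append(obj)
    return [c for c in new if c], 0.0  # log Hastings ratio is 0


def propose_split_merge(partition, rng):
    """Large-scale move: merge two clusters or split one, with Hastings ratio."""
    k = len(partition)
    if rng.random() < 0.5:  # attempt a merge
        if k < 2:
            return partition, 0.0
        a, b = rng.sample(range(k), 2)
        merged = partition[a] + partition[b]
        new = [c for j, c in enumerate(partition) if j not in (a, b)] + [merged]
        splittable = sum(1 for c in new if len(c) > 1)
        n_splits = 2 ** (len(merged) - 1) - 1  # unordered two-way splits
        return new, math.log(math.comb(k, 2)) - math.log(splittable * n_splits)
    big = [j for j, c in enumerate(partition) if len(c) > 1]
    if not big:
        return partition, 0.0
    j = rng.choice(big)
    cluster = partition[j]
    while True:  # draw a uniformly random split into two non-empty parts
        side = [rng.random() < 0.5 for _ in cluster]
        if any(side) and not all(side):
            break
    left = [i for i, s in zip(cluster, side) if s]
    right = [i for i, s in zip(cluster, side) if not s]
    new = [c for m, c in enumerate(partition) if m != j] + [left, right]
    n_splits = 2 ** (len(cluster) - 1) - 1
    return new, math.log(len(big) * n_splits) - math.log(math.comb(k + 1, 2))


def stochastic_search(data, n_iter=5000, p_large=0.2, seed=0):
    """Run the mixture chain and return the best partition visited."""
    rng = random.Random(seed)
    current = [list(range(len(data)))]  # start with everything in one cluster
    cur_lp = log_objective(current, data)
    best, best_lp = current, cur_lp
    for _ in range(n_iter):
        move = propose_split_merge if rng.random() < p_large else propose_single_object
        cand, log_ratio = move(current, rng)
        cand_lp = log_objective(cand, data)
        log_alpha = cand_lp - cur_lp + log_ratio
        if rng.random() < math.exp(min(0.0, log_alpha)):  # Metropolis-Hastings accept
            current, cur_lp = cand, cand_lp
            if cur_lp > best_lp:
                best, best_lp = current, cur_lp
    return sorted(sorted(c) for c in best), best_lp


if __name__ == "__main__":
    toy = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]  # made-up data, two obvious groups
    print(stochastic_search(toy))
```

Because the chain is used for search rather than for posterior summaries, the sketch simply records the highest-scoring partition it visits. Any objective that is known up to a normalizing constant could replace `log_objective` without changing the two move functions.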