
Clustering using objective functions and stochastic search

A new approach to clustering multivariate data, based on a multilevel linear mixed model, is proposed. A key feature of the model is that observations from the same cluster are correlated, because they share cluster-specific random effects. The inclusion of cluster-specific random effects allows par...

Bibliographic Details
Published in: Journal of the Royal Statistical Society. Series B, Statistical methodology, 2008-02, Vol.70 (1), p.119-139
Main Authors: Booth, James G., Casella, George, Hobert, James P.
Format: Article
Language: English
container_end_page 139
container_issue 1
container_start_page 119
container_title Journal of the Royal Statistical Society. Series B, Statistical methodology
container_volume 70
creator Booth, James G.
Casella, George
Hobert, James P.
description A new approach to clustering multivariate data, based on a multilevel linear mixed model, is proposed. A key feature of the model is that observations from the same cluster are correlated, because they share cluster-specific random effects. The inclusion of cluster-specific random effects allows parsimonious departure from an assumed base model for cluster mean profiles. This departure is captured statistically via the posterior expectation, or best linear unbiased predictor. One of the parameters in the model is the true underlying partition of the data, and the posterior distribution of this parameter, which is known up to a normalizing constant, is used to cluster the data. The problem of finding partitions with high posterior probability is not amenable to deterministic methods such as the EM algorithm. Thus, we propose a stochastic search algorithm that is driven by a Markov chain that is a mixture of two Metropolis-Hastings algorithms: one that makes small-scale changes to individual objects and another that performs large-scale moves involving entire clusters. The methodology proposed is fundamentally different from the well-known finite mixture model approach to clustering, which does not explicitly include the partition as a parameter, and involves an independent and identically distributed structure.
doi_str_mv 10.1111/j.1467-9868.2007.00629.x
format article
publisher Oxford, UK: Blackwell Publishing Ltd
rights Copyright 2008 The Royal Statistical Society and Blackwell Publishing Ltd.
identifier ISSN: 1369-7412
ispartof Journal of the Royal Statistical Society. Series B, Statistical methodology, 2008-02, Vol.70 (1), p.119-139
issn 1369-7412
1467-9868
language eng
recordid cdi_proquest_miscellaneous_36839843
source International Bibliography of the Social Sciences (IBSS); Business Source Ultimate; JSTOR Archival Journals and Primary Sources Collection; Alma/SFX Local Collection
subjects Algorithms
Bayesian method
Bayesian model
Best linear unbiased predictor
Biostatistics
Cell cycle
Cluster analysis
Distribution theory
Exact sciences and technology
General topics
Hastings algorithm
Linear mixed model
Markov analysis
Markov chain Monte Carlo methods
Markov chains
Mathematical vectors
Mathematics
Metropolis
Microarray
Modeling
Monte Carlo simulation
Multivariate analysis
Objective functions
Parametric inference
Parametric models
Probability and statistics
Probability theory and stochastic processes
Quadratic penalized splines
Random walk
Sciences and techniques of general use
Set partition
Statistical methods
Statistics
Studies
Yeast cell cycle
Yeasts
title Clustering using objective functions and stochastic search
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T13%3A53%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustering%20using%20objective%20functions%20and%20stochastic%20search&rft.jtitle=Journal%20of%20the%20Royal%20Statistical%20Society.%20Series%20B,%20Statistical%20methodology&rft.au=Booth,%20James%20G.&rft.date=2008-02&rft.volume=70&rft.issue=1&rft.spage=119&rft.epage=139&rft.pages=119-139&rft.issn=1369-7412&rft.eissn=1467-9868&rft_id=info:doi/10.1111/j.1467-9868.2007.00629.x&rft_dat=%3Cjstor_proqu%3E20203814%3C/jstor_proqu%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c5789-e22ae8004d7b29e3a51f69f66e213170b7d5f8ddd78273bf172ee6fb0ab5f85a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=200923305&rft_id=info:pmid/&rft_jstor_id=20203814&rfr_iscdi=true
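
The search strategy summarised in the description field (a chain that mixes small moves of individual objects with large moves of whole clusters, scored by a target known only up to a constant) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the article: the toy objective log_score (negative within-cluster sum of squares minus a per-cluster penalty) stands in for the log posterior of the partition, and the names small_move, large_move, stochastic_search, penalty and p_small are hypothetical. The acceptance step is a plain Metropolis-style ratio without the Hastings proposal correction, so this is a search heuristic in the spirit of the abstract, not the authors' sampler.

```python
import math
import random


def log_score(data, labels, penalty=2.0):
    """Toy objective (an assumption, not the article's posterior):
    negative within-cluster sum of squares minus a per-cluster penalty."""
    clusters = {}
    for x, k in zip(data, labels):
        clusters.setdefault(k, []).append(x)
    wss = 0.0
    for members in clusters.values():
        mean = sum(members) / len(members)
        wss += sum((x - mean) ** 2 for x in members)
    return -wss - penalty * len(clusters)


def small_move(labels, rng):
    """Small-scale move: reassign one object to another existing
    cluster or to a brand-new cluster."""
    new = list(labels)
    i = rng.randrange(len(new))
    choices = sorted(set(new) - {new[i]}) + [max(new) + 1]
    new[i] = rng.choice(choices)
    return new


def large_move(labels, rng):
    """Large-scale move: merge two clusters, or split one at random."""
    new = list(labels)
    ks = sorted(set(new))
    if len(ks) > 1 and rng.random() < 0.5:
        a, b = rng.sample(ks, 2)
        new = [a if k == b else k for k in new]  # merge b into a
    else:
        a, fresh = rng.choice(ks), max(new) + 1
        # each member of cluster a moves to the fresh cluster with prob. 1/2
        new = [fresh if k == a and rng.random() < 0.5 else k for k in new]
    return new


def stochastic_search(data, n_iter=5000, p_small=0.8, seed=0):
    """Run the mixed-move chain and return the best partition visited."""
    rng = random.Random(seed)
    labels = [0] * len(data)  # start with everything in one cluster
    current = log_score(data, labels)
    best, best_score = list(labels), current
    for _ in range(n_iter):
        move = small_move if rng.random() < p_small else large_move
        proposal = move(labels, rng)
        cand = log_score(data, proposal)
        # Metropolis-style acceptance; the Hastings correction for
        # asymmetric proposals is deliberately omitted in this sketch.
        if cand >= current or rng.random() < math.exp(cand - current):
            labels, current = proposal, cand
            if current > best_score:
                best, best_score = list(labels), current
    return best, best_score


if __name__ == "__main__":
    toy = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
    print(stochastic_search(toy))
```

Because the goal is to find high-scoring partitions rather than to estimate posterior averages, the sketch keeps track of the best partition visited instead of storing the whole chain. The mixing weight p_small only controls how often each move type is tried.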