
EM in high-dimensional spaces

This paper considers fitting a mixture of Gaussians model to high-dimensional data in scenarios where there are fewer data samples than feature dimensions. Issues that arise when using principal component analysis (PCA) to represent Gaussian distributions inside Expectation-Maximization (EM) are addressed, and a practical algorithm results. Unlike other algorithms that have been proposed, this algorithm does not try to compress the data to fit low-dimensional models. Instead, it models Gaussian distributions in the (N-1)-dimensional space spanned by the N data samples. We are able to show that this algorithm converges on data sets where low-dimensional techniques do not.
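The abstract's central observation is that N samples in a D-dimensional space (with N < D) span at most an (N-1)-dimensional subspace, so Gaussian densities can be modeled there without discarding information. The sketch below is an illustrative reconstruction of that idea, not the authors' algorithm: the function names are hypothetical, and the EM step uses spherical (scalar-variance) Gaussians for brevity rather than the PCA-based covariance representation the paper develops.

```python
import numpy as np

def project_to_sample_span(X):
    """Map N x D data (N < D) into the (N-1)-dim subspace spanned by the centered samples."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are an orthonormal basis for the sample span
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    k = min(X.shape[0] - 1, X.shape[1])  # centered data has rank at most N-1
    basis = Vt[:k]
    return (X - mean) @ basis.T, mean, basis

def em_spherical_gmm(Z, n_components=2, n_iter=50):
    """Plain EM for a mixture of spherical Gaussians in the projected space."""
    n, d = Z.shape
    # deterministic init: evenly spaced samples as initial means
    mus = Z[np.linspace(0, n - 1, n_components).astype(int)].copy()
    sig2 = np.full(n_components, Z.var())
    pis = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: log responsibilities, normalized stably via max subtraction
        sq = ((Z[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)  # (n, K) squared distances
        logp = np.log(pis) - 0.5 * (sq / sig2 + d * np.log(2 * np.pi * sig2))
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means, and scalar variances
        Nk = R.sum(axis=0)
        pis = Nk / n
        mus = (R.T @ Z) / Nk[:, None]
        sq = ((Z[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
        sig2 = (R * sq).sum(axis=0) / (d * Nk) + 1e-8
    return mus, sig2, pis
```

Because the projection is an isometry onto the span of the centered samples, no distances between samples are lost, yet the fitted densities live in N-1 dimensions where the model is no longer degenerate.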

Bibliographic Details
Published in: IEEE Transactions on Cybernetics, 2005-06, Vol. 35 (3), p. 571-577
Main Authors: Draper, B.A., Elliott, D.L., Hayes, J., Kyungim Baek
Format: Article
Language: English
DOI: 10.1109/TSMCB.2005.846670
ISSN: 1083-4419, 2168-2267
EISSN: 1941-0492, 2168-2275
Subjects: Algorithms
Artificial Intelligence
Cluster Analysis
Clustering algorithms
Computer Simulation
Covariance matrix
Cybernetics
Eigenvalues and eigenfunctions
Expectation-Maximization
Fittings
Gaussian
Gaussian distribution
Gaussian processes
image classification
Image coding
Image converters
Image Enhancement - methods
Image Interpretation, Computer-Assisted - methods
Information Storage and Retrieval - methods
Likelihood Functions
Maximum likelihood estimation
Models, Biological
Models, Statistical
Normal distribution
Pattern Recognition, Automated - methods
Pixel
Principal Component Analysis
Reproducibility of Results
Sensitivity and Specificity
Signal Processing, Computer-Assisted
Subtraction Technique
unsupervised learning