Loading…

CUR matrix decompositions for improved data analysis

Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data poi...

Full description

Saved in:
Bibliographic Details
Published in:Proceedings of the National Academy of Sciences - PNAS 2009-01, Vol.106 (3), p.697-702
Main Authors: Mahoney, Michael W, Drineas, Petros
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
cited_by cdi_FETCH-LOGICAL-c561t-8947c9821103ba84b8ed5e353222861803361f2416613bb60c8252d03b00df233
cites cdi_FETCH-LOGICAL-c561t-8947c9821103ba84b8ed5e353222861803361f2416613bb60c8252d03b00df233
container_end_page 702
container_issue 3
container_start_page 697
container_title Proceedings of the National Academy of Sciences - PNAS
container_volume 106
creator Mahoney, Michael W
Drineas, Petros
description Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes generating the data. In this article, we develop CUR matrix decompositions for improved data analysis. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Because they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn (to the extent that the original data are). We present an algorithm that preferentially chooses columns and rows that exhibit high "statistical leverage" and, thus, in a very precise statistical sense, exert a disproportionately large "influence" on the best low-rank fit of the data matrix. By selecting columns and rows in this manner, we obtain improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work. In addition, since the construction involves computing quantities with a natural and widely studied statistical interpretation, we can leverage ideas from diagnostic regression analysis to employ these matrix decompositions for exploratory data analysis.
doi_str_mv 10.1073/pnas.0803205106
format article
fullrecord <record><control><sourceid>proquest_fao_a</sourceid><recordid>TN_cdi_fao_agris_US201301585162</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>66836796</sourcerecordid><originalsourceid>FETCH-LOGICAL-c561t-8947c9821103ba84b8ed5e353222861803361f2416613bb60c8252d03b00df233</originalsourceid><addsrcrecordid>eNptkc1L7DAUxYM80fFj7UpfceGuem_SpslGkMEvEAR11iFtU420TV_SDvrf23EGRx-u7uL-7uGcewg5QDhFyNhZ1-pwCgIYhRSBb5AJgsSYJxL-kAkAzWKR0GSb7ITwCgAyFbBFtlEik0zSCUmms4eo0b23b1FpCtd0LtjeujZElfORbTrv5qaMSt3rSLe6fg827JHNStfB7K_mLpldXT5Nb-K7--vb6cVdXKQc-1jIJCukoIjAci2SXJgyNSxllFLBcXTNOFY0Qc6R5TmHQtCUliMMUFaUsV1yvtTthrwxZWHa3utadd422r8rp636uWnti3p2c0U5AwQYBU5WAt79G0zoVWNDYepat8YNQXEuGM8kH8Hj_8BXN_gxblAUkGUgPqGzJVR4F4I31ZcTBLWoQy3qUOs6xouj7wHW_Or_I3C4AhaXazmumOIy-xbg172qhrruzVs_gn-XYKWd0s_eBjV7XFgHTEWKnLIPs1WkwA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>201370896</pqid></control><display><type>article</type><title>CUR matrix decompositions for improved data analysis</title><source>JSTOR Archival Journals and Primary Sources Collection</source><source>PubMed Central</source><creator>Mahoney, Michael W ; Drineas, Petros</creator><creatorcontrib>Mahoney, Michael W ; Drineas, Petros</creatorcontrib><description>Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes generating the data. In this article, we develop CUR matrix decompositions for improved data analysis. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Because they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn (to the extent that the original data are). We present an algorithm that preferentially chooses columns and rows that exhibit high "statistical leverage" and, thus, in a very precise statistical sense, exert a disproportionately large "influence" on the best low-rank fit of the data matrix. By selecting columns and rows in this manner, we obtain improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work. In addition, since the construction involves computing quantities with a natural and widely studied statistical interpretation, we can leverage ideas from diagnostic regression analysis to employ these matrix decompositions for exploratory data analysis.</description><identifier>ISSN: 0027-8424</identifier><identifier>EISSN: 1091-6490</identifier><identifier>DOI: 10.1073/pnas.0803205106</identifier><identifier>PMID: 19139392</identifier><language>eng</language><publisher>United States: National Academy of Sciences</publisher><subject>Data analysis ; Decomposition ; Matrix ; Physical Sciences ; Principal components analysis ; Randomized algorithms</subject><ispartof>Proceedings of the National Academy of Sciences - PNAS, 2009-01, Vol.106 (3), p.697-702</ispartof><rights>Copyright National Academy of Sciences Jan 20, 2009</rights><rights>2009 by The National Academy of Sciences of the USA</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c561t-8947c9821103ba84b8ed5e353222861803361f2416613bb60c8252d03b00df233</citedby><cites>FETCH-LOGICAL-c561t-8947c9821103ba84b8ed5e353222861803361f2416613bb60c8252d03b00df233</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttp://www.pnas.org/content/106/3.cover.gif</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2630100/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2630100/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,727,780,784,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/19139392$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Mahoney, Michael W</creatorcontrib><creatorcontrib>Drineas, Petros</creatorcontrib><title>CUR matrix decompositions for improved data analysis</title><title>Proceedings of the National Academy of Sciences - PNAS</title><addtitle>Proc Natl Acad Sci U S A</addtitle><description>Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes generating the data. In this article, we develop CUR matrix decompositions for improved data analysis. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Because they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn (to the extent that the original data are). We present an algorithm that preferentially chooses columns and rows that exhibit high "statistical leverage" and, thus, in a very precise statistical sense, exert a disproportionately large "influence" on the best low-rank fit of the data matrix. By selecting columns and rows in this manner, we obtain improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work. In addition, since the construction involves computing quantities with a natural and widely studied statistical interpretation, we can leverage ideas from diagnostic regression analysis to employ these matrix decompositions for exploratory data analysis.</description><subject>Data analysis</subject><subject>Decomposition</subject><subject>Matrix</subject><subject>Physical Sciences</subject><subject>Principal components analysis</subject><subject>Randomized algorithms</subject><issn>0027-8424</issn><issn>1091-6490</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2009</creationdate><recordtype>article</recordtype><recordid>eNptkc1L7DAUxYM80fFj7UpfceGuem_SpslGkMEvEAR11iFtU420TV_SDvrf23EGRx-u7uL-7uGcewg5QDhFyNhZ1-pwCgIYhRSBb5AJgsSYJxL-kAkAzWKR0GSb7ITwCgAyFbBFtlEik0zSCUmms4eo0b23b1FpCtd0LtjeujZElfORbTrv5qaMSt3rSLe6fg827JHNStfB7K_mLpldXT5Nb-K7--vb6cVdXKQc-1jIJCukoIjAci2SXJgyNSxllFLBcXTNOFY0Qc6R5TmHQtCUliMMUFaUsV1yvtTthrwxZWHa3utadd422r8rp636uWnti3p2c0U5AwQYBU5WAt79G0zoVWNDYepat8YNQXEuGM8kH8Hj_8BXN_gxblAUkGUgPqGzJVR4F4I31ZcTBLWoQy3qUOs6xouj7wHW_Or_I3C4AhaXazmumOIy-xbg172qhrruzVs_gn-XYKWd0s_eBjV7XFgHTEWKnLIPs1WkwA</recordid><startdate>20090120</startdate><enddate>20090120</enddate><creator>Mahoney, Michael W</creator><creator>Drineas, Petros</creator><general>National Academy of Sciences</general><general>National Acad Sciences</general><scope>FBQ</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QG</scope><scope>7QL</scope><scope>7QP</scope><scope>7QR</scope><scope>7SN</scope><scope>7SS</scope><scope>7T5</scope><scope>7TK</scope><scope>7TM</scope><scope>7TO</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20090120</creationdate><title>CUR matrix decompositions for improved data analysis</title><author>Mahoney, Michael W ; Drineas, Petros</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c561t-8947c9821103ba84b8ed5e353222861803361f2416613bb60c8252d03b00df233</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Data analysis</topic><topic>Decomposition</topic><topic>Matrix</topic><topic>Physical Sciences</topic><topic>Principal components analysis</topic><topic>Randomized algorithms</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mahoney, Michael W</creatorcontrib><creatorcontrib>Drineas, Petros</creatorcontrib><collection>AGRIS</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Animal Behavior Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Immunology Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Proceedings of the National Academy of Sciences - PNAS</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mahoney, Michael W</au><au>Drineas, Petros</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>CUR matrix decompositions for improved data analysis</atitle><jtitle>Proceedings of the National Academy of Sciences - PNAS</jtitle><addtitle>Proc Natl Acad Sci U S A</addtitle><date>2009-01-20</date><risdate>2009</risdate><volume>106</volume><issue>3</issue><spage>697</spage><epage>702</epage><pages>697-702</pages><issn>0027-8424</issn><eissn>1091-6490</eissn><abstract>Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes generating the data. In this article, we develop CUR matrix decompositions for improved data analysis. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Because they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn (to the extent that the original data are). We present an algorithm that preferentially chooses columns and rows that exhibit high "statistical leverage" and, thus, in a very precise statistical sense, exert a disproportionately large "influence" on the best low-rank fit of the data matrix. By selecting columns and rows in this manner, we obtain improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work. In addition, since the construction involves computing quantities with a natural and widely studied statistical interpretation, we can leverage ideas from diagnostic regression analysis to employ these matrix decompositions for exploratory data analysis.</abstract><cop>United States</cop><pub>National Academy of Sciences</pub><pmid>19139392</pmid><doi>10.1073/pnas.0803205106</doi><tpages>6</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0027-8424
ispartof Proceedings of the National Academy of Sciences - PNAS, 2009-01, Vol.106 (3), p.697-702
issn 0027-8424
1091-6490
language eng
recordid cdi_fao_agris_US201301585162
source JSTOR Archival Journals and Primary Sources Collection; PubMed Central
subjects Data analysis
Decomposition
Matrix
Physical Sciences
Principal components analysis
Randomized algorithms
title CUR matrix decompositions for improved data analysis
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T09%3A03%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_fao_a&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CUR%20matrix%20decompositions%20for%20improved%20data%20analysis&rft.jtitle=Proceedings%20of%20the%20National%20Academy%20of%20Sciences%20-%20PNAS&rft.au=Mahoney,%20Michael%20W&rft.date=2009-01-20&rft.volume=106&rft.issue=3&rft.spage=697&rft.epage=702&rft.pages=697-702&rft.issn=0027-8424&rft.eissn=1091-6490&rft_id=info:doi/10.1073/pnas.0803205106&rft_dat=%3Cproquest_fao_a%3E66836796%3C/proquest_fao_a%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c561t-8947c9821103ba84b8ed5e353222861803361f2416613bb60c8252d03b00df233%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=201370896&rft_id=info:pmid/19139392&rfr_iscdi=true