Loading…
Regularized Greedy Column Subset Selection
The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of...
Saved in:
Published in: | arXiv.org 2018-04 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Ordozgoiti, Bruno Mozo, Alberto Jesús García López de Lacalle |
description | The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use. |
format | article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2071995684</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2071995684</sourcerecordid><originalsourceid>FETCH-proquest_journals_20719956843</originalsourceid><addsrcrecordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mTQCkpNL81JLMqsSk1RcC9KTU2pVHDOzynNzVMILk0qTi1RCE7NSU0uyczP42FgTUvMKU7lhdLcDMpuriHOHroFRfmFpanFJfFZ-aVFeUCpeCMDc0NLS1MzCxNj4lQBAB0TMP0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2071995684</pqid></control><display><type>article</type><title>Regularized Greedy Column Subset Selection</title><source>Publicly Available Content Database</source><creator>Ordozgoiti, Bruno ; Mozo, Alberto ; Jesús García López de Lacalle</creator><creatorcontrib>Ordozgoiti, Bruno ; Mozo, Alberto ; Jesús García López de Lacalle</creatorcontrib><description>The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Adequacy ; Combinatorial analysis ; Formulations ; Greedy algorithms ; Lower bounds ; Noise sensitivity ; Optimization ; Regularization ; Robustness (mathematics)</subject><ispartof>arXiv.org, 2018-04</ispartof><rights>2018. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2071995684?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>780,784,25753,37012,44590</link.rule.ids></links><search><creatorcontrib>Ordozgoiti, Bruno</creatorcontrib><creatorcontrib>Mozo, Alberto</creatorcontrib><creatorcontrib>Jesús García López de Lacalle</creatorcontrib><title>Regularized Greedy Column Subset Selection</title><title>arXiv.org</title><description>The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.</description><subject>Adequacy</subject><subject>Combinatorial analysis</subject><subject>Formulations</subject><subject>Greedy algorithms</subject><subject>Lower bounds</subject><subject>Noise sensitivity</subject><subject>Optimization</subject><subject>Regularization</subject><subject>Robustness (mathematics)</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mTQCkpNL81JLMqsSk1RcC9KTU2pVHDOzynNzVMILk0qTi1RCE7NSU0uyczP42FgTUvMKU7lhdLcDMpuriHOHroFRfmFpanFJfFZ-aVFeUCpeCMDc0NLS1MzCxNj4lQBAB0TMP0</recordid><startdate>20180412</startdate><enddate>20180412</enddate><creator>Ordozgoiti, Bruno</creator><creator>Mozo, Alberto</creator><creator>Jesús García López de Lacalle</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20180412</creationdate><title>Regularized Greedy Column Subset Selection</title><author>Ordozgoiti, Bruno ; Mozo, Alberto ; Jesús García López de Lacalle</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_20719956843</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Adequacy</topic><topic>Combinatorial analysis</topic><topic>Formulations</topic><topic>Greedy algorithms</topic><topic>Lower bounds</topic><topic>Noise sensitivity</topic><topic>Optimization</topic><topic>Regularization</topic><topic>Robustness (mathematics)</topic><toplevel>online_resources</toplevel><creatorcontrib>Ordozgoiti, Bruno</creatorcontrib><creatorcontrib>Mozo, Alberto</creatorcontrib><creatorcontrib>Jesús García López de Lacalle</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ordozgoiti, Bruno</au><au>Mozo, Alberto</au><au>Jesús García López de Lacalle</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Regularized Greedy Column Subset Selection</atitle><jtitle>arXiv.org</jtitle><date>2018-04-12</date><risdate>2018</risdate><eissn>2331-8422</eissn><abstract>The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2018-04 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2071995684 |
source | Publicly Available Content Database |
subjects | Adequacy Combinatorial analysis Formulations Greedy algorithms Lower bounds Noise sensitivity Optimization Regularization Robustness (mathematics) |
title | Regularized Greedy Column Subset Selection |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T22%3A15%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Regularized%20Greedy%20Column%20Subset%20Selection&rft.jtitle=arXiv.org&rft.au=Ordozgoiti,%20Bruno&rft.date=2018-04-12&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2071995684%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_20719956843%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2071995684&rft_id=info:pmid/&rfr_iscdi=true |