Loading…

Regularized Greedy Column Subset Selection

The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of...

Full description

Saved in:

Bibliographic Details
Published in:	arXiv.org 2018-04
Main Authors:	Ordozgoiti, Bruno, Mozo, Alberto, Jesús García López de Lacalle
Format:	Article
Language:	English
Subjects:	Adequacy Combinatorial analysis Formulations Greedy algorithms Lower bounds Noise sensitivity Optimization Regularization Robustness (mathematics)
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

cited_by
cites
container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Ordozgoiti, Bruno Mozo, Alberto Jesús García López de Lacalle
description	The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.
format	article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2071995684</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2071995684</sourcerecordid><originalsourceid>FETCH-proquest_journals_20719956843</originalsourceid><addsrcrecordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mTQCkpNL81JLMqsSk1RcC9KTU2pVHDOzynNzVMILk0qTi1RCE7NSU0uyczP42FgTUvMKU7lhdLcDMpuriHOHroFRfmFpanFJfFZ-aVFeUCpeCMDc0NLS1MzCxNj4lQBAB0TMP0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2071995684</pqid></control><display><type>article</type><title>Regularized Greedy Column Subset Selection</title><source>Publicly Available Content Database</source><creator>Ordozgoiti, Bruno ; Mozo, Alberto ; Jesús García López de Lacalle</creator><creatorcontrib>Ordozgoiti, Bruno ; Mozo, Alberto ; Jesús García López de Lacalle</creatorcontrib><description>The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Adequacy ; Combinatorial analysis ; Formulations ; Greedy algorithms ; Lower bounds ; Noise sensitivity ; Optimization ; Regularization ; Robustness (mathematics)</subject><ispartof>arXiv.org, 2018-04</ispartof><rights>2018. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2071995684?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>780,784,25753,37012,44590</link.rule.ids></links><search><creatorcontrib>Ordozgoiti, Bruno</creatorcontrib><creatorcontrib>Mozo, Alberto</creatorcontrib><creatorcontrib>Jesús García López de Lacalle</creatorcontrib><title>Regularized Greedy Column Subset Selection</title><title>arXiv.org</title><description>The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.</description><subject>Adequacy</subject><subject>Combinatorial analysis</subject><subject>Formulations</subject><subject>Greedy algorithms</subject><subject>Lower bounds</subject><subject>Noise sensitivity</subject><subject>Optimization</subject><subject>Regularization</subject><subject>Robustness (mathematics)</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mTQCkpNL81JLMqsSk1RcC9KTU2pVHDOzynNzVMILk0qTi1RCE7NSU0uyczP42FgTUvMKU7lhdLcDMpuriHOHroFRfmFpanFJfFZ-aVFeUCpeCMDc0NLS1MzCxNj4lQBAB0TMP0</recordid><startdate>20180412</startdate><enddate>20180412</enddate><creator>Ordozgoiti, Bruno</creator><creator>Mozo, Alberto</creator><creator>Jesús García López de Lacalle</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20180412</creationdate><title>Regularized Greedy Column Subset Selection</title><author>Ordozgoiti, Bruno ; Mozo, Alberto ; Jesús García López de Lacalle</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_20719956843</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Adequacy</topic><topic>Combinatorial analysis</topic><topic>Formulations</topic><topic>Greedy algorithms</topic><topic>Lower bounds</topic><topic>Noise sensitivity</topic><topic>Optimization</topic><topic>Regularization</topic><topic>Robustness (mathematics)</topic><toplevel>online_resources</toplevel><creatorcontrib>Ordozgoiti, Bruno</creatorcontrib><creatorcontrib>Mozo, Alberto</creatorcontrib><creatorcontrib>Jesús García López de Lacalle</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ordozgoiti, Bruno</au><au>Mozo, Alberto</au><au>Jesús García López de Lacalle</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Regularized Greedy Column Subset Selection</atitle><jtitle>arXiv.org</jtitle><date>2018-04-12</date><risdate>2018</risdate><eissn>2331-8422</eissn><abstract>The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2018-04
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2071995684
source	Publicly Available Content Database
subjects	Adequacy Combinatorial analysis Formulations Greedy algorithms Lower bounds Noise sensitivity Optimization Regularization Robustness (mathematics)
title	Regularized Greedy Column Subset Selection
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T22%3A15%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Regularized%20Greedy%20Column%20Subset%20Selection&rft.jtitle=arXiv.org&rft.au=Ordozgoiti,%20Bruno&rft.date=2018-04-12&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2071995684%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_20719956843%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2071995684&rft_id=info:pmid/&rfr_iscdi=true