Loading…

Regularized Greedy Column Subset Selection

The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of...

Full description

Saved in:
Bibliographic Details
Published in:arXiv.org 2018-04
Main Authors: Ordozgoiti, Bruno, Mozo, Alberto, Jesús García López de Lacalle
Format: Article
Language:English
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
cited_by
cites
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Ordozgoiti, Bruno
Mozo, Alberto
Jesús García López de Lacalle
description The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.
format article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2071995684</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2071995684</sourcerecordid><originalsourceid>FETCH-proquest_journals_20719956843</originalsourceid><addsrcrecordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mTQCkpNL81JLMqsSk1RcC9KTU2pVHDOzynNzVMILk0qTi1RCE7NSU0uyczP42FgTUvMKU7lhdLcDMpuriHOHroFRfmFpanFJfFZ-aVFeUCpeCMDc0NLS1MzCxNj4lQBAB0TMP0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2071995684</pqid></control><display><type>article</type><title>Regularized Greedy Column Subset Selection</title><source>Publicly Available Content Database</source><creator>Ordozgoiti, Bruno ; Mozo, Alberto ; Jesús García López de Lacalle</creator><creatorcontrib>Ordozgoiti, Bruno ; Mozo, Alberto ; Jesús García López de Lacalle</creatorcontrib><description>The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Adequacy ; Combinatorial analysis ; Formulations ; Greedy algorithms ; Lower bounds ; Noise sensitivity ; Optimization ; Regularization ; Robustness (mathematics)</subject><ispartof>arXiv.org, 2018-04</ispartof><rights>2018. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2071995684?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>780,784,25753,37012,44590</link.rule.ids></links><search><creatorcontrib>Ordozgoiti, Bruno</creatorcontrib><creatorcontrib>Mozo, Alberto</creatorcontrib><creatorcontrib>Jesús García López de Lacalle</creatorcontrib><title>Regularized Greedy Column Subset Selection</title><title>arXiv.org</title><description>The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.</description><subject>Adequacy</subject><subject>Combinatorial analysis</subject><subject>Formulations</subject><subject>Greedy algorithms</subject><subject>Lower bounds</subject><subject>Noise sensitivity</subject><subject>Optimization</subject><subject>Regularization</subject><subject>Robustness (mathematics)</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mTQCkpNL81JLMqsSk1RcC9KTU2pVHDOzynNzVMILk0qTi1RCE7NSU0uyczP42FgTUvMKU7lhdLcDMpuriHOHroFRfmFpanFJfFZ-aVFeUCpeCMDc0NLS1MzCxNj4lQBAB0TMP0</recordid><startdate>20180412</startdate><enddate>20180412</enddate><creator>Ordozgoiti, Bruno</creator><creator>Mozo, Alberto</creator><creator>Jesús García López de Lacalle</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20180412</creationdate><title>Regularized Greedy Column Subset Selection</title><author>Ordozgoiti, Bruno ; Mozo, Alberto ; Jesús García López de Lacalle</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_20719956843</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Adequacy</topic><topic>Combinatorial analysis</topic><topic>Formulations</topic><topic>Greedy algorithms</topic><topic>Lower bounds</topic><topic>Noise sensitivity</topic><topic>Optimization</topic><topic>Regularization</topic><topic>Robustness (mathematics)</topic><toplevel>online_resources</toplevel><creatorcontrib>Ordozgoiti, Bruno</creatorcontrib><creatorcontrib>Mozo, Alberto</creatorcontrib><creatorcontrib>Jesús García López de Lacalle</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ordozgoiti, Bruno</au><au>Mozo, Alberto</au><au>Jesús García López de Lacalle</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Regularized Greedy Column Subset Selection</atitle><jtitle>arXiv.org</jtitle><date>2018-04-12</date><risdate>2018</risdate><eissn>2331-8422</eissn><abstract>The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2018-04
issn 2331-8422
language eng
recordid cdi_proquest_journals_2071995684
source Publicly Available Content Database
subjects Adequacy
Combinatorial analysis
Formulations
Greedy algorithms
Lower bounds
Noise sensitivity
Optimization
Regularization
Robustness (mathematics)
title Regularized Greedy Column Subset Selection
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T22%3A15%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Regularized%20Greedy%20Column%20Subset%20Selection&rft.jtitle=arXiv.org&rft.au=Ordozgoiti,%20Bruno&rft.date=2018-04-12&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2071995684%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_20719956843%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2071995684&rft_id=info:pmid/&rfr_iscdi=true