
Active metric learning for supervised classification


Bibliographic Details
Published in: Computers & Chemical Engineering, 2021-01, Vol. 144, p. 107132, Article 107132
Main Authors: Kumaran, Krishnan; Papageorgiou, Dimitri J.; Takac, Martin; Lueg, Laurens; Sahinidis, Nicolas V.
Format: Article
Language: English
Publisher: Elsevier Ltd
ISSN: 0098-1354
EISSN: 1873-4375
DOI: 10.1016/j.compchemeng.2020.107132
Subjects: Active learning; Clustering; Metric learning; Mixed-integer optimization
Description

Clustering and classification critically rely on distance metrics that provide meaningful comparisons between data points. To this end, learning optimal distance functions from data, known as metric learning, aims to facilitate supervised classification, particularly in high-dimensional spaces where visualization is challenging or infeasible. The Mahalanobis metric is the default choice due to its simplicity and its interpretability as a transformation of the Euclidean metric by a combination of rotation and scaling.

In this work, we present several novel contributions to metric learning, in both formulation and solution methods. Our approach is motivated by agglomerative clustering, with novel modifications that enable a natural interpretation of the user-defined classes as clusters under the optimal metric. It generalizes and improves upon leading methods by removing reliance on pre-designated “target neighbors,” “triplets,” and “similarity pairs.” Starting from the definition of a generalized metric that has the Mahalanobis metric as its second-order term, we propose an objective function for metric selection that does not aim to isolate classes from each other, as most previous work does, but instead distorts the space minimally by aggregating co-class members into local clusters. We then formulate the problem as a mixed-integer optimization that can be solved efficiently for small and medium datasets and approximated for larger ones.

Another salient feature of our method is that it facilitates active learning by recommending precise regions to sample, using the optimal metric to improve classification performance. These regions are indicated by boundary and outlier points of the dataset as defined by the metric. Such targeted acquisition can significantly reduce computation and data-acquisition costs by ensuring training-data completeness, representativeness, and economy, which could also aid training-data selection for other established methods such as Deep Learning and Random Forests. We demonstrate the classification and computational performance of our approach through several simple and intuitive examples, followed by results on real image and benchmark datasets.
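As context for the abstract's description of the Mahalanobis metric as a rotated and scaled Euclidean metric, the standard definition (a well-known textbook form, not taken from the paper itself) is

    d_M(x, y) = \sqrt{(x - y)^\top M (x - y)}, \qquad M \succeq 0,

and factoring M = L^\top L gives d_M(x, y) = \lVert Lx - Ly \rVert_2, i.e., the Euclidean distance after the linear map x \mapsto Lx (rotation combined with scaling). The paper's generalized metric reportedly has this quadratic form as its second-order term; the higher-order terms are not specified in this record.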
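The abstract's active-learning step recommends sampling near boundary and outlier points under the learned metric. The paper's exact selection criterion is not given in this record; the sketch below is a minimal illustration under stated assumptions, taking a learned positive semidefinite matrix M as given and using Mahalanobis distance to the data mean as a stand-in outlier score (the function name and scoring rule are hypothetical, not the authors' method):

    # Minimal sketch (assumptions noted above; not the paper's algorithm).
    import numpy as np

    def mahalanobis_scores(X, mu, M):
        """Mahalanobis distance of each row of X to the center mu under metric M."""
        diff = X - mu
        # d_i = sqrt((x_i - mu)^T M (x_i - mu)), computed row-wise
        return np.sqrt(np.einsum('ij,jk,ik->i', diff, M, diff))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))      # unlabeled pool (synthetic)
    M = np.eye(3)                      # placeholder for a learned metric
    scores = mahalanobis_scores(X, X.mean(axis=0), M)
    query = np.argsort(scores)[-5:]    # 5 most outlying points to label next
    print("suggested points to label:", query)

With M = I this reduces to plain Euclidean distances; a learned M would reshape which points count as outlying, which is the sense in which the metric guides where to acquire new labels.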