
Active metric learning for supervised classification


Bibliographic Details
Published in: Computers & Chemical Engineering, 2021-01, Vol. 144, p. 107132, Article 107132
Main Authors: Kumaran, Krishnan; Papageorgiou, Dimitri J.; Takac, Martin; Lueg, Laurens; Sahinidis, Nicolas V.
Format: Article
Language: English
Publisher: Elsevier Ltd
ISSN: 0098-1354
EISSN: 1873-4375
DOI: 10.1016/j.compchemeng.2020.107132
Subjects: Active learning; Clustering; Metric learning; Mixed-integer optimization
Description

Clustering and classification critically rely on distance metrics that provide meaningful comparisons between data points. To this end, learning optimal distance functions from data, known as metric learning, aims to facilitate supervised classification, particularly in high-dimensional spaces where visualization is challenging or infeasible. The Mahalanobis metric is the default choice due to its simplicity and its interpretability as a transformation of the Euclidean metric by a combination of rotation and scaling.

In this work, we present several novel contributions to metric learning, in both formulation and solution methods. Our approach is motivated by agglomerative clustering, with novel modifications that enable a natural interpretation of the user-defined classes as clusters under the optimal metric. It generalizes and improves upon leading methods by removing reliance on pre-designated “target neighbors,” “triplets,” and “similarity pairs.” Starting from the definition of a generalized metric that has the Mahalanobis metric as its second-order term, we propose an objective function for metric selection that does not aim to isolate classes from each other, as most previous work does, but instead distorts the space minimally by aggregating co-class members into local clusters. We then formulate the problem as a mixed-integer optimization that can be solved efficiently for small and medium datasets and approximated for larger ones.

Another salient feature of our method is that it facilitates active learning by recommending precise regions to sample, using the optimal metric to improve classification performance. These regions are indicated by boundary and outlier points of the dataset as defined by the metric. Such targeted acquisition can significantly reduce computation and data-acquisition costs by ensuring training-data completeness, representativeness, and economy, which could also aid training-data selection for other established methods such as Deep Learning and Random Forests. We demonstrate the classification and computational performance of our approach through several simple and intuitive examples, followed by results on real image and benchmark datasets.
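As context for the abstract's description of the Mahalanobis metric as a rotated and scaled Euclidean metric, the standard definition (a well-known textbook form, not taken from the paper itself) is

    d_M(x, y) = \sqrt{(x - y)^\top M (x - y)}, \qquad M \succeq 0,

and factoring M = L^\top L gives d_M(x, y) = \lVert Lx - Ly \rVert_2, i.e., the Euclidean distance after the linear map x \mapsto Lx (rotation combined with scaling). The paper's generalized metric reportedly has this quadratic form as its second-order term; the higher-order terms are not specified in this record.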
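The abstract's active-learning step recommends sampling near boundary and outlier points under the learned metric. The paper's exact selection criterion is not given in this record; the sketch below is a minimal illustration under stated assumptions, taking a learned positive semidefinite matrix M as given and using Mahalanobis distance to the data mean as a stand-in outlier score (the function name and scoring rule are hypothetical, not the authors' method):

    # Minimal sketch (assumptions noted above; not the paper's algorithm).
    import numpy as np

    def mahalanobis_scores(X, mu, M):
        """Mahalanobis distance of each row of X to the center mu under metric M."""
        diff = X - mu
        # d_i = sqrt((x_i - mu)^T M (x_i - mu)), computed row-wise
        return np.sqrt(np.einsum('ij,jk,ik->i', diff, M, diff))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))      # unlabeled pool (synthetic)
    M = np.eye(3)                      # placeholder for a learned metric
    scores = mahalanobis_scores(X, X.mean(axis=0), M)
    query = np.argsort(scores)[-5:]    # 5 most outlying points to label next
    print("suggested points to label:", query)

With M = I this reduces to plain Euclidean distances; a learned M would reshape which points count as outlying, which is the sense in which the metric guides where to acquire new labels.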