Active metric learning for supervised classification
Published in: Computers & Chemical Engineering, 2021-01, Vol. 144, p. 107132, Article 107132
Main Authors: Kumaran, Krishnan; Papageorgiou, Dimitri J.; Takac, Martin; Lueg, Laurens; Sahinidis, Nicolas V.
Format: Article
Language: English
Publisher: Elsevier Ltd
Subjects: Active learning; Clustering; Metric learning; Mixed-integer optimization
Abstract: Clustering and classification critically rely on distance metrics that provide meaningful comparisons between data points. To this end, learning optimal distance functions from data, known as metric learning, aims to facilitate supervised classification, particularly in high-dimensional spaces where visualization is challenging or infeasible. The Mahalanobis metric is the default choice because of its simplicity and its interpretability as a transformation of the Euclidean metric by a combination of rotation and scaling.
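To make that interpretability concrete, here is a minimal sketch (not code from the paper): factoring the metric matrix as M = LᵀL shows that the Mahalanobis distance is exactly the Euclidean distance measured after the linear map L, i.e. a rotation combined with scaling.

```python
import numpy as np

# Minimal sketch: d_M(x, y) = sqrt((x - y)^T M (x - y)) with M positive
# semidefinite. Factoring M = L^T L shows d_M is the plain Euclidean
# distance taken after the linear transformation L.
rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)

A = rng.normal(size=(3, 3))
M = A.T @ A                    # any PSD matrix serves as the metric parameter
L = np.linalg.cholesky(M).T    # Cholesky factor, so M = L^T L

d_mahalanobis = np.sqrt((x - y) @ M @ (x - y))
d_euclidean_after_L = np.linalg.norm(L @ x - L @ y)
assert np.isclose(d_mahalanobis, d_euclidean_after_L)
```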
In this work, we present several novel contributions to metric learning, both in formulation and in solution methods. Our approach is motivated by agglomerative clustering, with modifications that allow the user-defined classes to be interpreted naturally as clusters under the optimal metric. It generalizes and improves upon leading methods by removing their reliance on pre-designated “target neighbors,” “triplets,” and “similarity pairs.” Starting from the definition of a generalized metric that has the Mahalanobis metric as its second-order term, we propose an objective function for metric selection that does not aim to isolate classes from one another, as most previous work does, but instead distorts the space minimally while aggregating co-class members into local clusters. We formulate the resulting problem as a mixed-integer optimization that can be solved efficiently for small and medium datasets and approximated for larger ones.
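The abstract does not spell out the mixed-integer formulation, but the flavor of "aggregating co-class members into local clusters" can be sketched with a toy assignment model. This is a hypothetical, simplified sketch, not the paper's formulation: the metric is held fixed, the candidate cluster centers, the PuLP solver choice, and the class-purity constraints are all assumptions. Each point picks one cluster, clusters may not mix classes, and total within-cluster distance is minimized.

```python
import itertools
import numpy as np
import pulp  # open-source MILP modeler; any MIP solver would do

# Toy data: two classes; candidate cluster centers are the points themselves
# (an assumption made for illustration).
X = np.array([[0.0, 0.0], [0.1, 0.2], [2.0, 2.0], [2.1, 1.9]])
labels = [0, 0, 1, 1]
n = len(X)
d = {(i, k): float(np.linalg.norm(X[i] - X[k]))
     for i in range(n) for k in range(n)}

prob = pulp.LpProblem("class_pure_clustering", pulp.LpMinimize)
z = pulp.LpVariable.dicts("z", (range(n), range(n)), cat="Binary")  # z[i][k]: point i joins cluster k

# Objective: total within-cluster distance, a proxy for minimal distortion.
prob += pulp.lpSum(d[i, k] * z[i][k] for i in range(n) for k in range(n))
for i in range(n):                      # every point joins exactly one cluster
    prob += pulp.lpSum(z[i][k] for k in range(n)) == 1
for i, j in itertools.combinations(range(n), 2):
    if labels[i] != labels[j]:          # clusters must stay class-pure
        for k in range(n):
            prob += z[i][k] + z[j][k] <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
```

In the paper's setting the metric itself is also a decision variable, which is what makes the full problem harder than this fixed-metric toy.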
Another salient feature of our method is that it facilitates active learning by recommending precise regions to sample, using the optimal metric to improve classification performance. These regions are indicated by the boundary and outlier points of the dataset as defined by the metric. Such targeted acquisition can significantly reduce computation and data-acquisition costs by ensuring that the training data are complete, representative, and economical, which could also benefit training-data selection for other established methods such as Deep Learning and Random Forests. We demonstrate the classification and computational performance of our approach on several simple and intuitive examples, followed by results on real image and benchmark datasets.
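A hedged sketch of how boundary and outlier points might be flagged under a learned metric follows; the margin and quantile criteria, the function names, and the parameters are illustrative assumptions, not the paper's exact rules.

```python
import numpy as np

def mahalanobis_dists(X, M):
    """Pairwise distances under the learned metric matrix M (assumed PSD)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.einsum("ijk,kl,ijl->ij", diff, M, diff))

def recommend_queries(X, y, M, margin_tol=0.1, outlier_quantile=0.95):
    """Return indices of points worth acquiring new labels near.

    Boundary points: the nearest other-class neighbor is nearly as close as
    the nearest same-class neighbor. Outliers: far from all co-class points.
    Both thresholds are illustrative assumptions.
    """
    D = mahalanobis_dists(X, M)
    np.fill_diagonal(D, np.inf)
    same = np.where(y[:, None] == y[None, :], D, np.inf).min(axis=1)
    other = np.where(y[:, None] != y[None, :], D, np.inf).min(axis=1)
    boundary = (other - same) < margin_tol
    outlier = same > np.quantile(same[np.isfinite(same)], outlier_quantile)
    return np.where(boundary | outlier)[0]
```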
DOI: 10.1016/j.compchemeng.2020.107132
ISSN: 0098-1354
EISSN: 1873-4375