BenchMetrics: a systematic benchmarking method for binary classification performance metrics
This paper proposes a systematic benchmarking method called BenchMetrics to analyze and compare the robustness of binary classification performance metrics based on the confusion matrix for a crisp classifier. BenchMetrics, introducing new concepts such as meta-metrics (metrics about metrics) and metric space, has been tested on fifteen well-known metrics, including balanced accuracy, normalized mutual information, Cohen’s Kappa, and the Matthews correlation coefficient (MCC), along with two recently proposed metrics from the literature, optimized precision and the index of balanced accuracy. The method formally presents a pseudo-universal metric space in which all permutations of confusion matrix elements yielding the same sample size are calculated. It evaluates the metrics and metric spaces in a two-stage benchmark based on eighteen newly proposed criteria, and finally ranks the metrics by aggregating the criteria results. The mathematical evaluation stage analyzes metrics’ equations, specific confusion matrix variations, and the corresponding metric spaces. The second stage, including seven novel meta-metrics, evaluates the robustness aspects of the metric spaces. We interpreted each benchmarking result and comparatively assessed the effectiveness of BenchMetrics against the limited number of comparison studies in the literature. The results of BenchMetrics demonstrate that widely used metrics have significant robustness issues, and that MCC is the most robust and recommended metric for binary classification performance evaluation.
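The core construction in the abstract — a pseudo-universal metric space enumerating every confusion matrix with a fixed sample size — can be illustrated with a minimal Python sketch. This is our reading of the idea, not the paper’s code; the function names and the choice of accuracy as the example metric are ours.

```python
from itertools import product

def metric_space(n):
    """Yield every confusion matrix (TP, FN, FP, TN) whose four
    non-negative integer entries sum to the sample size n."""
    for tp, fn, fp in product(range(n + 1), repeat=3):
        tn = n - tp - fn - fp
        if tn >= 0:
            yield tp, fn, fp, tn

def accuracy(tp, fn, fp, tn):
    """Accuracy = (TP + TN) / n, shown here as one example metric
    evaluated over the whole space."""
    return (tp + tn) / (tp + fn + fp + tn)

# Even a tiny space grows quickly: n = 10 already yields
# C(13, 3) = 286 distinct confusion matrices.
space = list(metric_space(10))
print(len(space))           # 286
print(accuracy(*space[0]))  # metric value for the first matrix
```

Evaluating a metric at every point of such a space is what makes the robustness analysis exhaustive rather than dependent on a particular dataset or classifier.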
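The paper’s headline recommendation is MCC. A short sketch of its computation from raw confusion matrix counts follows; returning 0 when the denominator vanishes is a common convention and our choice here — the paper may handle undefined values differently.

```python
import math

def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient from raw confusion matrix
    counts; ranges from -1 (total disagreement) to +1 (perfect)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Undefined when a row or column of the matrix is empty;
        # returning 0 is a widely used convention (our assumption).
        return 0.0
    return (tp * tn - fp * fn) / denom

print(mcc(45, 5, 10, 40))  # ≈ 0.70 for a reasonably good classifier
```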
| Published in: | Neural computing & applications, 2021-11, Vol. 33 (21), p. 14623-14650 |
|---|---|
| Main Authors: | Canbek, Gürol; Taskaya Temizel, Tugba; Sagiroglu, Seref |
| Format: | Article |
| Language: | English |
| Subjects: | Artificial Intelligence; Benchmarks; Business metrics; Classification; Computational Biology/Bioinformatics; Computational Science and Engineering; Computer Science; Confusion; Correlation coefficients; Criteria; Data Mining and Knowledge Discovery; Image Processing and Computer Vision; Mathematical analysis; Metric space; Original Article; Performance evaluation; Performance measurement; Permutations; Probability and Statistics in Computer Science; Robustness (mathematics) |
| ISSN: | 0941-0643 |
| EISSN: | 1433-3058 |
| DOI: | 10.1007/s00521-021-06103-6 |
| Publisher: | Springer London |