
BenchMetrics: a systematic benchmarking method for binary classification performance metrics

This paper proposes a systematic benchmarking method called BenchMetrics to analyze and compare the robustness of binary classification performance metrics based on the confusion matrix for a crisp classifier. BenchMetrics, introducing new concepts such as meta-metrics (metrics about metrics) and metric space, has been tested on fifteen well-known metrics, including balanced accuracy, normalized mutual information, Cohen’s Kappa, and Matthews correlation coefficient (MCC), along with two recently proposed metrics in the literature, optimized precision and index of balanced accuracy. The method formally presents a pseudo-universal metric space in which all permutations of confusion-matrix elements yielding the same sample size are calculated. It evaluates the metrics and metric spaces in a two-stage benchmark based on eighteen newly proposed criteria and finally ranks the metrics by aggregating the criteria results. The mathematical evaluation stage analyzes metrics’ equations, specific confusion-matrix variations, and the corresponding metric spaces. The second stage, including seven novel meta-metrics, evaluates the robustness aspects of the metric spaces. Each benchmarking result is interpreted, and the effectiveness of BenchMetrics is comparatively assessed against the limited comparison studies in the literature. The results demonstrate that widely used metrics have significant robustness issues and that MCC is the most robust and recommended metric for binary classification performance evaluation.
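The metric-space construction described in the abstract lends itself to a short illustration. The sketch below is not the authors' code; it is a minimal Python approximation that assumes the "pseudo-universal metric space" is simply the set of all non-negative integer confusion matrices (TP, FN, FP, TN) summing to a fixed, hypothetical sample size n. It enumerates that space and evaluates MCC, computed with the standard formula (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), over every point; meta-metrics of the kind the paper evaluates could then be derived from such a sweep.

```python
from itertools import product
from math import sqrt

def confusion_matrices(n):
    """Yield every (tp, fn, fp, tn) of non-negative integers summing to n."""
    for tp, fn, fp in product(range(n + 1), repeat=3):
        tn = n - tp - fn - fp
        if tn >= 0:
            yield tp, fn, fp, tn

def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient; returns 0.0 when any marginal sum is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Sweep the assumed metric space for a small sample size, e.g. n = 20.
space = [mcc(*cm) for cm in confusion_matrices(20)]
print(len(space), min(space), max(space))  # 1771 matrices, values spanning [-1.0, 1.0]
```

The same exhaustive sweep can be repeated with any other confusion-matrix metric (accuracy, F1, Cohen's Kappa, and so on) to compare how each behaves across the full space rather than on a single dataset, which is the spirit of the benchmarking idea the abstract describes.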

Bibliographic Details
Published in: Neural Computing & Applications, 2021-11, Vol. 33 (21), pp. 14623-14650
Publisher: Springer, London
Main Authors: Canbek, Gürol; Taskaya Temizel, Tugba; Sagiroglu, Seref
Format: Article
Language: English
DOI: 10.1007/s00521-021-06103-6
ISSN: 0941-0643
EISSN: 1433-3058
Subjects: Artificial Intelligence; Benchmarks; Business metrics; Classification; Computational Biology/Bioinformatics; Computational Science and Engineering; Computer Science; Confusion; Correlation coefficients; Criteria; Data Mining and Knowledge Discovery; Image Processing and Computer Vision; Mathematical analysis; Metric space; Original Article; Performance evaluation; Performance measurement; Permutations; Probability and Statistics in Computer Science; Robustness (mathematics)