
BenchMetrics: a systematic benchmarking method for binary classification performance metrics

This paper proposes a systematic benchmarking method called BenchMetrics to analyze and compare the robustness of binary classification performance metrics based on the confusion matrix for a crisp classifier. BenchMetrics, introducing new concepts such as meta-metrics (metrics about metrics) and metric space, has been tested on fifteen well-known metrics, including balanced accuracy, normalized mutual information, Cohen’s Kappa, and Matthews correlation coefficient (MCC), along with two recently proposed metrics in the literature, optimized precision and index of balanced accuracy. The method formally presents a pseudo-universal metric space in which all permutations of confusion-matrix elements yielding the same sample size are calculated. It evaluates the metrics and metric spaces in a two-stage benchmark based on eighteen newly proposed criteria and finally ranks the metrics by aggregating the criteria results. The mathematical evaluation stage analyzes metrics’ equations, specific confusion-matrix variations, and the corresponding metric spaces. The second stage, including seven novel meta-metrics, evaluates the robustness aspects of the metric spaces. Each benchmarking result is interpreted, and the effectiveness of BenchMetrics is comparatively assessed against the limited comparison studies in the literature. The results demonstrate that widely used metrics have significant robustness issues and that MCC is the most robust and recommended metric for binary classification performance evaluation.
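The metric-space construction described in the abstract lends itself to a short illustration. The sketch below is not the authors' code; it is a minimal Python approximation that assumes the "pseudo-universal metric space" is simply the set of all non-negative integer confusion matrices (TP, FN, FP, TN) summing to a fixed, hypothetical sample size n. It enumerates that space and evaluates MCC, computed with the standard formula (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), over every point; meta-metrics of the kind the paper evaluates could then be derived from such a sweep.

```python
from itertools import product
from math import sqrt

def confusion_matrices(n):
    """Yield every (tp, fn, fp, tn) of non-negative integers summing to n."""
    for tp, fn, fp in product(range(n + 1), repeat=3):
        tn = n - tp - fn - fp
        if tn >= 0:
            yield tp, fn, fp, tn

def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient; returns 0.0 when any marginal sum is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Sweep the assumed metric space for a small sample size, e.g. n = 20.
space = [mcc(*cm) for cm in confusion_matrices(20)]
print(len(space), min(space), max(space))  # 1771 matrices, values spanning [-1.0, 1.0]
```

The same exhaustive sweep can be repeated with any other confusion-matrix metric (accuracy, F1, Cohen's Kappa, and so on) to compare how each behaves across the full space rather than on a single dataset, which is the spirit of the benchmarking idea the abstract describes.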

Bibliographic Details
Published in: Neural Computing & Applications, 2021-11, Vol. 33 (21), pp. 14623-14650
Publisher: Springer, London
Main Authors: Canbek, Gürol; Taskaya Temizel, Tugba; Sagiroglu, Seref
Format: Article
Language: English
DOI: 10.1007/s00521-021-06103-6
ISSN: 0941-0643
EISSN: 1433-3058
Subjects: Artificial Intelligence; Benchmarks; Business metrics; Classification; Computational Biology/Bioinformatics; Computational Science and Engineering; Computer Science; Confusion; Correlation coefficients; Criteria; Data Mining and Knowledge Discovery; Image Processing and Computer Vision; Mathematical analysis; Metric space; Original Article; Performance evaluation; Performance measurement; Permutations; Probability and Statistics in Computer Science; Robustness (mathematics)