
Correlation-based methods for representative fairness metric selection: An empirical study on efficiency and caveats in model evaluation

Bibliographic Details
Published in: Expert Systems with Applications, 2025-04, Vol. 268, p. 126344, Article 126344
Main Authors: Loureiro, Rafael B., Pagano, Tiago P., Lisboa, Fernanda V.N., Nascimento, Lian F.S., Oliveira, Ewerton L.S., Winkler, Ingrid, Nascimento, Erick G. Sperandio
Format: Article
Language: English
Description
Summary: Addressing bias and unfairness in machine learning models across different application domains is a multifaceted challenge. Despite the variety of fairness metrics available, identifying an optimal set for evaluating a model's fairness remains an open question, owing to the diverse nature of these metrics and the lack of a comprehensive approach to ensuring fairness across applications. This study proposes a method for selecting the most representative metrics for bias and fairness assessment, in post-processing, for machine learning models in different contexts. We investigate a correlation-based strategy as a heuristic for fairness metric selection, applying bootstrap sampling with the Markov chain Monte Carlo technique, together with our proposed improvements of stratified sampling, a stopping criterion, and Kendall correlation, which address data bias representation, computational cost, and robustness, respectively. The method achieved an average reduction of 64.37% in the number of models and 20.00% in processing time. Moreover, it effectively paired metrics with similar behaviour, with a shared term in metric names proving a strong indicator of a direct relationship. While no single metric stands out across all contexts, certain metrics consistently stand out within specific models or datasets. In a complex scenario using a large language model for sexism detection, the proposed method achieved a 71.93% reduction in execution time while forming more comprehensive metric groups. Overall, the proposed method successfully selects representative metrics at a considerable saving in computational cost, demonstrating its practicality for real-world applications.
Highlights:
•Validate correlation for finding a representative fairness metric in a context.
•Reduce the computational power needed to find the representative metrics.
•Validate the method empirically, showing effectiveness across models and datasets.
•Analyse fairness metric relations and dependencies across contexts.
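To illustrate the core idea described in the abstract, the sketch below groups fairness metrics by the Kendall correlation of their scores across bootstrap samples and keeps one representative per group. This is a minimal illustration of a correlation-based selection heuristic, not the authors' implementation: the metric names, synthetic scores, greedy grouping rule, and the 0.8 threshold are all assumptions for demonstration.

# Minimal sketch of correlation-based fairness metric selection.
# Assumes `metric_values` maps each fairness metric name to its scores
# across bootstrap samples of a model; all names and the 0.8 threshold
# are illustrative, not taken from the paper.
import numpy as np
from scipy.stats import kendalltau

def group_by_kendall(metric_values: dict[str, np.ndarray], threshold: float = 0.8):
    """Greedily group metrics whose |Kendall tau| exceeds `threshold`,
    then return one representative per group (its first member)."""
    groups: list[list[str]] = []
    for name in metric_values:
        placed = False
        for group in groups:
            # Compare against the group's representative (its first member).
            tau, _ = kendalltau(metric_values[name], metric_values[group[0]])
            if abs(tau) >= threshold:
                group.append(name)
                placed = True
                break
        if not placed:
            groups.append([name])
    return [g[0] for g in groups], groups

# Usage with synthetic scores for three hypothetical fairness metrics:
rng = np.random.default_rng(0)
base = rng.normal(size=50)
scores = {
    "statistical_parity": base,
    "disparate_impact": base + rng.normal(scale=0.05, size=50),  # near-duplicate of base
    "equalized_odds": rng.normal(size=50),                       # unrelated scores
}
representatives, groups = group_by_kendall(scores)
print(representatives)  # one representative metric per correlated group

Kendall correlation is rank-based, which is what makes it the more robust choice the abstract mentions: it is insensitive to monotone rescaling of metric values and less affected by outliers than Pearson correlation.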
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2024.126344