
Metaheuristics should be tested on large benchmark set with various numbers of function evaluations

Bibliographic Details
Published in: Swarm and Evolutionary Computation, 2025-02, Vol. 92, Article 101807
Main Authors: Piotrowski, Adam P., Napiorkowski, Jaroslaw J., Piotrowska, Agnieszka E.
Format: Article
Language: English
Description
Summary: Numerical metaheuristics are often tested on mathematical problems collected into a benchmark set. There are many benchmark sets, but the number of problems in a particular benchmark rarely exceeds 30 and is sometimes much lower. The stopping condition is frequently based on the maximum number of function evaluations, commonly set to a single value loosely tied to the problem's dimensionality. However, the ranking of algorithms may depend strongly on the number of allowed function evaluations, so by changing that number, different algorithms may be promoted as the best ones. In the present study, we suggest that metaheuristics should instead be tested independently under four different maximum numbers of function evaluations that differ by orders of magnitude (e.g., 5,000, 50,000, 500,000, and 5,000,000 function calls). We recommend performing tests on both higher- and lower-dimensional versions of 72 problems from three well-known benchmark sets (CEC 2014, CEC 2017, and CEC 2022). The ranking of algorithms under each particular computational budget should be discussed separately. This way, various algorithms may show their strengths in shorter or longer searches and their weaknesses in other cases, encouraging a more nuanced discussion. We also show that the number of benchmark problems does matter: results based on larger sets of problems are statistically significant much more frequently than results based on a single small benchmark set. The percentage of statistically significant results also differs between tests performed with lower and higher numbers of allowed function calls.
ISSN:2210-6502
DOI:10.1016/j.swevo.2024.101807
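
The abstract's central recommendation, namely ranking algorithms separately under several evaluation budgets that differ by orders of magnitude, can be illustrated with a small sketch. The snippet below is purely illustrative and not the authors' code: it compares two toy optimizers (random search and a (1+1) evolution strategy) on a sphere function standing in for a CEC benchmark problem, and prints a separate ranking for each budget.

```python
import numpy as np

# Illustrative sketch only (not from the paper): compare two toy optimizers
# under several independent evaluation budgets and rank them separately per
# budget. The 5,000,000-evaluation budget mentioned in the abstract is
# omitted here just to keep the demo fast.

def sphere(x):
    return float(np.sum(x ** 2))

def random_search(f, dim, budget, rng):
    # Evaluate `budget` uniform random points and keep the best value.
    best = np.inf
    for _ in range(budget):
        best = min(best, f(rng.uniform(-100.0, 100.0, dim)))
    return best

def one_plus_one_es(f, dim, budget, rng, sigma=1.0):
    # Simple (1+1) evolution strategy with a fixed mutation step size.
    x = rng.uniform(-100.0, 100.0, dim)
    fx = f(x)
    for _ in range(budget - 1):
        y = x + sigma * rng.standard_normal(dim)
        fy = f(y)
        if fy <= fx:
            x, fx = y, fy
    return fx

if __name__ == "__main__":
    budgets = [5_000, 50_000, 500_000]
    algorithms = {"random-search": random_search, "(1+1)-ES": one_plus_one_es}
    for budget in budgets:
        rng = np.random.default_rng(42)  # fresh generator per budget
        results = {name: alg(sphere, 10, budget, rng)
                   for name, alg in algorithms.items()}
        ranking = sorted(results, key=results.get)  # best (lowest) value first
        print(f"budget={budget:>7,}: ranking={ranking}, best values={results}")
```

In line with the paper's point, the ranking produced for each budget is reported on its own rather than aggregated, so an algorithm that excels in short searches is not penalized for losing in long ones, and vice versa.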