Novel Meta-Features for Automated Machine Learning Model Selection in Anomaly Detection

Bibliographic Details
Published in: IEEE Access, 2021, Vol. 9, p. 89675-89687
Main Authors: Kotlar, Milos; Punt, Marija; Radivojevic, Zaharije; Cvetanovic, Milos; Milutinovic, Veljko
Format: Article
Language: English
Description
Summary: A growing body of research examines automated machine learning (AutoML) frameworks, which are becoming a promising solution for building complex machine learning models without human expertise or assistance. The key challenge in enabling AutoML frameworks to build an efficient model for anomaly detection tasks is determining the best underlying model for a given task and optimization metric. Meta-learning approaches based on a set of meta-features that describe data properties can enable efficient model selection in AutoML frameworks. However, existing meta-learning approaches based on statistical and information-theoretic meta-features require large amounts of data and computational resources to extract data properties. This paper proposes a novel set of meta-features for model selection in anomaly detection tasks, based on domain-specific properties of the data. The set overcomes the shortcomings of existing meta-features by introducing simple but effective meta-features that can be extracted efficiently, or estimated from a small amount of data. Experiments with 63 datasets from different repositories with varying schemas show that the proposed set achieves a model-selection accuracy of 87%, compared with 74% for simple meta-features, 68% for statistical meta-features, 70% for information-theoretic meta-features, and 73% for the comprehensive set of meta-features provided by pyMFE. These results indicate that the proposed set can be adopted by AutoML frameworks across a diverse range of domains.
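The general idea of meta-feature-based model selection described in the summary can be illustrated with a minimal sketch. It is not the paper's method. The three meta-features used here (log of the instance count, attribute count, anomaly ratio) are generic stand-ins rather than the proposed domain-specific set. The nearest-neighbour vote is an assumed meta-learner, and the names `simple_meta_features`, `select_model`, and the entries in `meta_base` are hypothetical.

```python
import math
from collections import Counter


def simple_meta_features(rows, labels):
    """Describe a dataset with three cheap meta-features (illustrative only).

    rows:   list of equal-length numeric lists (one per instance)
    labels: list of 0/1 flags, where 1 marks a known or estimated anomaly
    """
    n_instances = len(rows)
    n_attributes = len(rows[0])
    anomaly_ratio = sum(labels) / n_instances
    return [math.log10(n_instances), float(n_attributes), anomaly_ratio]


def select_model(meta_base, query, k=3):
    """Pick a model by majority vote among the k nearest meta-base entries.

    meta_base: list of (meta_feature_vector, best_model_name) pairs gathered
               from datasets where the best model is already known
    query:     meta-feature vector of the new dataset
    """
    # Plain Euclidean distance; a real system would likely scale features first.
    nearest = sorted(meta_base, key=lambda entry: math.dist(entry[0], query))[:k]
    votes = Counter(name for _, name in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical meta-base: values and model names are made up for illustration.
meta_base = [
    ([3.0, 10.0, 0.01], "isolation_forest"),
    ([3.2, 12.0, 0.02], "isolation_forest"),
    ([5.0, 100.0, 0.10], "autoencoder"),
    ([4.8, 90.0, 0.12], "autoencoder"),
    ([2.0, 3.0, 0.05], "lof"),
]

# A new dataset: 1000 instances, 10 attributes, 10 flagged anomalies.
rows = [[0.0] * 10 for _ in range(1000)]
labels = [1] * 10 + [0] * 990
query = simple_meta_features(rows, labels)
print(select_model(meta_base, query))  # prints "isolation_forest"
```

In this toy setup, the cost of choosing a model is one pass over the data to compute the meta-features plus a lookup in the meta-base, which is why cheap meta-features matter for the approach the summary describes.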
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3090936