DynaQ: online learning from imbalanced multi-class streams through dynamic sampling

Bibliographic Details
Published in:	Applied intelligence (Dordrecht, Netherlands), 2023-11, Vol.53 (21), p.24908-24930
Main Authors:	Sadeghi, Farnaz; Viktor, Herna L.; Vafaie, Parsa
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial Intelligence; Computer Science; Data transmission; Distance learning; Euclidean geometry; Evolution; Machine learning; Machines; Manufacturing; Mechanical Engineering; Processes; Queues; Resampling; Skewed distributions; Supervised learning

Description
Summary:	Online supervised learning from fast-evolving data streams, particularly in domains such as health, the environment, and manufacturing, is a crucial research area. However, these domains often experience class imbalance, which can skew class distributions. It is essential for online learning algorithms to analyze large datasets in real-time while accurately modeling rare or infrequent classes that may appear in bursts. While methods have been proposed to handle binary class imbalance, there is a lack of attention to multi-class imbalanced settings with varying degrees of imbalance in evolving streams. In this paper, we present the Dynamic Queues (DynaQ) algorithm for online learning in multi-class imbalanced settings to fill this knowledge gap. Our approach utilizes a batch-based resampling method that creates an instance queue for each class to balance the number of instances. We maintain a queue threshold and remove older samples during training. Additionally, we dynamically oversample minority classes based on one of four rate parameters: recall, F1-score, κ_m, and Euclidean distance. Our learning algorithm consists of an ensemble that uses sliding windows and a soft voting schema while incorporating a drift detection mechanism. Our experimental results demonstrate the superiority of the DynaQ approach over state-of-the-art methods.
ISSN:	0924-669X 1573-7497
DOI:	10.1007/s10489-023-04886-w
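
The summary describes one bounded instance queue per class, with older samples dropped once a threshold is reached and minority classes oversampled. The sketch below illustrates only that queueing idea and is not the authors' implementation. The class name `ClassQueues`, the method names, and the `queue_threshold` parameter are hypothetical. It also swaps in a simpler rule for the oversampling step: each class is resampled with replacement up to the size of the largest queue, whereas the paper derives the rate dynamically from recall, F1-score, κ_m, or Euclidean distance.

```python
import random
from collections import defaultdict, deque


class ClassQueues:
    """Hypothetical helper: one bounded FIFO queue of instances per class."""

    def __init__(self, queue_threshold):
        self.queue_threshold = queue_threshold
        # deque(maxlen=...) discards the oldest item once the queue is full.
        self.queues = defaultdict(lambda: deque(maxlen=queue_threshold))

    def add(self, x, y):
        """Store instance x in the queue of its class label y."""
        self.queues[y].append(x)

    def balanced_batch(self, rng):
        """Return (x, y) pairs with every class resampled up to the largest queue."""
        if not self.queues:
            return []
        target = max(len(q) for q in self.queues.values())
        batch = []
        for label, q in self.queues.items():
            items = list(q)
            batch.extend((x, label) for x in items)
            # Assumption: fill the gap by sampling with replacement,
            # instead of the paper's metric-driven oversampling rate.
            extra = target - len(items)
            batch.extend((rng.choice(items), label) for _ in range(extra))
        return batch


# Usage: ten majority instances and one minority instance, threshold of 3.
rng = random.Random(0)
cq = ClassQueues(queue_threshold=3)
for i in range(10):
    cq.add(i, "majority")
cq.add(100, "minority")
batch = cq.balanced_batch(rng)
```

With a threshold of 3, the majority queue retains only its three most recent instances, and the single minority instance is repeated so that both classes contribute equally to the batch.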