VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams
Published in: | Data mining and knowledge discovery 2021-11, Vol.35 (6), p.2679-2713 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | The world is constantly changing, and so are the massive amount of data produced. However, only a few studies deal with online class imbalance learning that combines the challenges of class-imbalanced data streams and concept drift. In this paper, we propose the very fast continuous synthetic minority oversampling technique (VFC-SMOTE). It is a novel meta-strategy to be prepended to any streaming machine learning classification algorithm aiming at oversampling the minority class using a new version of SMOTE and Borderline-SMOTE inspired by Data Sketching. We benchmarked VFC-SMOTE pipelines on synthetic and real data streams containing different concept drifts, imbalance levels, and class distributions. We bring statistical evidence that VFC-SMOTE pipelines learn models whose minority class performances are better than state-of-the-art. Moreover, we analyze the time/memory consumption and the concept drift recovery speed. |
---|---|
ISSN: | 1384-5810 1573-756X |
DOI: | 10.1007/s10618-021-00786-0 |