
VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams

Bibliographic Details
Published in: Data Mining and Knowledge Discovery, 2021-11, Vol. 35 (6), pp. 2679-2713
Main Authors: Bernardo, Alessio; Della Valle, Emanuele
Format: Article
Language: English
Description
Summary: The world is constantly changing, and so is the massive amount of data it produces. However, only a few studies address online class imbalance learning, which combines the challenges of class-imbalanced data streams and concept drift. In this paper, we propose the very fast continuous synthetic minority oversampling technique (VFC-SMOTE). It is a novel meta-strategy, to be prepended to any streaming machine learning classification algorithm, that oversamples the minority class using a new version of SMOTE and Borderline-SMOTE inspired by data sketching. We benchmarked VFC-SMOTE pipelines on synthetic and real data streams containing different concept drifts, imbalance levels, and class distributions. We bring statistical evidence that VFC-SMOTE pipelines learn models whose minority-class performance is better than that of the state of the art. Moreover, we analyze the time and memory consumption and the speed of recovery from concept drift.
ISSN: 1384-5810; 1573-756X
DOI: 10.1007/s10618-021-00786-0
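The summary describes a meta-strategy that sits in front of a streaming classifier and oversamples the minority class with a SMOTE-style generator. The sketch below is a hedged illustration of that general idea and is not the authors' VFC-SMOTE: it uses the classic SMOTE interpolation step (a synthetic point placed at a random position on the segment between a minority example and one of its nearest minority neighbors). It keeps a plain bounded window of recent minority examples in place of the data-sketching summary described in the paper. The class `WindowedSmote`, the helper `smote_interpolate`, and the parameters `window_size`, `k`, and `n_synthetic` are hypothetical names chosen for this illustration.

```python
import math
import random
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def smote_interpolate(x: Sequence[float], neighbor: Sequence[float], gap: float) -> Point:
    """Classic SMOTE step: a point on the segment from x toward one minority neighbor."""
    return tuple(xi + gap * (ni - xi) for xi, ni in zip(x, neighbor))


class WindowedSmote:
    """Hypothetical online oversampler (NOT VFC-SMOTE).

    Assumption: a bounded window of recent minority examples stands in for
    the data sketch used in the paper. For each incoming minority example it
    emits n_synthetic extra minority points before they reach the learner.
    """

    def __init__(self, window_size: int = 100, k: int = 5, n_synthetic: int = 1, seed: int = 0):
        self.window: deque = deque(maxlen=window_size)  # oldest minority points are evicted
        self.k = k
        self.n_synthetic = n_synthetic
        self.rng = random.Random(seed)

    def process(self, x: Sequence[float], y: int, minority_label: int = 1) -> List[Tuple[Point, int]]:
        """Return the real example plus any synthetic ones, ready for an incremental classifier."""
        out = [(tuple(x), y)]
        if y != minority_label:
            return out  # majority examples pass through unchanged
        if self.window:
            # k nearest minority neighbors of x among the buffered minority examples
            neighbors = sorted(self.window, key=lambda p: math.dist(p, x))[: self.k]
            for _ in range(self.n_synthetic):
                nn = self.rng.choice(neighbors)
                out.append((smote_interpolate(x, nn, self.rng.random()), minority_label))
        self.window.append(tuple(x))
        return out
```

In use, each `(x, y)` pair from the stream goes through `process`, and every returned pair is passed to whatever incremental classifier follows. The first minority example produces no synthetic points because the window is still empty. A fixed-size window keeps memory bounded and lets old minority examples age out after concept drift, which is the kind of time/memory and drift-recovery trade-off the summary says the paper analyzes for its own sketch-based approach.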