Power‐Law‐Based Synthetic Minority Oversampling Technique on Imbalanced Serum Surface‐Enhanced Raman Spectroscopy Data for Cancer Screening
Published in: | Advanced Intelligent Systems 2023-07, Vol. 5 (7), p. n/a |
---|---|
Format: | Article |
Language: | English |
Summary: | Surface‐enhanced Raman spectroscopy (SERS) has shown great promise for cancer screening. However, previous “proof‐of‐concept” studies ignored the natural imbalance of cancer types in the population, which biases the model toward learning features of the majority class during training at the expense of the minority class. Herein, a power‐law‐based synthetic minority oversampling technique (PL‐SMOTE) is proposed to guide the resampling of multiclass serum SERS data by analyzing the long‐tailed (power‐law) distribution of cancer prevalence in the population. By introducing a modulating factor, PL‐SMOTE balances the number of minorities to resample against the number of overlaps between classes. Modeling on resampled datasets synthesized by PL‐SMOTE verifies the effectiveness of the method. After further fine‐tuning of the parameters of the deep neural network model and of PL‐SMOTE, an optimal cancer screening model is obtained, with a macroaveraged Recall score of 97.24% and a macroaveraged F2‐Score of 97.38%. This provides a new method for multiclass imbalanced resampling that significantly improves model performance in SERS cancer screening. The method may also inform other multiclass imbalanced scenarios, such as biomedicine, anomaly detection, and disaster prediction.
Traditional full resampling strategies for multiclassification synthesize redundant samples, which degrades modeling. Herein, a power‐law‐based synthetic minority oversampling technique is proposed to guide the resampling of multiclass data by analyzing the long‐tailed distribution of cancer prevalence in the population, which significantly improves model performance in surface‐enhanced Raman spectroscopy (SERS)‐based cancer screening. |
---|---|
ISSN: | 2640-4567 |
DOI: | 10.1002/aisy.202300006 |
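
The summary reports a macroaveraged Recall and a macroaveraged F2‐Score. As a reading aid, the sketch below shows one common way such scores are computed: per-class recall and F-beta (with beta = 2, which weights recall more heavily than precision) are calculated first and then averaged with equal weight per class, so rare classes count as much as common ones. This is an assumption about the averaging convention, not a description of the article's evaluation code; the function name `macro_recall_fbeta` and the toy labels are hypothetical.

```python
from collections import Counter


def macro_recall_fbeta(y_true, y_pred, beta=2.0):
    """Return (macro recall, macro F-beta), averaging per-class scores equally.

    Assumed convention: each class present in y_true or y_pred contributes
    one score, and undefined ratios (zero denominator) are counted as 0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fn[t] += 1          # class t was missed
            fp[p] += 1          # class p was predicted wrongly

    b2 = beta * beta
    recalls, fbetas = [], []
    for c in labels:
        rec_den = tp[c] + fn[c]
        recalls.append(tp[c] / rec_den if rec_den else 0.0)
        # F-beta written in terms of counts:
        # (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP)
        f_den = (1 + b2) * tp[c] + b2 * fn[c] + fp[c]
        fbetas.append((1 + b2) * tp[c] / f_den if f_den else 0.0)

    n = len(labels)
    return sum(recalls) / n, sum(fbetas) / n


# Toy, made-up labels: class 0 is the majority, class 2 the rarest.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0]
macro_recall, macro_f2 = macro_recall_fbeta(y_true, y_pred, beta=2.0)
print(f"macro recall = {macro_recall:.4f}, macro F2 = {macro_f2:.4f}")
```

In this toy case the single sample of the rarest class is misclassified, which pulls both macroaveraged scores down sharply even though most samples are predicted correctly; that sensitivity to minority classes is why macroaveraged metrics suit imbalanced screening data.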