Improved learning performance for small datasets in high dimensions by new dual-net model for non-linear interpolation virtual sample generation

Bibliographic Details
Published in: Decision Support Systems 2023-09, Vol.172, p.113996, Article 113996
Main Authors: Lin, Liang-Sian, Lin, Yao-San, Li, Der-Chiang, Liu, Yun-Hsuan
Format: Article
Language: English
Description
Summary: The number of reliable samples obtained in early decision-making activity is usually small. Because tiny datasets have variable distributions and incomplete structure, it is challenging to build reliable and robust predictive models with classic statistical and machine learning methods in small-sample settings. The virtual sample generation (VSG) technique improves model learning accuracy for small datasets across diverse applications. Established VSG methods generate virtual samples of the independent variables by assuming a probability distribution or a membership function to fill data gaps. However, in the real world, non-linear functional relationships between variables are common. To address this issue, this paper develops a novel VSG method called Dual-VSG, which generates non-linear interpolation virtual samples using a self-supervised learning (SSL) framework to improve learning performance on small datasets. Unlabeled non-linear interpolation virtual samples are generated by estimating non-linear functions and transforming them into a high-dimensional space using the proposed dual-net model. The weights of the dual-net model are then transferred to a downstream task to generate the virtual sample labels. To demonstrate the effectiveness of the proposed method, five datasets are used. With a Backpropagation Neural Network (BPNN) as the predictive model, the proposed method's prediction performance is compared with that of two state-of-the-art VSG approaches. The Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE) are used to assess prediction performance on a regression dataset, and the classification accuracy (ACC) and the F1 measure are used to assess classification capability on classification datasets. In addition, a paired t-test is used to determine whether the proposed Dual-VSG approach differs significantly from the other VSG methods in terms of RMSE, MAPE, ACC, or F1 score. According to the experimental results, the proposed Dual-VSG method outperforms the compared VSG methods on small datasets.
• The small dataset problem is an important issue in enterprises and academia.
• A new Dual-Net-VSG approach generates non-linear interpolation virtual samples.
• The proposed Dual-Net-VSG approach follows a self-supervised learning framework.
• The proposed method's efficacy is verified over three datasets.
• Paired t-test elucidates the significance of differen
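The evaluation measures named in the summary (RMSE, MAPE, ACC, F1, and the paired t-test) can be sketched as follows. This is a minimal illustration using their standard textbook definitions, not the authors' code: the function names are hypothetical, the F1 shown is the binary form (the record does not say how multi-class F1 is averaged), and every number in the usage lines is made up for illustration.

```python
import math
from statistics import mean, stdev


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.

    Assumes no true value is zero.
    """
    return 100.0 * mean(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


def f1_binary(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall (assumed form)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def paired_t_statistic(scores_a, scores_b):
    """t statistic of a paired t-test on per-run scores of two methods.

    Compare against a t distribution with len(scores_a) - 1 degrees
    of freedom to obtain a p-value.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Illustrative, made-up values only (not results from the paper).
print(rmse([0.0, 0.0], [3.0, 3.0]))
print(mape([100.0, 200.0], [110.0, 180.0]))
print(f1_binary([1, 0, 1, 1], [1, 0, 0, 1]))
print(paired_t_statistic([1.0, 2.0, 3.0], [0.5, 1.0, 2.5]))
```

A paired test fits this comparison because each pair of scores comes from the same dataset or experimental run under two different VSG methods, so the test works on the per-pair differences rather than on two independent groups.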
ISSN: 0167-9236
DOI: 10.1016/j.dss.2023.113996