Improved learning performance for small datasets in high dimensions by new dual-net model for non-linear interpolation virtual sample generation

Bibliographic Details
Published in: Decision Support Systems, 2023-09, Vol. 172, p. 113996, Article 113996
Main Authors: Lin, Liang-Sian; Lin, Yao-San; Li, Der-Chiang; Liu, Yun-Hsuan
Format: Article
Language: English
Subjects: Dual-net model; Non-linear virtual samples; Related interpolation points; Small datasets
ISSN: 0167-9236
DOI: 10.1016/j.dss.2023.113996
Publisher: Elsevier B.V.
Description: The number of reliable samples obtained in early decision-making activities is usually small. Because of the variable distribution and incomplete structure of tiny datasets, it is challenging to build reliable and robust predictive models with classic statistical and machine learning methods in small-sample settings. The virtual sample generation (VSG) technique improves model learning accuracy for minimal datasets across diverse applications. Established VSG methods generate virtual samples of the independent variables by assuming a probability distribution or a membership function to fill data gaps. In the real world, however, non-linear functional interactions between variables are common. To address this issue, this paper develops a novel VSG method, called Dual-VSG, which generates non-linear interpolation virtual samples within a self-supervised learning (SSL) framework to improve learning performance on small datasets. Unlabeled non-linear interpolation virtual samples are generated by estimating non-linear functions and transforming them into a high-dimensional space with the proposed dual-net model, whose weights are then transferred to a downstream task to generate labels for the virtual samples. To demonstrate the effectiveness of the proposed strategy, five datasets are employed, and the method's prediction performance is compared with two state-of-the-art VSG approaches on a Backpropagation Neural Network (BPNN) predictive model. The Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE) assess prediction performance on the regression dataset, while classification accuracy (ACC) and the F1 measure assess classification capability on the classification datasets. In addition, a paired t-test is used to determine whether the proposed Dual-VSG approach differs significantly from the other VSG methods in terms of RMSE, MAPE, ACC, or F1 score. According to the experimental results, the proposed Dual-VSG method outperforms the competing VSG methods on small datasets.

Highlights:
• The small dataset problem is an important issue in enterprises and academia.
• A new Dual-Net-VSG approach generates non-linear interpolation virtual samples.
• The proposed Dual-Net-VSG approach follows a self-supervised learning framework.
• The proposed method's efficacy is verified over three datasets.
• A paired t-test elucidates the significance of differences among four methods.
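The abstract does not spell out the Dual-VSG algorithm, but the general idea of interpolation-based virtual sample generation can be sketched: new unlabeled points are produced by mixing pairs of real observations with a non-linear weighting. The sketch below is a generic illustration under that assumption; the pairing strategy, the sine-based mixing function, and the function name `interpolation_virtual_samples` are illustrative choices, not the authors' dual-net procedure.

```python
import numpy as np

def interpolation_virtual_samples(X, n_virtual, rng=None):
    """Generate unlabeled virtual samples by non-linearly mixing random
    pairs of real samples. Illustrative only: the paper's Dual-VSG uses a
    dual-net model rather than this fixed mixing rule."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    i = rng.integers(0, n, size=n_virtual)          # first sample of each pair
    j = rng.integers(0, n, size=n_virtual)          # second sample of each pair
    t = rng.uniform(0.0, 1.0, size=(n_virtual, 1))  # interpolation positions in [0, 1]
    w = np.sin(0.5 * np.pi * t) ** 2                # a non-linear weighting (assumption)
    return w * X[i] + (1.0 - w) * X[j]              # virtual points between real pairs

# Example: 20 real samples with 8 features -> 100 virtual samples
X_real = np.random.default_rng(0).normal(size=(20, 8))
X_virtual = interpolation_virtual_samples(X_real, n_virtual=100, rng=1)
print(X_virtual.shape)  # (100, 8)
```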
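The regression and classification metrics named in the abstract (RMSE, MAPE, ACC, F1) have standard definitions, which a minimal NumPy sketch can make explicit. Whether the paper reports MAPE as a percentage or a fraction is not stated, so the percentage scaling below is an assumption.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error (reported here as a percentage)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def accuracy(y_true, y_pred):
    """Classification accuracy (ACC): fraction of correctly predicted labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def f1_binary(y_true, y_pred, positive=1):
    """F1 measure for a binary classification task."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
```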
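The abstract also mentions a paired t-test for judging whether Dual-VSG differs significantly from the competing VSG methods on each metric. A minimal SciPy sketch is shown below; the per-run RMSE values and the 0.05 significance level are hypothetical.

```python
from scipy import stats

# Hypothetical per-run RMSE values for two VSG methods evaluated on the same splits.
rmse_dual_vsg = [0.41, 0.38, 0.44, 0.40, 0.39]
rmse_baseline = [0.47, 0.45, 0.49, 0.46, 0.48]

# Paired t-test: the runs are matched, so the per-run differences are tested directly.
t_stat, p_value = stats.ttest_rel(rmse_dual_vsg, rmse_baseline)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:  # conventional significance level (assumption)
    print("The difference in RMSE is statistically significant.")
```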