Improved learning performance for small datasets in high dimensions by new dual-net model for non-linear interpolation virtual sample generation
The number of reliable samples obtained in early decision-making activity is usually relatively small. Due to variable distribution and incomplete structure of tiny datasets, it is challenging to create reliable and robust predictive modeling using classic statistical and machine learning models in...
Published in: | Decision Support Systems 2023-09, Vol.172, p.113996, Article 113996 |
---|---|
Main Authors: | Lin, Liang-Sian ; Lin, Yao-San ; Li, Der-Chiang ; Liu, Yun-Hsuan |
Format: | Article |
Language: | English |
Subjects: | Dual-net model ; Non-linear virtual samples ; Related interpolation points ; Small datasets |
cited_by | cdi_FETCH-LOGICAL-c297t-1ecd7b0943b8cacdbe8275dde32d19b56884764b16827a93314156071eaa8a413 |
---|---|
cites | cdi_FETCH-LOGICAL-c297t-1ecd7b0943b8cacdbe8275dde32d19b56884764b16827a93314156071eaa8a413 |
container_end_page | |
container_issue | |
container_start_page | 113996 |
container_title | Decision Support Systems |
container_volume | 172 |
creator | Lin, Liang-Sian ; Lin, Yao-San ; Li, Der-Chiang ; Liu, Yun-Hsuan |
description | The number of reliable samples obtained in early decision-making activity is usually relatively small. Due to variable distribution and incomplete structure of tiny datasets, it is challenging to create reliable and robust predictive modeling using classic statistical and machine learning models in small sample settings. The virtual sample generation (VSG) technique improves model learning accuracies for minimal datasets across diverse applications. Virtual samples on independent variables were generated using established VSG methods predicated on the assumption of a probability distribution or a membership function to fill data gaps. However, in the actual world, non-linear function interactions between variables are common. To address this issue, this paper developed a novel VSG method called Dual-VSG, which generates non-linear interpolation virtual samples using a self-supervised learning (SSL) framework to improve learning performance on small datasets. We generated non-linear interpolation virtual samples without labels by estimating non-linear functions and transforming them into a high-dimensional space using the proposed dual-net model. The weights of the dual-net model are transferred to a downstream task to generate virtual sample labels. To demonstrate the effectiveness of the suggested strategy, this research employed five datasets. On the Backpropagation Neural Networks (BPNN) predictive model, we compared the suggested method's prediction performance to two state-of-the-art VSG approaches. To assess prediction performance on a regression dataset, the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE) are used. Furthermore, the classification accuracy (ACC) and the F1 measure are used to assess classification capability on classification datasets.
In addition, the paired t-test was utilized to see if the suggested Dual-VSG approach differed significantly from the other VSG methods in terms of RMSE, MAPE, accuracy (ACC), or F1 score. For small datasets, the suggested Dual-VSG method outperforms those VSG methods, according to our experimental results.
•The small dataset problem is an important issue in enterprises and academia.•A new Dual-Net-VSG approach generates non-linear interpolation virtual samples.•The Dual-Net-VSG approach proposed follows a self-supervised learning framework.•The proposed method's efficacy is verified over three datasets.•Paired t-test elucidates the significance of differen |
doi_str_mv | 10.1016/j.dss.2023.113996 |
format | article |
fullrecord | <record><control><sourceid>elsevier_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1016_j_dss_2023_113996</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167923623000714</els_id><sourcerecordid>S0167923623000714</sourcerecordid><originalsourceid>FETCH-LOGICAL-c297t-1ecd7b0943b8cacdbe8275dde32d19b56884764b16827a93314156071eaa8a413</originalsourceid><addsrcrecordid>eNp9kEtOwzAQhr0AiVI4ADtfIMGOUycRK1TxqFSJDawtx562rvyIbFPUW3BkTMua1Yz-mW80-hC6o6SmhPL7fa1TqhvSsJpSNgz8As1K3lVDw_gVuk5pTwhnXc9n6HvlphgOoLEFGb3xWzxB3ITopFeAS4OTk9ZiLbNMkBM2Hu_Mdoe1ceCTCT7h8Yg9fGH9KW3lIWMXNNgT64OvrPHldOEyxClYmQuDDybmso6TdJMFvAUP8TS5QZcbaRPc_tU5-nh-el--Vuu3l9XycV2pZuhyRUHpbiRDy8ZeSaVH6JtuoTWwRtNhXPC-bzvejpSXXA6M0ZYuOOkoSNnLlrI5oue7KoaUImzEFI2T8SgoEb8axV4UjeJXozhrLMzDmYHy2MFAFEkZKJ60iaCy0MH8Q_8Avv2A1g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Improved learning performance for small datasets in high dimensions by new dual-net model for non-linear interpolation virtual sample generation</title><source>Elsevier</source><creator>Lin, Liang-Sian ; Lin, Yao-San ; Li, Der-Chiang ; Liu, Yun-Hsuan</creator><creatorcontrib>Lin, Liang-Sian ; Lin, Yao-San ; Li, Der-Chiang ; Liu, Yun-Hsuan</creatorcontrib><description>The number of reliable samples obtained in early decision-making activity is usually relatively small. Due to variable distribution and incomplete structure of tiny datasets, it is challenging to create reliable and robust predictive modeling using classic statistical and machine learning models in small sample settings. The virtual sample generation (VSG) technique improves model learning accuracies for minimal datasets across diverse applications. 
Virtual samples on independent variables were generated using established VSG methods predicated on the assumption of a probability distribution or a membership function to fill data gaps. However, in the actual world, non-linear function interactions between variables are common. To address this issue, this paper developed a novel VSG method called Dual-VSG, which generates non-linear interpolation virtual samples using a self-supervised learning (SSL) framework to improve learning performance on small datasets. We generated non-linear interpolation virtual samples without labels by estimating non-linear functions and transforming them into a high-dimensional space using the proposed dual-net model. The weights of the dual-net model are transferred to a downstream task to generate virtual sample labels. To demonstrate the effectiveness of the suggested strategy, this research employed five datasets. On the Backpropagation Neural Networks (BPNN) predictive model, we compared the suggested method's prediction performance to two state-of-the-art VSG approaches. To assess prediction performance on a regression dataset, the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE) are used. Furthermore, the classification accuracy (ACC) and the F1 measure are used to assess classification capability on classification datasets. In addition, the paired t-test was utilized to see if the suggested Dual-VSG approach differed significantly from the other VSG methods in terms of RMSE, MAPE, accuracy (ACC), or F1 score. For small datasets, the suggested Dual-VSG method outperforms those VSG methods, according to our experimental results.
•The small dataset problem is an important issue in enterprises and academia.•A new Dual-Net-VSG approach generates non-linear interpolation virtual samples.•The Dual-Net-VSG approach proposed follows a self-supervised learning framework.•The proposed method's efficacy is verified over three datasets.•Paired t-test elucidates the significance of differences among four methods.</description><identifier>ISSN: 0167-9236</identifier><identifier>DOI: 10.1016/j.dss.2023.113996</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>Dual-net model ; Non-linear virtual samples ; Related interpolation points ; Small datasets</subject><ispartof>Decision Support Systems, 2023-09, Vol.172, p.113996, Article 113996</ispartof><rights>2023 Elsevier B.V.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c297t-1ecd7b0943b8cacdbe8275dde32d19b56884764b16827a93314156071eaa8a413</citedby><cites>FETCH-LOGICAL-c297t-1ecd7b0943b8cacdbe8275dde32d19b56884764b16827a93314156071eaa8a413</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Lin, Liang-Sian</creatorcontrib><creatorcontrib>Lin, Yao-San</creatorcontrib><creatorcontrib>Li, Der-Chiang</creatorcontrib><creatorcontrib>Liu, Yun-Hsuan</creatorcontrib><title>Improved learning performance for small datasets in high dimensions by new dual-net model for non-linear interpolation virtual sample generation</title><title>Decision Support Systems</title><description>The number of reliable samples obtained in early decision-making activity is usually relatively small. 
Due to variable distribution and incomplete structure of tiny datasets, it is challenging to create reliable and robust predictive modeling using classic statistical and machine learning models in small sample settings. The virtual sample generation (VSG) technique improves model learning accuracies for minimal datasets across diverse applications. Virtual samples on independent variables were generated using established VSG methods predicated on the assumption of a probability distribution or a membership function to fill data gaps. However, in the actual world, non-linear function interactions between variables are common. To address this issue, this paper developed a novel VSG method called Dual-VSG, which generates non-linear interpolation virtual samples using a self-supervised learning (SSL) framework to improve learning performance on small datasets. We generated non-linear interpolation virtual samples without labels by estimating non-linear functions and transforming them into a high-dimensional space using the proposed dual-net model. The weights of the dual-net model are transferred to a downstream task to generate virtual sample labels. To demonstrate the effectiveness of the suggested strategy, this research employed five datasets. On the Backpropagation Neural Networks (BPNN) predictive model, we compared the suggested method's prediction performance to two state-of-the-art VSG approaches. To assess prediction performance on a regression dataset, the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE) are used. Furthermore, the classification accuracy (ACC) and the F1 measure are used to assess classification capability on classification datasets. In addition, the paired t-test was utilized to see if the suggested Dual-VSG approach differed significantly from the other VSG methods in terms of RMSE, MAPE, accuracy (ACC), or F1 score. 
For small datasets, the suggested Dual-VSG method outperforms those VSG methods, according to our experimental results.
•The small dataset problem is an important issue in enterprises and academia.•A new Dual-Net-VSG approach generates non-linear interpolation virtual samples.•The Dual-Net-VSG approach proposed follows a self-supervised learning framework.•The proposed method's efficacy is verified over three datasets.•Paired t-test elucidates the significance of differences among four methods.</description><subject>Dual-net model</subject><subject>Non-linear virtual samples</subject><subject>Related interpolation points</subject><subject>Small datasets</subject><issn>0167-9236</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp9kEtOwzAQhr0AiVI4ADtfIMGOUycRK1TxqFSJDawtx562rvyIbFPUW3BkTMua1Yz-mW80-hC6o6SmhPL7fa1TqhvSsJpSNgz8As1K3lVDw_gVuk5pTwhnXc9n6HvlphgOoLEFGb3xWzxB3ITopFeAS4OTk9ZiLbNMkBM2Hu_Mdoe1ceCTCT7h8Yg9fGH9KW3lIWMXNNgT64OvrPHldOEyxClYmQuDDybmso6TdJMFvAUP8TS5QZcbaRPc_tU5-nh-el--Vuu3l9XycV2pZuhyRUHpbiRDy8ZeSaVH6JtuoTWwRtNhXPC-bzvejpSXXA6M0ZYuOOkoSNnLlrI5oue7KoaUImzEFI2T8SgoEb8axV4UjeJXozhrLMzDmYHy2MFAFEkZKJ60iaCy0MH8Q_8Avv2A1g</recordid><startdate>202309</startdate><enddate>202309</enddate><creator>Lin, Liang-Sian</creator><creator>Lin, Yao-San</creator><creator>Li, Der-Chiang</creator><creator>Liu, Yun-Hsuan</creator><general>Elsevier B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>202309</creationdate><title>Improved learning performance for small datasets in high dimensions by new dual-net model for non-linear interpolation virtual sample generation</title><author>Lin, Liang-Sian ; Lin, Yao-San ; Li, Der-Chiang ; Liu, Yun-Hsuan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c297t-1ecd7b0943b8cacdbe8275dde32d19b56884764b16827a93314156071eaa8a413</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Dual-net model</topic><topic>Non-linear 
virtual samples</topic><topic>Related interpolation points</topic><topic>Small datasets</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lin, Liang-Sian</creatorcontrib><creatorcontrib>Lin, Yao-San</creatorcontrib><creatorcontrib>Li, Der-Chiang</creatorcontrib><creatorcontrib>Liu, Yun-Hsuan</creatorcontrib><collection>CrossRef</collection><jtitle>Decision Support Systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lin, Liang-Sian</au><au>Lin, Yao-San</au><au>Li, Der-Chiang</au><au>Liu, Yun-Hsuan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improved learning performance for small datasets in high dimensions by new dual-net model for non-linear interpolation virtual sample generation</atitle><jtitle>Decision Support Systems</jtitle><date>2023-09</date><risdate>2023</risdate><volume>172</volume><spage>113996</spage><pages>113996-</pages><artnum>113996</artnum><issn>0167-9236</issn><abstract>The number of reliable samples obtained in early decision-making activity is usually relatively small. Due to variable distribution and incomplete structure of tiny datasets, it is challenging to create reliable and robust predictive modeling using classic statistical and machine learning models in small sample settings. The virtual sample generation (VSG) technique improves model learning accuracies for minimal datasets across diverse applications. Virtual samples on independent variables were generated using established VSG methods predicated on the assumption of a probability distribution or a membership function to fill data gaps. However, in the actual world, non-linear function interactions between variables are common. 
To address this issue, this paper developed a novel VSG method called Dual-VSG, which generates non-linear interpolation virtual samples using a self-supervised learning (SSL) framework to improve learning performance on small datasets. We generated non-linear interpolation virtual samples without labels by estimating non-linear functions and transforming them into a high-dimensional space using the proposed dual-net model. The weights of the dual-net model are transferred to a downstream task to generate virtual sample labels. To demonstrate the effectiveness of the suggested strategy, this research employed five datasets. On the Backpropagation Neural Networks (BPNN) predictive model, we compared the suggested method's prediction performance to two state-of-the-art VSG approaches. To assess prediction performance on a regression dataset, the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE) are used. Furthermore, the classification accuracy (ACC) and the F1 measure are used to assess classification capability on classification datasets. In addition, the paired t-test was utilized to see if the suggested Dual-VSG approach differed significantly from the other VSG methods in terms of RMSE, MAPE, accuracy (ACC), or F1 score. For small datasets, the suggested Dual-VSG method outperforms those VSG methods, according to our experimental results.
•The small dataset problem is an important issue in enterprises and academia.•A new Dual-Net-VSG approach generates non-linear interpolation virtual samples.•The Dual-Net-VSG approach proposed follows a self-supervised learning framework.•The proposed method's efficacy is verified over three datasets.•Paired t-test elucidates the significance of differences among four methods.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.dss.2023.113996</doi></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0167-9236 |
ispartof | Decision Support Systems, 2023-09, Vol.172, p.113996, Article 113996 |
issn | 0167-9236 |
language | eng |
recordid | cdi_crossref_primary_10_1016_j_dss_2023_113996 |
source | Elsevier |
subjects | Dual-net model ; Non-linear virtual samples ; Related interpolation points ; Small datasets |
title | Improved learning performance for small datasets in high dimensions by new dual-net model for non-linear interpolation virtual sample generation |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T20%3A39%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improved%20learning%20performance%20for%20small%20datasets%20in%20high%20dimensions%20by%20new%20dual-net%20model%20for%20non-linear%20interpolation%20virtual%20sample%20generation&rft.jtitle=Decision%20Support%20Systems&rft.au=Lin,%20Liang-Sian&rft.date=2023-09&rft.volume=172&rft.spage=113996&rft.pages=113996-&rft.artnum=113996&rft.issn=0167-9236&rft_id=info:doi/10.1016/j.dss.2023.113996&rft_dat=%3Celsevier_cross%3ES0167923623000714%3C/elsevier_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c297t-1ecd7b0943b8cacdbe8275dde32d19b56884764b16827a93314156071eaa8a413%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |