Time-series clustering approach for training data selection of a data-driven predictive model: Application to an industrial bio 2,3-butanediol distillation process

• The training data selection method using time-series clustering is proposed.
• The proposed method is applied to commercial 2,3-BDO distillation process.
• The number and ratio of training data are optimized by mathematical model.

In this study, we propose a time-series clustering approach that selects...

Bibliographic Details
Published in:	Computers & chemical engineering 2022-05, Vol.161, p.107758, Article 107758
Main Authors:	Choi, Yeongryeol, An, Nahyeon, Hong, Seokyoung, Cho, Hyungtae, Lim, Jongkoo, Han, In-Su, Moon, Il, Kim, Junghwan
Format:	Article
Language:	English
Subjects:	Bio 2,3-BDO Data-driven predictive model Time-series clustering Training data selection

cited_by	cdi_FETCH-LOGICAL-c372t-8b715b67e291c7c24c33c2c380187f7df7b4d6f6a1da5d0066e80c00183904693
cites	cdi_FETCH-LOGICAL-c372t-8b715b67e291c7c24c33c2c380187f7df7b4d6f6a1da5d0066e80c00183904693
container_end_page
container_issue
container_start_page	107758
container_title	Computers & chemical engineering
container_volume	161
creator	Choi, Yeongryeol An, Nahyeon Hong, Seokyoung Cho, Hyungtae Lim, Jongkoo Han, In-Su Moon, Il Kim, Junghwan
description	• The training data selection method using time-series clustering is proposed. • The proposed method is applied to commercial 2,3-BDO distillation process. • The number and ratio of training data are optimized by mathematical model. In this study, we propose a time-series clustering approach that selects optimal training data for the development of predictive models. The optimal number of clusters was set based on the variation of within-cluster sums of squares. A predictive model was developed with the selection ratio of training data from each of those clusters. Based on the results, a regression model was developed to predict the performance of the model. The search space was applied to the regression model, and the optimal training data ratios were selected to satisfy the objective function and constraints. The effectiveness of the method is demonstrated by addressing a commercial bio 2,3-butanediol distillation process. As a result, the number of data for model training was reduced by 49.20% compared to the base case without clustering. The coefficient of determination (R²) showed the same level of performance, and the root-mean-square error was improved by up to 14.07%.
doi_str_mv	10.1016/j.compchemeng.2022.107758
format	article
fullrecord	<record><control><sourceid>elsevier_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1016_j_compchemeng_2022_107758</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0098135422000990</els_id><sourcerecordid>S0098135422000990</sourcerecordid><originalsourceid>FETCH-LOGICAL-c372t-8b715b67e291c7c24c33c2c380187f7df7b4d6f6a1da5d0066e80c00183904693</originalsourceid><addsrcrecordid>eNqNkMtu2zAQRYmgBeq6_QdmX7l8SKLUnWEkbYEA3aRrghqObBoUKZCygXxPfjS0nUWXXc3gztx5HELuOdtwxtvvxw3EaYYDThj2G8GEKLpSTXdHVrxTsqqlaj6QFWN9V3HZ1J_I55yPjDFRd92KvD67CauMyWGm4E95KWnYUzPPKRo40DEmuiTjwkW1ZjE0o0dYXAw0jtRctcomd8ZA54TWldoZ6RQt-h90O8_egbm2L5GaQF2wZUtyxtPBRSq-yWo4LSYUZ_TUurw472-GcgJgzl_Ix9H4jF_f45r8fXx43v2qnv78_L3bPlUglViqblC8GVqFouegQNQgJQiQHSsgRmVHNdS2HVvDrWksY22LHQNWqrJnddvLNelvcyHFnBOOek5uMulFc6YvtPVR_0NbX2jrG-3i3d28WA48O0w6g8MA5atUaGkb3X9MeQP5rZJP</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Time-series clustering approach for training data selection of a data-driven predictive model: Application to an industrial bio 2,3-butanediol distillation process</title><source>Elsevier</source><creator>Choi, Yeongryeol ; An, Nahyeon ; Hong, Seokyoung ; Cho, Hyungtae ; Lim, Jongkoo ; Han, In-Su ; Moon, Il ; Kim, Junghwan</creator><creatorcontrib>Choi, Yeongryeol ; An, Nahyeon ; Hong, Seokyoung ; Cho, Hyungtae ; Lim, Jongkoo ; Han, In-Su ; Moon, Il ; Kim, Junghwan</creatorcontrib><description>•The training data selection method using time-series clustering is proposed.•The proposed method is applied to commercial 2,3-BDO distillation process.•The number and ratio of training data are optimized by mathematical model. In this study, we propose a time-series clustering approach that selects optimal training data for the development of predictive models. The optimal number of clusters was set based on the variation of within-cluster sums of squares. 
A predictive model was developed with the selection ratio of training data from each of those clusters. Based on the results, a regression model was developed to predict the performance of the model. The search space was applied to the regression model, and the optimal training data ratio were selected satisfying the objective function and constraints. The effectiveness of the method is demonstrated by addressing a commercial bio 2,3-butanediol distillation process. As a result, the number of data for model training was reduced by 49.20% compared to the base case without clustering. The coefficient of determination (R2) showed the same level of performance, and the root-mean-square error was improved up to 14.07%.</description><identifier>ISSN: 0098-1354</identifier><identifier>EISSN: 1873-4375</identifier><identifier>DOI: 10.1016/j.compchemeng.2022.107758</identifier><language>eng</language><publisher>Elsevier Ltd</publisher><subject>Bio 2,3-BDO ; Data-driven predictive model ; Time-series clustering ; Training data selection</subject><ispartof>Computers & chemical engineering, 2022-05, Vol.161, p.107758, Article 107758</ispartof><rights>2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c372t-8b715b67e291c7c24c33c2c380187f7df7b4d6f6a1da5d0066e80c00183904693</citedby><cites>FETCH-LOGICAL-c372t-8b715b67e291c7c24c33c2c380187f7df7b4d6f6a1da5d0066e80c00183904693</cites><orcidid>0000-0003-1895-696X ; 0000-0002-2311-4567 ; 0000-0002-8729-1837</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,778,782,27907,27908</link.rule.ids></links><search><creatorcontrib>Choi, Yeongryeol</creatorcontrib><creatorcontrib>An, Nahyeon</creatorcontrib><creatorcontrib>Hong, Seokyoung</creatorcontrib><creatorcontrib>Cho, 
Hyungtae</creatorcontrib><creatorcontrib>Lim, Jongkoo</creatorcontrib><creatorcontrib>Han, In-Su</creatorcontrib><creatorcontrib>Moon, Il</creatorcontrib><creatorcontrib>Kim, Junghwan</creatorcontrib><title>Time-series clustering approach for training data selection of a data-driven predictive model: Application to an industrial bio 2,3-butanediol distillation process</title><title>Computers & chemical engineering</title><description>•The training data selection method using time-series clustering is proposed.•The proposed method is applied to commercial 2,3-BDO distillation process.•The number and ratio of training data are optimized by mathematical model. In this study, we propose a time-series clustering approach that selects optimal training data for the development of predictive models. The optimal number of clusters was set based on the variation of within-cluster sums of squares. A predictive model was developed with the selection ratio of training data from each of those clusters. Based on the results, a regression model was developed to predict the performance of the model. The search space was applied to the regression model, and the optimal training data ratio were selected satisfying the objective function and constraints. The effectiveness of the method is demonstrated by addressing a commercial bio 2,3-butanediol distillation process. As a result, the number of data for model training was reduced by 49.20% compared to the base case without clustering. 
The coefficient of determination (R2) showed the same level of performance, and the root-mean-square error was improved up to 14.07%.</description><subject>Bio 2,3-BDO</subject><subject>Data-driven predictive model</subject><subject>Time-series clustering</subject><subject>Training data selection</subject><issn>0098-1354</issn><issn>1873-4375</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNqNkMtu2zAQRYmgBeq6_QdmX7l8SKLUnWEkbYEA3aRrghqObBoUKZCygXxPfjS0nUWXXc3gztx5HELuOdtwxtvvxw3EaYYDThj2G8GEKLpSTXdHVrxTsqqlaj6QFWN9V3HZ1J_I55yPjDFRd92KvD67CauMyWGm4E95KWnYUzPPKRo40DEmuiTjwkW1ZjE0o0dYXAw0jtRctcomd8ZA54TWldoZ6RQt-h90O8_egbm2L5GaQF2wZUtyxtPBRSq-yWo4LSYUZ_TUurw472-GcgJgzl_Ix9H4jF_f45r8fXx43v2qnv78_L3bPlUglViqblC8GVqFouegQNQgJQiQHSsgRmVHNdS2HVvDrWksY22LHQNWqrJnddvLNelvcyHFnBOOek5uMulFc6YvtPVR_0NbX2jrG-3i3d28WA48O0w6g8MA5atUaGkb3X9MeQP5rZJP</recordid><startdate>202205</startdate><enddate>202205</enddate><creator>Choi, Yeongryeol</creator><creator>An, Nahyeon</creator><creator>Hong, Seokyoung</creator><creator>Cho, Hyungtae</creator><creator>Lim, Jongkoo</creator><creator>Han, In-Su</creator><creator>Moon, Il</creator><creator>Kim, Junghwan</creator><general>Elsevier Ltd</general><scope>6I.</scope><scope>AAFTH</scope><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0003-1895-696X</orcidid><orcidid>https://orcid.org/0000-0002-2311-4567</orcidid><orcidid>https://orcid.org/0000-0002-8729-1837</orcidid></search><sort><creationdate>202205</creationdate><title>Time-series clustering approach for training data selection of a data-driven predictive model: Application to an industrial bio 2,3-butanediol distillation process</title><author>Choi, Yeongryeol ; An, Nahyeon ; Hong, Seokyoung ; Cho, Hyungtae ; Lim, Jongkoo ; Han, In-Su ; Moon, Il ; Kim, 
Junghwan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c372t-8b715b67e291c7c24c33c2c380187f7df7b4d6f6a1da5d0066e80c00183904693</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Bio 2,3-BDO</topic><topic>Data-driven predictive model</topic><topic>Time-series clustering</topic><topic>Training data selection</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Choi, Yeongryeol</creatorcontrib><creatorcontrib>An, Nahyeon</creatorcontrib><creatorcontrib>Hong, Seokyoung</creatorcontrib><creatorcontrib>Cho, Hyungtae</creatorcontrib><creatorcontrib>Lim, Jongkoo</creatorcontrib><creatorcontrib>Han, In-Su</creatorcontrib><creatorcontrib>Moon, Il</creatorcontrib><creatorcontrib>Kim, Junghwan</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>CrossRef</collection><jtitle>Computers & chemical engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Choi, Yeongryeol</au><au>An, Nahyeon</au><au>Hong, Seokyoung</au><au>Cho, Hyungtae</au><au>Lim, Jongkoo</au><au>Han, In-Su</au><au>Moon, Il</au><au>Kim, Junghwan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Time-series clustering approach for training data selection of a data-driven predictive model: Application to an industrial bio 2,3-butanediol distillation process</atitle><jtitle>Computers & chemical engineering</jtitle><date>2022-05</date><risdate>2022</risdate><volume>161</volume><spage>107758</spage><pages>107758-</pages><artnum>107758</artnum><issn>0098-1354</issn><eissn>1873-4375</eissn><abstract>•The training data selection method using time-series clustering is proposed.•The proposed method is applied to commercial 2,3-BDO distillation process.•The number and 
ratio of training data are optimized by mathematical model. In this study, we propose a time-series clustering approach that selects optimal training data for the development of predictive models. The optimal number of clusters was set based on the variation of within-cluster sums of squares. A predictive model was developed with the selection ratio of training data from each of those clusters. Based on the results, a regression model was developed to predict the performance of the model. The search space was applied to the regression model, and the optimal training data ratio were selected satisfying the objective function and constraints. The effectiveness of the method is demonstrated by addressing a commercial bio 2,3-butanediol distillation process. As a result, the number of data for model training was reduced by 49.20% compared to the base case without clustering. The coefficient of determination (R2) showed the same level of performance, and the root-mean-square error was improved up to 14.07%.</abstract><pub>Elsevier Ltd</pub><doi>10.1016/j.compchemeng.2022.107758</doi><orcidid>https://orcid.org/0000-0003-1895-696X</orcidid><orcidid>https://orcid.org/0000-0002-2311-4567</orcidid><orcidid>https://orcid.org/0000-0002-8729-1837</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0098-1354
ispartof	Computers & chemical engineering, 2022-05, Vol.161, p.107758, Article 107758
issn	0098-1354 1873-4375
language	eng
recordid	cdi_crossref_primary_10_1016_j_compchemeng_2022_107758
source	Elsevier
subjects	Bio 2,3-BDO Data-driven predictive model Time-series clustering Training data selection
title	Time-series clustering approach for training data selection of a data-driven predictive model: Application to an industrial bio 2,3-butanediol distillation process
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T04%3A13%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Time-series%20clustering%20approach%20for%20training%20data%20selection%20of%20a%20data-driven%20predictive%20model:%20Application%20to%20an%20industrial%20bio%202,3-butanediol%20distillation%20process&rft.jtitle=Computers%20&%20chemical%20engineering&rft.au=Choi,%20Yeongryeol&rft.date=2022-05&rft.volume=161&rft.spage=107758&rft.pages=107758-&rft.artnum=107758&rft.issn=0098-1354&rft.eissn=1873-4375&rft_id=info:doi/10.1016/j.compchemeng.2022.107758&rft_dat=%3Celsevier_cross%3ES0098135422000990%3C/elsevier_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c372t-8b715b67e291c7c24c33c2c380187f7df7b4d6f6a1da5d0066e80c00183904693%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true