
Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms

In response to our review paper [L. C. Lee et al., Analyst, 2018, 143, 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysi...


Bibliographic Details
Published in:	Analyst (London) 2019-04, Vol.144 (8), p.2670-2678
Main Authors:	Lee, Loong Chuen; Jemain, Abdul Aziz
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Datasets; Discriminant analysis; Empirical analysis; Infrared spectra; Inks; Least squares; Model accuracy; Modelling; Prediction models; Stability analysis; Test procedures

cited_by	cdi_FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583
cites	cdi_FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583
container_end_page	2678
container_issue	8
container_start_page	2670
container_title	Analyst (London)
container_volume	144
creator	Lee, Loong Chuen; Jemain, Abdul Aziz
description	In response to our review paper [L. C. Lee et al., Analyst, 2018, 143, 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysis (PLS-DA) has gained wide acceptance and huge popularity in the field of applied research, partly due to its dimensionality reduction capability and ability to handle multicollinear and correlated variables. To solve a K-class problem (K > 2) using PLS-DA and high-dimensional data like infrared spectra, one can construct either K one-versus-all PLS1-DA models or only one PLS2-DA model. The aim of this work is to explore empirical differences between the two PLS-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. The practical task is to build a prediction model using the imbalanced, high-dimensional, colossal and multi-class ATR-FTIR spectra of blue gel pen inks. Four different sub-datasets were prepared from the principal dataset by considering the raw and asymmetric least squares (AsLS) preprocessed forms: (a) Raw-global region; (b) Raw-local region; (c) AsLS-global region; and (d) AsLS-local region. A series of 50 models, which incrementally include the first 50 PLS components, was constructed repeatedly using the four sub-datasets. Each model was evaluated using six different variants of v-fold cross-validation, autoprediction and external testing methods. As a result, each PLS-DA algorithm was represented by a number of figures of merit. The differences between PLS1-DA and PLS2-DA algorithms were assessed using hypothesis tests with respect to model accuracy, stability and fitting. In addition, confusion matrices of the two PLS-DA algorithms were inspected carefully for assessment of model parsimony. Overall, both algorithms presented satisfactory model accuracy and stability. Nonetheless, PLS1-DA models showed significantly higher accuracy rates than PLS2-DA models, whereas PLS2-DA models seem to be much more stable than PLS1-DA models. PLS2-DA also proved to be less prone to overfitting and more parsimonious than PLS1-DA. In conclusion, the relatively high accuracy of the PLS1-DA algorithm is achieved at the cost of rather low parsimony and stability, and with an increased risk of overfitting.
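The abstract's distinction between K one-versus-all PLS1-DA models and a single PLS2-DA model comes down to how the class labels are coded as regression responses. The sketch below illustrates only that coding step in plain Python; it does not fit any PLS model. The function names, the example labels and the "largest predicted response wins" decision rule are assumptions made for illustration, not details taken from the article.

```python
from typing import Dict, List, Sequence, Tuple


def dummy_matrix(labels: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Return the sorted class names and the n x K 0/1 response matrix.

    A single PLS2-DA model would regress all K columns at once.
    """
    classes = sorted(set(labels))
    Y = [[1 if lab == c else 0 for c in classes] for lab in labels]
    return classes, Y


def one_vs_all_targets(labels: Sequence[str]) -> Dict[str, List[int]]:
    """Return K binary response vectors, one per one-versus-all PLS1-DA model."""
    classes, Y = dummy_matrix(labels)
    # Each PLS1-DA model sees exactly one column of the dummy matrix.
    return {c: [row[k] for row in Y] for k, c in enumerate(classes)}


def assign_class(classes: Sequence[str], scores: Sequence[float]) -> str:
    """Pick the class with the largest predicted response (assumed rule)."""
    best = max(range(len(classes)), key=lambda k: scores[k])
    return classes[best]


# Hypothetical labels standing in for ink classes.
labels = ["A", "B", "C", "A"]
classes, Y = dummy_matrix(labels)
targets = one_vs_all_targets(labels)
```

With either approach the K predicted responses for a new spectrum can be passed to `assign_class`; the difference is whether those responses come from K separately fitted models or from one jointly fitted model.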
doi_str_mv	10.1039/c8an02074d
format	article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmed_primary_30849143</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2189550071</sourcerecordid><originalsourceid>FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583</originalsourceid><addsrcrecordid>eNpd0c9rFDEUB_Agil2rF-9KwIsIoy_JZCbxtmxbLSxa6noeMslLTZkf22RG8epf3ky3VvCUH-_DI3lfQl4yeM9A6A9WmQE41KV7RFZMVGUhJVePyQoARMErWR6RZyld5yMDCU_JkQBValaKFflzEdEFO4WfSPvRYdeF4YqOntqxG1MyHV3vLouz3fklTXu0U8w3zkyGzmmBF9tvxcn6I8V-H2KwSzF4jxEHi4m2OP1CHBbFMqNmcMue3-27qzGG6UefnpMn3nQJX9yvx-T72elu87nYfv10vllvCytqNRWat05oJ7yGVnFeAbTWudJUnjmsK6HqGiWrnK60BwtYgkCPMg9Iq0pIJY7J20PffRxvZkxT04dk84_NgOOcGs6UlhKgZpm--Y9ej3Mc8usazqFUDHhdZ_XuoGzMo4rom30MvYm_GwbNkkyzUesvd8mcZPz6vuXc9uge6N8oMnh1ADHZh-q_aMUtmPeP7A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2204810277</pqid></control><display><type>article</type><title>Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms</title><source>Royal Society of Chemistry:Jisc Collections:Royal Society of Chemistry Read and Publish 2022-2024 (reading list)</source><creator>Lee, Loong Chuen ; Jemain, Abdul Aziz</creator><creatorcontrib>Lee, Loong Chuen ; Jemain, Abdul Aziz</creatorcontrib><description>In response to our review paper [L. C. Lee et al. , Analyst , 2018, 143 , 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysis (PLS-DA) has gained wide acceptance and huge popularity in the field of applied research, partly due to its dimensionality reduction capability and ability to handle multicollinear and correlated variables. 
To solve a K -class problem ( K > 2) using PLS-DA and high-dimensional data like infrared spectra, one can construct either K one- versus -all PLS1-DA models or only one PLS2-DA model. The aim of this work is to explore empirical differences between the two PLS-DA algorithms in modeling a colossal ATR-FTIR spectral dataset. The practical task is to build a prediction model using the imbalanced, high dimensional, colossal and multi-class ATR-FTIR spectra of blue gel pen inks. Four different sub-datasets were prepared from the principal dataset by considering the raw and asymmetric least squares (AsLS) preprocessed forms: (a) Raw-global region; (b) Raw-local region; (c) AsLS-global region; and (d) AsLS-local region. A series of 50 models which includes the first 50 PLS components incrementally was constructed repeatedly using the four sub-datasets. Each model was evaluated using six different variants of v -fold cross validation, autoprediction and external testing methods. As a result, each PLS-DA algorithm was represented by a number of figures of merit. The differences between PLS1-DA and PLS2-DA algorithms were assessed using hypothesis tests with respect to model accuracy, stability and fitting. On the other hand, confusion matrices of the two PLS-DA algorithms were inspected carefully for assessment of model parsimony. Overall, both the algorithms presented satisfactory model accuracy and stability. Nonetheless, PLS1-DA models showed significantly higher accuracy rates than PLS2-DA models, whereas PLS2-DA models seem to be much more stable compared to PLS1-DA models. Eventually, PLS2-DA also proved to be less prone to overfitting and is more parsimonious than PLS1-DA. In conclusion, the relatively high accuracy of the PLS1-DA algorithm is achieved at the cost of rather low parsimony and stability, and with an increased risk of overfitting. In response to our review paper [L. C. Lee et al. 
, Analyst , 2018, 143 , 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset.</description><identifier>ISSN: 0003-2654</identifier><identifier>EISSN: 1364-5528</identifier><identifier>DOI: 10.1039/c8an02074d</identifier><identifier>PMID: 30849143</identifier><language>eng</language><publisher>England: Royal Society of Chemistry</publisher><subject>Accuracy ; Algorithms ; Datasets ; Discriminant analysis ; Empirical analysis ; Infrared spectra ; Inks ; Least squares ; Model accuracy ; Modelling ; Prediction models ; Stability analysis ; Test procedures</subject><ispartof>Analyst (London), 2019-04, Vol.144 (8), p.267-2678</ispartof><rights>Copyright Royal Society of Chemistry 2019</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583</citedby><cites>FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583</cites><orcidid>0000-0002-3144-9877 ; 0000-0001-8062-9658</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/30849143$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Lee, Loong Chuen</creatorcontrib><creatorcontrib>Jemain, Abdul Aziz</creatorcontrib><title>Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms</title><title>Analyst (London)</title><addtitle>Analyst</addtitle><description>In response to our review paper [L. C. Lee et al. 
, Analyst , 2018, 143 , 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysis (PLS-DA) has gained wide acceptance and huge popularity in the field of applied research, partly due to its dimensionality reduction capability and ability to handle multicollinear and correlated variables. To solve a K -class problem ( K > 2) using PLS-DA and high-dimensional data like infrared spectra, one can construct either K one- versus -all PLS1-DA models or only one PLS2-DA model. The aim of this work is to explore empirical differences between the two PLS-DA algorithms in modeling a colossal ATR-FTIR spectral dataset. The practical task is to build a prediction model using the imbalanced, high dimensional, colossal and multi-class ATR-FTIR spectra of blue gel pen inks. Four different sub-datasets were prepared from the principal dataset by considering the raw and asymmetric least squares (AsLS) preprocessed forms: (a) Raw-global region; (b) Raw-local region; (c) AsLS-global region; and (d) AsLS-local region. A series of 50 models which includes the first 50 PLS components incrementally was constructed repeatedly using the four sub-datasets. Each model was evaluated using six different variants of v -fold cross validation, autoprediction and external testing methods. As a result, each PLS-DA algorithm was represented by a number of figures of merit. The differences between PLS1-DA and PLS2-DA algorithms were assessed using hypothesis tests with respect to model accuracy, stability and fitting. On the other hand, confusion matrices of the two PLS-DA algorithms were inspected carefully for assessment of model parsimony. Overall, both the algorithms presented satisfactory model accuracy and stability. 
Nonetheless, PLS1-DA models showed significantly higher accuracy rates than PLS2-DA models, whereas PLS2-DA models seem to be much more stable compared to PLS1-DA models. Eventually, PLS2-DA also proved to be less prone to overfitting and is more parsimonious than PLS1-DA. In conclusion, the relatively high accuracy of the PLS1-DA algorithm is achieved at the cost of rather low parsimony and stability, and with an increased risk of overfitting. In response to our review paper [L. C. Lee et al. , Analyst , 2018, 143 , 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Datasets</subject><subject>Discriminant analysis</subject><subject>Empirical analysis</subject><subject>Infrared spectra</subject><subject>Inks</subject><subject>Least squares</subject><subject>Model accuracy</subject><subject>Modelling</subject><subject>Prediction models</subject><subject>Stability analysis</subject><subject>Test procedures</subject><issn>0003-2654</issn><issn>1364-5528</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNpd0c9rFDEUB_Agil2rF-9KwIsIoy_JZCbxtmxbLSxa6noeMslLTZkf22RG8epf3ky3VvCUH-_DI3lfQl4yeM9A6A9WmQE41KV7RFZMVGUhJVePyQoARMErWR6RZyld5yMDCU_JkQBValaKFflzEdEFO4WfSPvRYdeF4YqOntqxG1MyHV3vLouz3fklTXu0U8w3zkyGzmmBF9tvxcn6I8V-H2KwSzF4jxEHi4m2OP1CHBbFMqNmcMue3-27qzGG6UefnpMn3nQJX9yvx-T72elu87nYfv10vllvCytqNRWat05oJ7yGVnFeAbTWudJUnjmsK6HqGiWrnK60BwtYgkCPMg9Iq0pIJY7J20PffRxvZkxT04dk84_NgOOcGs6UlhKgZpm--Y9ej3Mc8usazqFUDHhdZ_XuoGzMo4rom30MvYm_GwbNkkyzUesvd8mcZPz6vuXc9uge6N8oMnh1ADHZh-q_aMUtmPeP7A</recordid><startdate>20190408</startdate><enddate>20190408</enddate><creator>Lee, Loong Chuen</creator><creator>Jemain, Abdul Aziz</creator><general>Royal Society of 
Chemistry</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>L7M</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-3144-9877</orcidid><orcidid>https://orcid.org/0000-0001-8062-9658</orcidid></search><sort><creationdate>20190408</creationdate><title>Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms</title><author>Lee, Loong Chuen ; Jemain, Abdul Aziz</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Datasets</topic><topic>Discriminant analysis</topic><topic>Empirical analysis</topic><topic>Infrared spectra</topic><topic>Inks</topic><topic>Least squares</topic><topic>Model accuracy</topic><topic>Modelling</topic><topic>Prediction models</topic><topic>Stability analysis</topic><topic>Test procedures</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lee, Loong Chuen</creatorcontrib><creatorcontrib>Jemain, Abdul Aziz</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>MEDLINE - Academic</collection><jtitle>Analyst (London)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lee, Loong Chuen</au><au>Jemain, 
Abdul Aziz</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms</atitle><jtitle>Analyst (London)</jtitle><addtitle>Analyst</addtitle><date>2019-04-08</date><risdate>2019</risdate><volume>144</volume><issue>8</issue><spage>267</spage><epage>2678</epage><pages>267-2678</pages><issn>0003-2654</issn><eissn>1364-5528</eissn><abstract>In response to our review paper [L. C. Lee et al. , Analyst , 2018, 143 , 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysis (PLS-DA) has gained wide acceptance and huge popularity in the field of applied research, partly due to its dimensionality reduction capability and ability to handle multicollinear and correlated variables. To solve a K -class problem ( K > 2) using PLS-DA and high-dimensional data like infrared spectra, one can construct either K one- versus -all PLS1-DA models or only one PLS2-DA model. The aim of this work is to explore empirical differences between the two PLS-DA algorithms in modeling a colossal ATR-FTIR spectral dataset. The practical task is to build a prediction model using the imbalanced, high dimensional, colossal and multi-class ATR-FTIR spectra of blue gel pen inks. Four different sub-datasets were prepared from the principal dataset by considering the raw and asymmetric least squares (AsLS) preprocessed forms: (a) Raw-global region; (b) Raw-local region; (c) AsLS-global region; and (d) AsLS-local region. A series of 50 models which includes the first 50 PLS components incrementally was constructed repeatedly using the four sub-datasets. Each model was evaluated using six different variants of v -fold cross validation, autoprediction and external testing methods. 
As a result, each PLS-DA algorithm was represented by a number of figures of merit. The differences between PLS1-DA and PLS2-DA algorithms were assessed using hypothesis tests with respect to model accuracy, stability and fitting. On the other hand, confusion matrices of the two PLS-DA algorithms were inspected carefully for assessment of model parsimony. Overall, both the algorithms presented satisfactory model accuracy and stability. Nonetheless, PLS1-DA models showed significantly higher accuracy rates than PLS2-DA models, whereas PLS2-DA models seem to be much more stable compared to PLS1-DA models. Eventually, PLS2-DA also proved to be less prone to overfitting and is more parsimonious than PLS1-DA. In conclusion, the relatively high accuracy of the PLS1-DA algorithm is achieved at the cost of rather low parsimony and stability, and with an increased risk of overfitting. In response to our review paper [L. C. Lee et al. , Analyst , 2018, 143 , 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset.</abstract><cop>England</cop><pub>Royal Society of Chemistry</pub><pmid>30849143</pmid><doi>10.1039/c8an02074d</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0002-3144-9877</orcidid><orcidid>https://orcid.org/0000-0001-8062-9658</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0003-2654
ispartof	Analyst (London), 2019-04, Vol.144 (8), p.2670-2678
issn	0003-2654 1364-5528
language	eng
recordid	cdi_pubmed_primary_30849143
source	Royal Society of Chemistry:Jisc Collections:Royal Society of Chemistry Read and Publish 2022-2024 (reading list)
subjects	Accuracy; Algorithms; Datasets; Discriminant analysis; Empirical analysis; Infrared spectra; Inks; Least squares; Model accuracy; Modelling; Prediction models; Stability analysis; Test procedures
title	Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T02%3A57%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Predictive%20modelling%20of%20colossal%20ATR-FTIR%20spectral%20data%20using%20PLS-DA:%20empirical%20differences%20between%20PLS1-DA%20and%20PLS2-DA%20algorithms&rft.jtitle=Analyst%20(London)&rft.au=Lee,%20Loong%20Chuen&rft.date=2019-04-08&rft.volume=144&rft.issue=8&rft.spage=267&rft.epage=2678&rft.pages=267-2678&rft.issn=0003-2654&rft.eissn=1364-5528&rft_id=info:doi/10.1039/c8an02074d&rft_dat=%3Cproquest_pubme%3E2189550071%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2204810277&rft_id=info:pmid/30849143&rfr_iscdi=true