
Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms

In response to our review paper [L. C. Lee et al., Analyst, 2018, 143, 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysi...

Bibliographic Details
Published in: Analyst (London), 2019-04, Vol. 144 (8), p. 2670-2678
Main Authors: Lee, Loong Chuen; Jemain, Abdul Aziz
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583
cites cdi_FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583
container_end_page 2678
container_issue 8
container_start_page 2670
container_title Analyst (London)
container_volume 144
creator Lee, Loong Chuen
Jemain, Abdul Aziz
description In response to our review paper [L. C. Lee et al., Analyst, 2018, 143, 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysis (PLS-DA) has gained wide acceptance and huge popularity in the field of applied research, partly due to its dimensionality reduction capability and ability to handle multicollinear and correlated variables. To solve a K-class problem (K > 2) using PLS-DA and high-dimensional data like infrared spectra, one can construct either K one-versus-all PLS1-DA models or only one PLS2-DA model. The aim of this work is to explore empirical differences between the two PLS-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. The practical task is to build a prediction model using the imbalanced, high-dimensional, colossal and multi-class ATR-FTIR spectra of blue gel pen inks. Four different sub-datasets were prepared from the principal dataset by considering the raw and asymmetric least squares (AsLS) preprocessed forms: (a) Raw-global region; (b) Raw-local region; (c) AsLS-global region; and (d) AsLS-local region. A series of 50 models, which incrementally include the first 50 PLS components, was constructed repeatedly using the four sub-datasets. Each model was evaluated using six different variants of v-fold cross-validation, autoprediction and external testing methods. As a result, each PLS-DA algorithm was represented by a number of figures of merit. The differences between PLS1-DA and PLS2-DA algorithms were assessed using hypothesis tests with respect to model accuracy, stability and fitting. In addition, confusion matrices of the two PLS-DA algorithms were inspected carefully for assessment of model parsimony. Overall, both algorithms presented satisfactory model accuracy and stability. Nonetheless, PLS1-DA models showed significantly higher accuracy rates than PLS2-DA models, whereas PLS2-DA models seem to be much more stable than PLS1-DA models. PLS2-DA also proved to be less prone to overfitting and more parsimonious than PLS1-DA. In conclusion, the relatively high accuracy of the PLS1-DA algorithm is achieved at the cost of rather low parsimony and stability, and with an increased risk of overfitting.
doi_str_mv 10.1039/c8an02074d
format article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmed_primary_30849143</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2189550071</sourcerecordid><originalsourceid>FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583</originalsourceid><addsrcrecordid>eNpd0c9rFDEUB_Agil2rF-9KwIsIoy_JZCbxtmxbLSxa6noeMslLTZkf22RG8epf3ky3VvCUH-_DI3lfQl4yeM9A6A9WmQE41KV7RFZMVGUhJVePyQoARMErWR6RZyld5yMDCU_JkQBValaKFflzEdEFO4WfSPvRYdeF4YqOntqxG1MyHV3vLouz3fklTXu0U8w3zkyGzmmBF9tvxcn6I8V-H2KwSzF4jxEHi4m2OP1CHBbFMqNmcMue3-27qzGG6UefnpMn3nQJX9yvx-T72elu87nYfv10vllvCytqNRWat05oJ7yGVnFeAbTWudJUnjmsK6HqGiWrnK60BwtYgkCPMg9Iq0pIJY7J20PffRxvZkxT04dk84_NgOOcGs6UlhKgZpm--Y9ej3Mc8usazqFUDHhdZ_XuoGzMo4rom30MvYm_GwbNkkyzUesvd8mcZPz6vuXc9uge6N8oMnh1ADHZh-q_aMUtmPeP7A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2204810277</pqid></control><display><type>article</type><title>Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms</title><source>Royal Society of Chemistry:Jisc Collections:Royal Society of Chemistry Read and Publish 2022-2024 (reading list)</source><creator>Lee, Loong Chuen ; Jemain, Abdul Aziz</creator><creatorcontrib>Lee, Loong Chuen ; Jemain, Abdul Aziz</creatorcontrib><description>In response to our review paper [L. C. Lee et al. , Analyst , 2018, 143 , 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysis (PLS-DA) has gained wide acceptance and huge popularity in the field of applied research, partly due to its dimensionality reduction capability and ability to handle multicollinear and correlated variables. 
To solve a K -class problem ( K &gt; 2) using PLS-DA and high-dimensional data like infrared spectra, one can construct either K one- versus -all PLS1-DA models or only one PLS2-DA model. The aim of this work is to explore empirical differences between the two PLS-DA algorithms in modeling a colossal ATR-FTIR spectral dataset. The practical task is to build a prediction model using the imbalanced, high dimensional, colossal and multi-class ATR-FTIR spectra of blue gel pen inks. Four different sub-datasets were prepared from the principal dataset by considering the raw and asymmetric least squares (AsLS) preprocessed forms: (a) Raw-global region; (b) Raw-local region; (c) AsLS-global region; and (d) AsLS-local region. A series of 50 models which includes the first 50 PLS components incrementally was constructed repeatedly using the four sub-datasets. Each model was evaluated using six different variants of v -fold cross validation, autoprediction and external testing methods. As a result, each PLS-DA algorithm was represented by a number of figures of merit. The differences between PLS1-DA and PLS2-DA algorithms were assessed using hypothesis tests with respect to model accuracy, stability and fitting. On the other hand, confusion matrices of the two PLS-DA algorithms were inspected carefully for assessment of model parsimony. Overall, both the algorithms presented satisfactory model accuracy and stability. Nonetheless, PLS1-DA models showed significantly higher accuracy rates than PLS2-DA models, whereas PLS2-DA models seem to be much more stable compared to PLS1-DA models. Eventually, PLS2-DA also proved to be less prone to overfitting and is more parsimonious than PLS1-DA. In conclusion, the relatively high accuracy of the PLS1-DA algorithm is achieved at the cost of rather low parsimony and stability, and with an increased risk of overfitting. In response to our review paper [L. C. Lee et al. 
, Analyst , 2018, 143 , 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset.</description><identifier>ISSN: 0003-2654</identifier><identifier>EISSN: 1364-5528</identifier><identifier>DOI: 10.1039/c8an02074d</identifier><identifier>PMID: 30849143</identifier><language>eng</language><publisher>England: Royal Society of Chemistry</publisher><subject>Accuracy ; Algorithms ; Datasets ; Discriminant analysis ; Empirical analysis ; Infrared spectra ; Inks ; Least squares ; Model accuracy ; Modelling ; Prediction models ; Stability analysis ; Test procedures</subject><ispartof>Analyst (London), 2019-04, Vol.144 (8), p.2670-2678</ispartof><rights>Copyright Royal Society of Chemistry 2019</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583</citedby><cites>FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583</cites><orcidid>0000-0002-3144-9877 ; 0000-0001-8062-9658</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/30849143$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Lee, Loong Chuen</creatorcontrib><creatorcontrib>Jemain, Abdul Aziz</creatorcontrib><title>Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms</title><title>Analyst (London)</title><addtitle>Analyst</addtitle><description>In response to our review paper [L. C. Lee et al. 
, Analyst , 2018, 143 , 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysis (PLS-DA) has gained wide acceptance and huge popularity in the field of applied research, partly due to its dimensionality reduction capability and ability to handle multicollinear and correlated variables. To solve a K -class problem ( K &gt; 2) using PLS-DA and high-dimensional data like infrared spectra, one can construct either K one- versus -all PLS1-DA models or only one PLS2-DA model. The aim of this work is to explore empirical differences between the two PLS-DA algorithms in modeling a colossal ATR-FTIR spectral dataset. The practical task is to build a prediction model using the imbalanced, high dimensional, colossal and multi-class ATR-FTIR spectra of blue gel pen inks. Four different sub-datasets were prepared from the principal dataset by considering the raw and asymmetric least squares (AsLS) preprocessed forms: (a) Raw-global region; (b) Raw-local region; (c) AsLS-global region; and (d) AsLS-local region. A series of 50 models which includes the first 50 PLS components incrementally was constructed repeatedly using the four sub-datasets. Each model was evaluated using six different variants of v -fold cross validation, autoprediction and external testing methods. As a result, each PLS-DA algorithm was represented by a number of figures of merit. The differences between PLS1-DA and PLS2-DA algorithms were assessed using hypothesis tests with respect to model accuracy, stability and fitting. On the other hand, confusion matrices of the two PLS-DA algorithms were inspected carefully for assessment of model parsimony. Overall, both the algorithms presented satisfactory model accuracy and stability. 
Nonetheless, PLS1-DA models showed significantly higher accuracy rates than PLS2-DA models, whereas PLS2-DA models seem to be much more stable compared to PLS1-DA models. Eventually, PLS2-DA also proved to be less prone to overfitting and is more parsimonious than PLS1-DA. In conclusion, the relatively high accuracy of the PLS1-DA algorithm is achieved at the cost of rather low parsimony and stability, and with an increased risk of overfitting. In response to our review paper [L. C. Lee et al. , Analyst , 2018, 143 , 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Datasets</subject><subject>Discriminant analysis</subject><subject>Empirical analysis</subject><subject>Infrared spectra</subject><subject>Inks</subject><subject>Least squares</subject><subject>Model accuracy</subject><subject>Modelling</subject><subject>Prediction models</subject><subject>Stability analysis</subject><subject>Test procedures</subject><issn>0003-2654</issn><issn>1364-5528</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNpd0c9rFDEUB_Agil2rF-9KwIsIoy_JZCbxtmxbLSxa6noeMslLTZkf22RG8epf3ky3VvCUH-_DI3lfQl4yeM9A6A9WmQE41KV7RFZMVGUhJVePyQoARMErWR6RZyld5yMDCU_JkQBValaKFflzEdEFO4WfSPvRYdeF4YqOntqxG1MyHV3vLouz3fklTXu0U8w3zkyGzmmBF9tvxcn6I8V-H2KwSzF4jxEHi4m2OP1CHBbFMqNmcMue3-27qzGG6UefnpMn3nQJX9yvx-T72elu87nYfv10vllvCytqNRWat05oJ7yGVnFeAbTWudJUnjmsK6HqGiWrnK60BwtYgkCPMg9Iq0pIJY7J20PffRxvZkxT04dk84_NgOOcGs6UlhKgZpm--Y9ej3Mc8usazqFUDHhdZ_XuoGzMo4rom30MvYm_GwbNkkyzUesvd8mcZPz6vuXc9uge6N8oMnh1ADHZh-q_aMUtmPeP7A</recordid><startdate>20190408</startdate><enddate>20190408</enddate><creator>Lee, Loong Chuen</creator><creator>Jemain, Abdul Aziz</creator><general>Royal Society of 
Chemistry</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>L7M</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-3144-9877</orcidid><orcidid>https://orcid.org/0000-0001-8062-9658</orcidid></search><sort><creationdate>20190408</creationdate><title>Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms</title><author>Lee, Loong Chuen ; Jemain, Abdul Aziz</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Datasets</topic><topic>Discriminant analysis</topic><topic>Empirical analysis</topic><topic>Infrared spectra</topic><topic>Inks</topic><topic>Least squares</topic><topic>Model accuracy</topic><topic>Modelling</topic><topic>Prediction models</topic><topic>Stability analysis</topic><topic>Test procedures</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lee, Loong Chuen</creatorcontrib><creatorcontrib>Jemain, Abdul Aziz</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>MEDLINE - Academic</collection><jtitle>Analyst (London)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lee, Loong Chuen</au><au>Jemain, 
Abdul Aziz</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms</atitle><jtitle>Analyst (London)</jtitle><addtitle>Analyst</addtitle><date>2019-04-08</date><risdate>2019</risdate><volume>144</volume><issue>8</issue><spage>2670</spage><epage>2678</epage><pages>2670-2678</pages><issn>0003-2654</issn><eissn>1364-5528</eissn><abstract>In response to our review paper [L. C. Lee et al. , Analyst , 2018, 143 , 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysis (PLS-DA) has gained wide acceptance and huge popularity in the field of applied research, partly due to its dimensionality reduction capability and ability to handle multicollinear and correlated variables. To solve a K -class problem ( K &gt; 2) using PLS-DA and high-dimensional data like infrared spectra, one can construct either K one- versus -all PLS1-DA models or only one PLS2-DA model. The aim of this work is to explore empirical differences between the two PLS-DA algorithms in modeling a colossal ATR-FTIR spectral dataset. The practical task is to build a prediction model using the imbalanced, high dimensional, colossal and multi-class ATR-FTIR spectra of blue gel pen inks. Four different sub-datasets were prepared from the principal dataset by considering the raw and asymmetric least squares (AsLS) preprocessed forms: (a) Raw-global region; (b) Raw-local region; (c) AsLS-global region; and (d) AsLS-local region. A series of 50 models which includes the first 50 PLS components incrementally was constructed repeatedly using the four sub-datasets. Each model was evaluated using six different variants of v -fold cross validation, autoprediction and external testing methods. 
As a result, each PLS-DA algorithm was represented by a number of figures of merit. The differences between PLS1-DA and PLS2-DA algorithms were assessed using hypothesis tests with respect to model accuracy, stability and fitting. On the other hand, confusion matrices of the two PLS-DA algorithms were inspected carefully for assessment of model parsimony. Overall, both the algorithms presented satisfactory model accuracy and stability. Nonetheless, PLS1-DA models showed significantly higher accuracy rates than PLS2-DA models, whereas PLS2-DA models seem to be much more stable compared to PLS1-DA models. Eventually, PLS2-DA also proved to be less prone to overfitting and is more parsimonious than PLS1-DA. In conclusion, the relatively high accuracy of the PLS1-DA algorithm is achieved at the cost of rather low parsimony and stability, and with an increased risk of overfitting. In response to our review paper [L. C. Lee et al. , Analyst , 2018, 143 , 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset.</abstract><cop>England</cop><pub>Royal Society of Chemistry</pub><pmid>30849143</pmid><doi>10.1039/c8an02074d</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0002-3144-9877</orcidid><orcidid>https://orcid.org/0000-0001-8062-9658</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0003-2654
ispartof Analyst (London), 2019-04, Vol.144 (8), p.2670-2678
issn 0003-2654
1364-5528
language eng
recordid cdi_pubmed_primary_30849143
source Royal Society of Chemistry:Jisc Collections:Royal Society of Chemistry Read and Publish 2022-2024 (reading list)
subjects Accuracy
Algorithms
Datasets
Discriminant analysis
Empirical analysis
Infrared spectra
Inks
Least squares
Model accuracy
Modelling
Prediction models
Stability analysis
Test procedures
title Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T02%3A57%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Predictive%20modelling%20of%20colossal%20ATR-FTIR%20spectral%20data%20using%20PLS-DA:%20empirical%20differences%20between%20PLS1-DA%20and%20PLS2-DA%20algorithms&rft.jtitle=Analyst%20(London)&rft.au=Lee,%20Loong%20Chuen&rft.date=2019-04-08&rft.volume=144&rft.issue=8&rft.spage=267&rft.epage=2678&rft.pages=267-2678&rft.issn=0003-2654&rft.eissn=1364-5528&rft_id=info:doi/10.1039/c8an02074d&rft_dat=%3Cproquest_pubme%3E2189550071%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2204810277&rft_id=info:pmid/30849143&rfr_iscdi=true