Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms
In response to our review paper [L. C. Lee et al., Analyst, 2018, 143, 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysi...
Published in: | Analyst (London) 2019-04, Vol.144 (8), p.267-2678 |
---|---|
Main Authors: | Lee, Loong Chuen ; Jemain, Abdul Aziz |
Format: | Article |
Language: | English |
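Subjects: | Accuracy ; Algorithms ; Datasets ; Discriminant analysis ; Empirical analysis ; Infrared spectra ; Inks ; Least squares ; Model accuracy ; Modelling ; Prediction models ; Stability analysis ; Test procedures |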
cited_by | cdi_FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583 |
---|---|
cites | cdi_FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583 |
container_end_page | 2678 |
container_issue | 8 |
container_start_page | 267 |
container_title | Analyst (London) |
container_volume | 144 |
creator | Lee, Loong Chuen ; Jemain, Abdul Aziz |
description | In response to our review paper [L. C. Lee et al., Analyst, 2018, 143, 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysis (PLS-DA) has gained wide acceptance and huge popularity in the field of applied research, partly due to its dimensionality reduction capability and ability to handle multicollinear and correlated variables. To solve a K-class problem (K > 2) using PLS-DA and high-dimensional data like infrared spectra, one can construct either K one-versus-all PLS1-DA models or only one PLS2-DA model. The aim of this work is to explore empirical differences between the two PLS-DA algorithms in modeling a colossal ATR-FTIR spectral dataset. The practical task is to build a prediction model using the imbalanced, high dimensional, colossal and multi-class ATR-FTIR spectra of blue gel pen inks. Four different sub-datasets were prepared from the principal dataset by considering the raw and asymmetric least squares (AsLS) preprocessed forms: (a) Raw-global region; (b) Raw-local region; (c) AsLS-global region; and (d) AsLS-local region. A series of 50 models which includes the first 50 PLS components incrementally was constructed repeatedly using the four sub-datasets. Each model was evaluated using six different variants of v-fold cross validation, autoprediction and external testing methods. As a result, each PLS-DA algorithm was represented by a number of figures of merit. The differences between PLS1-DA and PLS2-DA algorithms were assessed using hypothesis tests with respect to model accuracy, stability and fitting. On the other hand, confusion matrices of the two PLS-DA algorithms were inspected carefully for assessment of model parsimony. Overall, both the algorithms presented satisfactory model accuracy and stability. Nonetheless, PLS1-DA models showed significantly higher accuracy rates than PLS2-DA models, whereas PLS2-DA models seem to be much more stable compared to PLS1-DA models. Eventually, PLS2-DA also proved to be less prone to overfitting and is more parsimonious than PLS1-DA. In conclusion, the relatively high accuracy of the PLS1-DA algorithm is achieved at the cost of rather low parsimony and stability, and with an increased risk of overfitting. |
doi_str_mv | 10.1039/c8an02074d |
format | article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmed_primary_30849143</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2189550071</sourcerecordid><originalsourceid>FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583</originalsourceid><addsrcrecordid>eNpd0c9rFDEUB_Agil2rF-9KwIsIoy_JZCbxtmxbLSxa6noeMslLTZkf22RG8epf3ky3VvCUH-_DI3lfQl4yeM9A6A9WmQE41KV7RFZMVGUhJVePyQoARMErWR6RZyld5yMDCU_JkQBValaKFflzEdEFO4WfSPvRYdeF4YqOntqxG1MyHV3vLouz3fklTXu0U8w3zkyGzmmBF9tvxcn6I8V-H2KwSzF4jxEHi4m2OP1CHBbFMqNmcMue3-27qzGG6UefnpMn3nQJX9yvx-T72elu87nYfv10vllvCytqNRWat05oJ7yGVnFeAbTWudJUnjmsK6HqGiWrnK60BwtYgkCPMg9Iq0pIJY7J20PffRxvZkxT04dk84_NgOOcGs6UlhKgZpm--Y9ej3Mc8usazqFUDHhdZ_XuoGzMo4rom30MvYm_GwbNkkyzUesvd8mcZPz6vuXc9uge6N8oMnh1ADHZh-q_aMUtmPeP7A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2204810277</pqid></control><display><type>article</type><title>Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms</title><source>Royal Society of Chemistry:Jisc Collections:Royal Society of Chemistry Read and Publish 2022-2024 (reading list)</source><creator>Lee, Loong Chuen ; Jemain, Abdul Aziz</creator><creatorcontrib>Lee, Loong Chuen ; Jemain, Abdul Aziz</creatorcontrib><description>In response to our review paper [L. C. Lee
et al.
,
Analyst
, 2018,
143
, 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysis (PLS-DA) has gained wide acceptance and huge popularity in the field of applied research, partly due to its dimensionality reduction capability and ability to handle multicollinear and correlated variables. To solve a
K
-class problem (
K
> 2) using PLS-DA and high-dimensional data like infrared spectra, one can construct either
K
one-
versus
-all PLS1-DA models or only one PLS2-DA model. The aim of this work is to explore empirical differences between the two PLS-DA algorithms in modeling a colossal ATR-FTIR spectral dataset. The practical task is to build a prediction model using the imbalanced, high dimensional, colossal and multi-class ATR-FTIR spectra of blue gel pen inks. Four different sub-datasets were prepared from the principal dataset by considering the raw and asymmetric least squares (AsLS) preprocessed forms: (a) Raw-global region; (b) Raw-local region; (c) AsLS-global region; and (d) AsLS-local region. A series of 50 models which includes the first 50 PLS components incrementally was constructed repeatedly using the four sub-datasets. Each model was evaluated using six different variants of
v
-fold cross validation, autoprediction and external testing methods. As a result, each PLS-DA algorithm was represented by a number of figures of merit. The differences between PLS1-DA and PLS2-DA algorithms were assessed using hypothesis tests with respect to model accuracy, stability and fitting. On the other hand, confusion matrices of the two PLS-DA algorithms were inspected carefully for assessment of model parsimony. Overall, both the algorithms presented satisfactory model accuracy and stability. Nonetheless, PLS1-DA models showed significantly higher accuracy rates than PLS2-DA models, whereas PLS2-DA models seem to be much more stable compared to PLS1-DA models. Eventually, PLS2-DA also proved to be less prone to overfitting and is more parsimonious than PLS1-DA. In conclusion, the relatively high accuracy of the PLS1-DA algorithm is achieved at the cost of rather low parsimony and stability, and with an increased risk of overfitting.
In response to our review paper [L. C. Lee
et al.
,
Analyst
, 2018,
143
, 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset.</description><identifier>ISSN: 0003-2654</identifier><identifier>EISSN: 1364-5528</identifier><identifier>DOI: 10.1039/c8an02074d</identifier><identifier>PMID: 30849143</identifier><language>eng</language><publisher>England: Royal Society of Chemistry</publisher><subject>Accuracy ; Algorithms ; Datasets ; Discriminant analysis ; Empirical analysis ; Infrared spectra ; Inks ; Least squares ; Model accuracy ; Modelling ; Prediction models ; Stability analysis ; Test procedures</subject><ispartof>Analyst (London), 2019-04, Vol.144 (8), p.267-2678</ispartof><rights>Copyright Royal Society of Chemistry 2019</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583</citedby><cites>FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583</cites><orcidid>0000-0002-3144-9877 ; 0000-0001-8062-9658</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/30849143$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Lee, Loong Chuen</creatorcontrib><creatorcontrib>Jemain, Abdul Aziz</creatorcontrib><title>Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms</title><title>Analyst (London)</title><addtitle>Analyst</addtitle><description>In response to our review paper [L. C. Lee
et al.
,
Analyst
, 2018,
143
, 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysis (PLS-DA) has gained wide acceptance and huge popularity in the field of applied research, partly due to its dimensionality reduction capability and ability to handle multicollinear and correlated variables. To solve a
K
-class problem (
K
> 2) using PLS-DA and high-dimensional data like infrared spectra, one can construct either
K
one-
versus
-all PLS1-DA models or only one PLS2-DA model. The aim of this work is to explore empirical differences between the two PLS-DA algorithms in modeling a colossal ATR-FTIR spectral dataset. The practical task is to build a prediction model using the imbalanced, high dimensional, colossal and multi-class ATR-FTIR spectra of blue gel pen inks. Four different sub-datasets were prepared from the principal dataset by considering the raw and asymmetric least squares (AsLS) preprocessed forms: (a) Raw-global region; (b) Raw-local region; (c) AsLS-global region; and (d) AsLS-local region. A series of 50 models which includes the first 50 PLS components incrementally was constructed repeatedly using the four sub-datasets. Each model was evaluated using six different variants of
v
-fold cross validation, autoprediction and external testing methods. As a result, each PLS-DA algorithm was represented by a number of figures of merit. The differences between PLS1-DA and PLS2-DA algorithms were assessed using hypothesis tests with respect to model accuracy, stability and fitting. On the other hand, confusion matrices of the two PLS-DA algorithms were inspected carefully for assessment of model parsimony. Overall, both the algorithms presented satisfactory model accuracy and stability. Nonetheless, PLS1-DA models showed significantly higher accuracy rates than PLS2-DA models, whereas PLS2-DA models seem to be much more stable compared to PLS1-DA models. Eventually, PLS2-DA also proved to be less prone to overfitting and is more parsimonious than PLS1-DA. In conclusion, the relatively high accuracy of the PLS1-DA algorithm is achieved at the cost of rather low parsimony and stability, and with an increased risk of overfitting.
In response to our review paper [L. C. Lee
et al.
,
Analyst
, 2018,
143
, 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Datasets</subject><subject>Discriminant analysis</subject><subject>Empirical analysis</subject><subject>Infrared spectra</subject><subject>Inks</subject><subject>Least squares</subject><subject>Model accuracy</subject><subject>Modelling</subject><subject>Prediction models</subject><subject>Stability analysis</subject><subject>Test procedures</subject><issn>0003-2654</issn><issn>1364-5528</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNpd0c9rFDEUB_Agil2rF-9KwIsIoy_JZCbxtmxbLSxa6noeMslLTZkf22RG8epf3ky3VvCUH-_DI3lfQl4yeM9A6A9WmQE41KV7RFZMVGUhJVePyQoARMErWR6RZyld5yMDCU_JkQBValaKFflzEdEFO4WfSPvRYdeF4YqOntqxG1MyHV3vLouz3fklTXu0U8w3zkyGzmmBF9tvxcn6I8V-H2KwSzF4jxEHi4m2OP1CHBbFMqNmcMue3-27qzGG6UefnpMn3nQJX9yvx-T72elu87nYfv10vllvCytqNRWat05oJ7yGVnFeAbTWudJUnjmsK6HqGiWrnK60BwtYgkCPMg9Iq0pIJY7J20PffRxvZkxT04dk84_NgOOcGs6UlhKgZpm--Y9ej3Mc8usazqFUDHhdZ_XuoGzMo4rom30MvYm_GwbNkkyzUesvd8mcZPz6vuXc9uge6N8oMnh1ADHZh-q_aMUtmPeP7A</recordid><startdate>20190408</startdate><enddate>20190408</enddate><creator>Lee, Loong Chuen</creator><creator>Jemain, Abdul Aziz</creator><general>Royal Society of Chemistry</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>L7M</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-3144-9877</orcidid><orcidid>https://orcid.org/0000-0001-8062-9658</orcidid></search><sort><creationdate>20190408</creationdate><title>Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms</title><author>Lee, Loong Chuen ; Jemain, Abdul 
Aziz</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Datasets</topic><topic>Discriminant analysis</topic><topic>Empirical analysis</topic><topic>Infrared spectra</topic><topic>Inks</topic><topic>Least squares</topic><topic>Model accuracy</topic><topic>Modelling</topic><topic>Prediction models</topic><topic>Stability analysis</topic><topic>Test procedures</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lee, Loong Chuen</creatorcontrib><creatorcontrib>Jemain, Abdul Aziz</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>MEDLINE - Academic</collection><jtitle>Analyst (London)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lee, Loong Chuen</au><au>Jemain, Abdul Aziz</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms</atitle><jtitle>Analyst (London)</jtitle><addtitle>Analyst</addtitle><date>2019-04-08</date><risdate>2019</risdate><volume>144</volume><issue>8</issue><spage>267</spage><epage>2678</epage><pages>267-2678</pages><issn>0003-2654</issn><eissn>1364-5528</eissn><abstract>In response to our review paper [L. C. Lee
et al.
,
Analyst
, 2018,
143
, 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysis (PLS-DA) has gained wide acceptance and huge popularity in the field of applied research, partly due to its dimensionality reduction capability and ability to handle multicollinear and correlated variables. To solve a
K
-class problem (
K
> 2) using PLS-DA and high-dimensional data like infrared spectra, one can construct either
K
one-
versus
-all PLS1-DA models or only one PLS2-DA model. The aim of this work is to explore empirical differences between the two PLS-DA algorithms in modeling a colossal ATR-FTIR spectral dataset. The practical task is to build a prediction model using the imbalanced, high dimensional, colossal and multi-class ATR-FTIR spectra of blue gel pen inks. Four different sub-datasets were prepared from the principal dataset by considering the raw and asymmetric least squares (AsLS) preprocessed forms: (a) Raw-global region; (b) Raw-local region; (c) AsLS-global region; and (d) AsLS-local region. A series of 50 models which includes the first 50 PLS components incrementally was constructed repeatedly using the four sub-datasets. Each model was evaluated using six different variants of
v
-fold cross validation, autoprediction and external testing methods. As a result, each PLS-DA algorithm was represented by a number of figures of merit. The differences between PLS1-DA and PLS2-DA algorithms were assessed using hypothesis tests with respect to model accuracy, stability and fitting. On the other hand, confusion matrices of the two PLS-DA algorithms were inspected carefully for assessment of model parsimony. Overall, both the algorithms presented satisfactory model accuracy and stability. Nonetheless, PLS1-DA models showed significantly higher accuracy rates than PLS2-DA models, whereas PLS2-DA models seem to be much more stable compared to PLS1-DA models. Eventually, PLS2-DA also proved to be less prone to overfitting and is more parsimonious than PLS1-DA. In conclusion, the relatively high accuracy of the PLS1-DA algorithm is achieved at the cost of rather low parsimony and stability, and with an increased risk of overfitting.
In response to our review paper [L. C. Lee
et al.
,
Analyst
, 2018,
143
, 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset.</abstract><cop>England</cop><pub>Royal Society of Chemistry</pub><pmid>30849143</pmid><doi>10.1039/c8an02074d</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0002-3144-9877</orcidid><orcidid>https://orcid.org/0000-0001-8062-9658</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0003-2654 |
ispartof | Analyst (London), 2019-04, Vol.144 (8), p.267-2678 |
issn | 0003-2654 1364-5528 |
language | eng |
recordid | cdi_pubmed_primary_30849143 |
source | Royal Society of Chemistry:Jisc Collections:Royal Society of Chemistry Read and Publish 2022-2024 (reading list) |
subjects | Accuracy ; Algorithms ; Datasets ; Discriminant analysis ; Empirical analysis ; Infrared spectra ; Inks ; Least squares ; Model accuracy ; Modelling ; Prediction models ; Stability analysis ; Test procedures |
title | Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T02%3A57%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Predictive%20modelling%20of%20colossal%20ATR-FTIR%20spectral%20data%20using%20PLS-DA:%20empirical%20differences%20between%20PLS1-DA%20and%20PLS2-DA%20algorithms&rft.jtitle=Analyst%20(London)&rft.au=Lee,%20Loong%20Chuen&rft.date=2019-04-08&rft.volume=144&rft.issue=8&rft.spage=267&rft.epage=2678&rft.pages=267-2678&rft.issn=0003-2654&rft.eissn=1364-5528&rft_id=info:doi/10.1039/c8an02074d&rft_dat=%3Cproquest_pubme%3E2189550071%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c378t-92bd39d3f90b822600bcdd4a6f1de763877e516d969f0c0e403efe50399863583%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2204810277&rft_id=info:pmid/30849143&rfr_iscdi=true |
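
The abstract's contrast between K one-versus-all PLS1-DA models and a single PLS2-DA model comes down largely to how the class labels are encoded as the response. The following is a minimal sketch in plain Python of that encoding step only, not of the PLS fitting itself and not of the authors' code. The class names, the function names and the largest-score assignment rule are illustrative assumptions that do not come from the record.

```python
from typing import Dict, List, Sequence, Tuple


def one_vs_all_targets(labels: Sequence[str]) -> Dict[str, List[int]]:
    """PLS1-DA view: one binary response vector per class (K vectors in total)."""
    classes = sorted(set(labels))
    return {c: [1 if lab == c else 0 for lab in labels] for c in classes}


def dummy_matrix(labels: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """PLS2-DA view: a single n x K indicator matrix modelled in one fit."""
    classes = sorted(set(labels))
    rows = [[1 if lab == c else 0 for c in classes] for lab in labels]
    return classes, rows


def assign_class(scores: Sequence[float], classes: Sequence[str]) -> str:
    """Assign the class whose predicted response is largest (assumed decision rule)."""
    best = max(range(len(classes)), key=lambda k: scores[k])
    return classes[best]


# Hypothetical labels for four spectra from three ink classes.
labels = ["ink_A", "ink_B", "ink_C", "ink_A"]

pls1_targets = one_vs_all_targets(labels)   # K separate y vectors
classes, pls2_target = dummy_matrix(labels)  # one Y matrix with K columns

# Column k of the PLS2-DA matrix equals the PLS1-DA vector for class k.
columns = [[row[k] for row in pls2_target] for k in range(len(classes))]
same_encoding = all(columns[k] == pls1_targets[c] for k, c in enumerate(classes))
```

In this sketch both approaches start from the same indicator coding. What differs is whether each column is regressed on the spectra in its own model (PLS1-DA) or all columns are modelled jointly with shared components (PLS2-DA).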