Tensor product based 2-D correlation data preprocessing methods for Raman spectroscopy of Chinese handmade paper

Bibliographic Details
Published in:	Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy, 2023-12, Vol.302, p.123033, Article 123033
Main Authors:	Yan, Chunsheng; Luo, Si; Cao, Linquan; Cheng, Zhongyi; Zhang, Hui
Format:	Article
Language:	English
Subjects:	Data preprocessing; Machine learning; Raman spectroscopy; Tensor product; Two-dimensional correlation

Description
Summary:
• The 2-D correlation methods do not require external perturbation variables.
• They are purely mathematical methods that use the tensor product of spectral data.
• The R² values of KNN and RF for TDACM are close to 1, indicating nearly 100% improvement.
The paper introduces two new methods, the cross-correlation method (CCM) and the two-dimensional correlation method (TDCM), for preprocessing Raman spectra of Chinese handmade paper samples. CCM expands the spectral dimension from 1×N to 1×(2N-1) by taking the cross-correlation between two spectra of the same category. TDCM comprises the two-dimensional synchronous correlation method (TDSCM) and the two-dimensional asynchronous correlation method (TDACM). Both expand the spectral dimension from 1×N to N×N: TDSCM takes the tensor product of two spectra of the same category, and TDACM takes the tensor product of one spectrum and the Hilbert transform of the other. The experimental data were preprocessed using baseline removal, CCM, TDSCM, and TDACM. Four machine learning models were used to evaluate these methods: principal component analysis (PCA) combined with linear regression (LR), support vector machine (SVM) combined with LR, k-nearest neighbors (KNN), and random forest (RF). The R-squared values for the PCA model were nearly 1 for all types of data, indicating high accuracy. For the SVM-LR, KNN, and RF models, however, the R-squared values increased in the order raw data, baseline-removed data, CCM, TDSCM, and TDACM preprocessed data. The R-squared values of the KNN and RF models for TDACM preprocessed data approached 1, indicating that the accuracy of machine learning improved significantly, by nearly 100%. This brings the accuracy of supervised models such as KNN and RF close to the level of unsupervised models such as PCA.
ISSN:	1386-1425
DOI:	10.1016/j.saa.2023.123033
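
The dimension changes described in the Summary (1×N to 1×(2N-1) for CCM, 1×N to N×N for TDSCM and TDACM) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names `ccm`, `tdscm`, `tdacm`, and `hilbert_transform` are hypothetical, the test spectra are synthetic, and the FFT-based discrete Hilbert transform is an assumption about how that step might be computed.

```python
import numpy as np


def hilbert_transform(x):
    """Discrete Hilbert transform via the FFT-based analytic signal (assumed approach)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies, zero negative ones.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * weights)
    # The imaginary part of the analytic signal is the Hilbert transform of x.
    return analytic.imag


def ccm(x, y):
    """Cross-correlation of two spectra: length N -> length 2N-1."""
    return np.correlate(x, y, mode="full")


def tdscm(x, y):
    """Synchronous map: tensor (outer) product of two spectra, N -> N x N."""
    return np.outer(x, y)


def tdacm(x, y):
    """Asynchronous map: outer product of x with the Hilbert transform of y."""
    return np.outer(x, hilbert_transform(y))


# Two synthetic "spectra" standing in for samples of the same category.
n = 8
t = np.arange(n)
x = np.cos(2 * np.pi * t / n)
y = np.cos(2 * np.pi * t / n) + 0.1

print(ccm(x, y).shape, tdscm(x, y).shape, tdacm(x, y).shape)
# prints: (15,) (8, 8) (8, 8)
```

In practice the N×N maps would presumably be flattened or otherwise reduced before being passed to a model such as KNN or RF; the record does not say how, so that step is left out here.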