Use of t‐distributed stochastic neighbour embedding in vibrational spectroscopy
Published in: | Journal of chemometrics 2024-04, Vol.38 (4), p.n/a |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | The t‐distributed stochastic neighbour embedding algorithm, or t‐SNE, is a non‐linear dimension reduction method used to visualise multivariate data. It enables a high‐dimensional dataset, such as a set of infrared spectra, to be represented on a single, typically two‐dimensional graph, revealing its global and local structure. t‐SNE is very popular in the machine learning community and has been applied in many fields, generally with the aim of visualising large datasets. In vibrational spectroscopy, t‐SNE is gaining recognition, but principal component analysis (PCA) remains by far the reference method for exploratory analysis and dimension reduction. However, t‐SNE may be a real aid in the analysis of vibrational spectroscopic datasets. It provides an at‐a‐glance global view of the dataset, making it possible to distinguish the main factors influencing the spectral signal and the hierarchy between these factors, and it indicates whether predictive modelling is likely to be feasible. It can also support the choice of pre‐processing by rapidly comparing different general pre‐processing approaches according to their effect on the variable of interest. Here we illustrate these advantages using different datasets. We also propose an approach based on a synergy between the t‐SNE and PCA methods, allowing the respective advantages of each to be exploited.
The t‐distributed stochastic neighbour embedding algorithm, or t‐SNE, is a nonlinear method enabling the visualisation of multivariate datasets in lower dimensions. In vibrational spectroscopy, t‐SNE can provide a rapid global overview of a dataset and its influencing factors, assess the potential for predictive modelling and aid with the selection of the pre‐processing workflow. These advantages are illustrated with two real vibrational spectroscopic datasets. A data exploration approach based on the synergy between t‐SNE and PCA is also proposed. |
---|---|
ISSN: | 0886-9383, 1099-128X |
DOI: | 10.1002/cem.3544 |
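
The summary mentions combining t‐SNE with PCA but this record does not describe how the authors do it. As an illustration only, the sketch below shows one common pairing, assumed here and not taken from the article: compress the spectra with PCA, then run t‐SNE on the PCA scores. It uses scikit-learn; the synthetic two-group "spectra", the helper names `make_synthetic_spectra` and `pca_then_tsne`, and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def make_synthetic_spectra(n_per_group=30, n_points=200, seed=0):
    """Hypothetical stand-in for real spectra: two groups of noisy Gaussian bands."""
    rng = np.random.default_rng(seed)
    axis = np.linspace(0.0, 1.0, n_points)
    spectra, labels = [], []
    for group, centre in enumerate((0.35, 0.65)):
        # One band per group, centred at a different position on the axis.
        band = np.exp(-((axis - centre) ** 2) / (2 * 0.05 ** 2))
        noise = rng.normal(scale=0.05, size=(n_per_group, n_points))
        spectra.append(band + noise)
        labels.extend([group] * n_per_group)
    return np.vstack(spectra), np.array(labels)


def pca_then_tsne(X, n_pcs=10, perplexity=15.0, seed=0):
    """Assumed workflow: PCA scores first, then a 2-D t-SNE map of those scores."""
    # PCA centres the data internally and keeps the n_pcs leading components.
    scores = PCA(n_components=n_pcs, random_state=seed).fit_transform(X)
    # perplexity must be smaller than the number of samples.
    embedding = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(scores)
    return scores, embedding


X, y = make_synthetic_spectra()
scores, embedding = pca_then_tsne(X)
print(scores.shape, embedding.shape)
```

Plotting `embedding` coloured by `y`, or by a variable of interest after each candidate pre‐processing, gives the kind of at‐a‐glance view the summary describes. The number of retained components and the perplexity are tuning choices that would need to be adapted to a real dataset.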