Loading…
Interpretable speech features vs. DNN embeddings: What to use in the automatic assessment of Parkinson’s disease in multi-lingual scenarios
Speech-based approaches for assessing Parkinson’s Disease (PD) often rely on feature extraction for automatic classification or detection. While many studies prioritize accuracy by using non-interpretable embeddings from Deep Neural Networks, this work aims to explore the predictive capabilities and...
Saved in:
Published in: | Computers in biology and medicine 2023-11, Vol.166, p.107559, Article 107559 |
---|---|
Main Authors: | , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Speech-based approaches for assessing Parkinson’s Disease (PD) often rely on feature extraction for automatic classification or detection. While many studies prioritize accuracy by using non-interpretable embeddings from Deep Neural Networks, this work aims to explore the predictive capabilities and language robustness of both feature types in a systematic fashion. As interpretable features, prosodic, linguistic, and cognitive descriptors were adopted, while x-vectors, Wav2Vec 2.0, HuBERT, and TRILLsson representations were used as non-interpretable features. Mono-lingual, multi-lingual, and cross-lingual machine learning experiments were conducted leveraging six data sets comprising speech recordings from various languages: American English, Castilian Spanish, Colombian Spanish, Italian, German, and Czech. For interpretable feature-based models, the mean of the best F1-scores obtained from each language was 81% in mono-lingual, 81% in multi-lingual, and 71% in cross-lingual experiments. For non-interpretable feature-based models, instead, they were 85% in mono-lingual, 88% in multi-lingual, and 79% in cross-lingual experiments. Firstly, models based on non-interpretable features outperformed interpretable ones, especially in cross-lingual experiments. Specifically, TRILLsson provided the most stable and accurate results across tasks and data sets. Conversely, the two types of features adopted showed some level of language robustness in multi-lingual and cross-lingual experiments. Overall, these results suggest that interpretable feature-based models can be used by clinicians to evaluate the deterioration of the speech of patients with PD, while non-interpretable feature-based models can be leveraged to achieve higher detection accuracy.
•Both interpretable and non-interpretable features displayed robust behaviors.•Models based on non-interpretable features outperformed interpretable ones.•Interpretable feature-based models provide insights into speech deterioration.•Non-interpretable feature-based models can achieve higher detection accuracy. |
---|---|
ISSN: | 0010-4825 1879-0534 1879-0534 |
DOI: | 10.1016/j.compbiomed.2023.107559 |