A computational analysis of transcribed speech of people living with dementia: The Anchise 2022 Corpus
Published in: | Computer speech & language 2025-01, Vol.89, p.101691, Article 101691 |
---|---|
Main Authors: | , , , , , , , |
Format: | Article |
Language: | English |
Summary: | • The variety of methods adopted to perform the automatic analysis, ranging from traditional morphosyntactic analysis based on statistical methods to transformer-based language models, sentiment and emotion analysis, and perplexity metrics.
• The types of information automatically retrieved from the dialogue transcripts, concerning lexical and morphosyntactic choices as well as the speakers' emotions.
• The analysis of a large, highly ecological corpus: the Anchise Corpus.
Automatic linguistic analysis can provide cost-effective, valuable clues for the diagnosis of cognitive difficulties and for therapeutic practice, and can hence have a positive impact on wellbeing. In this work, we analyzed transcribed conversations between elderly individuals living with dementia and healthcare professionals. The material came from the Anchise 2022 Corpus, a large collection of transcripts of conversations in Italian recorded in naturalistic conditions. The aim of the work was to test the effectiveness of a number of automatic analyses in finding correlations with the progression of dementia in individuals with cognitive decline, as measured by the Mini-Mental State Examination (MMSE) score, which is the only psychometric-clinical information available on the participants in the conversations. Healthy controls (HC) were not considered in this study, nor does the corpus itself include HCs. The main innovations and strengths of the work lie in the high ecological validity of the language analyzed (most of the literature to date concerns controlled language experiments); in the use of Italian (few corpora exist for Italian); in the size of the analyzed data (more than 200 conversations were considered); and in the adoption of a wide range of NLP methods, spanning from traditional morphosyntactic investigation to deep linguistic models used for analyses based on perplexity, sentiment (polarity), and emotions.
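Perplexity measures how "surprised" a language model is by a text: for tokens $w_1,\dots,w_N$, $\mathrm{PPL} = \exp\big(-\tfrac{1}{N}\sum_{i=1}^{N}\log p(w_i \mid w_{<i})\big)$, so lower values mean the text is more predictable to the model. The authors used transformer-based models; the minimal sketch here instead substitutes a toy add-one-smoothed bigram model, chosen only so the example runs with the Python standard library. The class and function names (`BigramLM`, `perplexity`) and the Italian training sentences are hypothetical illustrations, not the paper's setup.

```python
import math
from collections import Counter


def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


class BigramLM:
    """Toy Laplace-smoothed bigram model (hypothetical stand-in for a transformer LM)."""

    def __init__(self, sentences):
        self.contexts = Counter()  # how often each token appears as a context
        self.bigrams = Counter()   # how often each (previous, current) pair occurs
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.contexts.update(padded[:-1])
            self.bigrams.update(zip(padded[:-1], padded[1:]))
        # Possible next tokens: every training word plus the end marker.
        self.vocab = {w for s in sentences for w in s} | {"</s>"}

    def token_logprobs(self, tokens):
        """Natural-log probability of each token (and </s>) given the previous one."""
        padded = ["<s>"] + tokens + ["</s>"]
        v = len(self.vocab) + 1  # one extra slot for out-of-vocabulary tokens
        return [
            math.log((self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + v))
            for prev, cur in zip(padded[:-1], padded[1:])
        ]


# Invented toy data, for illustration only.
reference = [["la", "signora", "mangia"], ["la", "signora", "parla"]]
lm = BigramLM(reference)
fluent = perplexity(lm.token_logprobs(["la", "signora", "mangia"]))
scrambled = perplexity(lm.token_logprobs(["mangia", "la", "parla"]))
print(fluent < scrambled)  # True
```

A scrambled word order receives a higher perplexity than a familiar one; with a real transformer model the same `perplexity` function applies unchanged to the per-token log-probabilities the model returns.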
Analyzing real-world interactions that were not designed with computational analysis in mind, as is the case with the Anchise Corpus, is particularly challenging. To achieve the research goals, a wide variety of tools was employed. These included traditional morphosyntactic analysis based on digital linguistic biomarkers (DLBs), transformer-based language models, sentiment and emotion analysis, and perplexity metrics. Analyses were conducted both on the continuous range of MMSE values and on the severe/moderate/mild categorization suggested by AIFA (Italian M |
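For the analyses on the continuous range of MMSE values, one common way to test for a monotonic association between an automatically computed feature and the MMSE score is Spearman's rank correlation. The abstract does not state which statistic the authors used, so this is a sketch under that assumption: the feature (a hypothetical per-conversation type-token ratio) and all numbers are invented, and the helper names are illustrative.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(feature, mmse):
    """Spearman's rho: Pearson correlation computed on (tie-averaged) ranks."""
    if len(feature) != len(mmse) or len(feature) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(feature), average_ranks(mmse))


# Invented values for five hypothetical conversations.
mmse = [12, 18, 21, 25, 27]
ttr = [0.41, 0.45, 0.44, 0.52, 0.55]
rho = spearman(ttr, mmse)
print(round(rho, 3))  # 0.9
```

Rank correlation is a natural fit for MMSE, an ordinal 0 to 30 score with frequent ties. In practice, `scipy.stats.spearmanr` returns the same coefficient together with a p-value; the standard-library version only makes the tie handling explicit.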
ISSN: | 0885-2308 |
DOI: | 10.1016/j.csl.2024.101691 |