Explainable hybrid tabular Variational Autoencoder and feature Tokenizer Transformer for depression prediction

Bibliographic Details
Published in: Expert Systems with Applications, 2025-03, Vol. 265, p. 126084, Article 126084
Main Authors: Quang Tran, Vinh; Byeon, Haewon
Format: Article
Language: English
Description
Summary: Recent advancements in machine learning and deep learning have significantly improved the diagnosis, detection, prediction, and prognosis of depressive disorders. However, these methodologies have issues related to generalizability, transparency, data scarcity, privacy concerns, and class imbalance. This study develops a robust and interpretable model for predicting depression in South Korea that addresses these limitations. We employed a hybrid deep learning approach that combines the Feature Tokenizer Transformer model, which is designed for tabular data and tokenizes both categorical and numerical features, with synthetic data generated by the tabular variational autoencoder (TVAE). TVAE, an adaptation of variational autoencoders that uses a specialized loss function for tabular data, generated high-quality synthetic data from the Korea National Health and Nutrition Examination Survey (KNHANES) dataset. The quality of the TVAE-generated data was validated using non-parametric statistical tests, achieving 86.30% on the Kolmogorov-Smirnov test and 76.65% on the Chi-squared test. Performance evaluation metrics, such as accuracy, recall, F1-score, and AUC, demonstrated our model’s effectiveness, yielding an accuracy of 0.7783, a recall of 0.5310, an F1-score of 0.4657, and an AUC of 0.6822, outperforming state-of-the-art models. Additionally, SHapley Additive exPlanations (SHAP) analysis was incorporated to explain feature importance, offering valuable insights for healthcare professionals. This research highlights the potential of deep learning and synthetic data generation techniques to enhance depression prediction and to address critical challenges that limit existing models, such as generalizability, class imbalance, data privacy, and interpretability.
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2024.126084
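
The summary reports a percentage score for the Kolmogorov-Smirnov comparison of real and synthetic data but does not say how that percentage is computed. One common convention, assumed here and not confirmed by the record, is to take 1 minus the two-sample KS statistic for each numerical column and average the results. The sketch below illustrates that convention in plain Python; the function names, column names, and toy values are hypothetical and are not drawn from KNHANES or from the authors' code.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    n, m = len(real_sorted), len(synth_sorted)
    max_gap = 0.0
    # Both empirical CDFs are step functions, so the gap only changes at observed values.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / n
        cdf_synth = bisect_right(synth_sorted, x) / m
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


def ks_similarity(real_columns, synthetic_columns):
    """Mean of (1 - KS statistic) over the numerical columns; assumed scoring convention."""
    scores = [
        1.0 - ks_statistic(real_columns[name], synthetic_columns[name])
        for name in real_columns
    ]
    return sum(scores) / len(scores)


# Toy, made-up values for illustration only (not KNHANES data).
real = {
    "age": [23, 35, 41, 52, 60, 67],
    "bmi": [19.5, 22.1, 24.0, 26.3, 28.7, 31.2],
}
synthetic = {
    "age": [25, 33, 44, 50, 63, 70],
    "bmi": [20.0, 21.8, 24.4, 27.0, 28.1, 30.5],
}
print(f"KS similarity: {ks_similarity(real, synthetic):.2%}")
```

Under this convention a score of 100% means the real and synthetic empirical distributions coincide for every column, and lower scores indicate a larger maximum gap between them. Categorical columns would need a different test, such as the Chi-squared test named in the summary, which is not sketched here.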