Explainable hybrid tabular Variational Autoencoder and Feature Tokenizer Transformer for depression prediction
Published in: | Expert Systems with Applications, 2025-03, Vol. 265, p. 126084, Article 126084 |
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Recent advances in machine learning and deep learning have substantially improved the diagnosis, detection, prediction, and prognosis of depressive disorders. However, these methods still face problems of generalizability, transparency, data scarcity, privacy, and class imbalance. This study develops a robust, interpretable model for predicting depression in South Korea that addresses these limitations. We used a hybrid deep learning approach that combines the Feature Tokenizer Transformer, a model designed for tabular data that tokenizes and processes categorical and numerical features, with synthetic data generated by a tabular variational autoencoder (TVAE). TVAE adapts the variational autoencoder to tabular data through a specialized loss function; here it generated high-quality synthetic data from the Korea National Health and Nutrition Examination Survey (KNHANES) dataset. The quality of the TVAE-generated data was validated with non-parametric statistical tests, yielding scores of 86.30% on the Kolmogorov-Smirnov test and 76.65% on the Chi-squared test. On the evaluation metrics, the model achieved an accuracy of 0.7783, a recall of 0.5310, an F1-score of 0.4657, and an AUC of 0.6822, outperforming state-of-the-art models. In addition, SHapley Additive exPlanations (SHAP) analysis was used to explain feature importance, offering insights for healthcare professionals. This research highlights the potential of deep learning and synthetic data generation to improve depression prediction, improving on existing models with respect to generalizability, class imbalance, data privacy, and interpretability. |
ISSN: | 0957-4174 |
DOI: | 10.1016/j.eswa.2024.126084 |
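The Summary states that the Feature Tokenizer Transformer tokenizes categorical and numerical features before processing them. This record does not include the authors' implementation, so the following is only a minimal NumPy sketch of a feature tokenizer in the style of the published FT-Transformer design, under stated assumptions: each numerical value x_j scales a per-feature vector W_j and adds a bias b_j; each categorical value selects a row from a per-feature embedding table and adds a bias; and a [CLS] token is prepended. The class name `FeatureTokenizer`, the token size, the random initialisation, and the input values are all hypothetical, and the Transformer layers that would consume the tokens are not shown.

```python
import numpy as np


class FeatureTokenizer:
    """Hypothetical NumPy sketch of an FT-Transformer-style feature tokenizer.

    Weights are randomly initialised for illustration only; in a trained
    model they would be learned parameters.
    """

    def __init__(self, n_num, cat_cardinalities, d_token, seed=0):
        rng = np.random.default_rng(seed)
        self.d_token = d_token
        # Numerical feature j gets its own weight vector W_j and bias b_j.
        self.num_weight = rng.normal(size=(n_num, d_token))
        self.num_bias = rng.normal(size=(n_num, d_token))
        # Categorical feature j gets an embedding table (one row per category) and a bias b_j.
        self.cat_tables = [rng.normal(size=(card, d_token)) for card in cat_cardinalities]
        self.cat_bias = rng.normal(size=(len(cat_cardinalities), d_token))
        # [CLS] token prepended to every row; its final state would feed a classifier.
        self.cls = rng.normal(size=(1, 1, d_token))

    def __call__(self, x_num, x_cat):
        batch = x_num.shape[0]
        # Numerical tokens: x_j * W_j + b_j, shape (batch, n_num, d_token).
        num_tokens = x_num[:, :, None] * self.num_weight[None] + self.num_bias[None]
        # Categorical tokens: row x_cat[:, j] of table j, plus b_j.
        if self.cat_tables:
            cat_tokens = np.stack(
                [table[x_cat[:, j]] for j, table in enumerate(self.cat_tables)], axis=1
            ) + self.cat_bias[None]
        else:
            cat_tokens = np.empty((batch, 0, self.d_token))
        cls = np.broadcast_to(self.cls, (batch, 1, self.d_token))
        # Token sequence: [CLS], numerical tokens, categorical tokens.
        return np.concatenate([cls, num_tokens, cat_tokens], axis=1)


# Hypothetical inputs: 2 rows, 3 standardized numerical features, 2 categorical features.
tok = FeatureTokenizer(n_num=3, cat_cardinalities=[2, 5], d_token=8)
x_num = np.array([[0.5, -1.2, 3.0], [1.0, 0.0, -0.7]])
x_cat = np.array([[1, 4], [0, 2]])
tokens = tok(x_num, x_cat)
print(tokens.shape)  # (2, 6, 8): [CLS] + 3 numerical + 2 categorical tokens
```

Giving every feature its own parameters turns a table row into a sequence of same-sized tokens, so attention layers can relate features to one another regardless of whether they were originally numerical or categorical.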