Model‐based clustering of high‐dimensional longitudinal data via regularization

Bibliographic Details
Published in: Biometrics 2023-06, Vol.79 (2), p.761-774
Main Authors: Yang, Luoying; Wu, Tong Tong
Format: Article
Language: English
Description
Summary: We propose a model‐based clustering method for high‐dimensional longitudinal data via regularization. This study was motivated by the Trial of Activity in Adolescent Girls (TAAG), which aimed to examine multilevel factors related to the change of physical activity by following up a cohort of 783 girls over 10 years from adolescence to early adulthood. Our goal is to identify the intrinsic grouping of subjects with similar patterns of physical activity trajectories and the most relevant predictors within each group. Previous analyses performed clustering and variable selection in two separate steps, whereas our method performs both tasks simultaneously. Within each cluster, a linear mixed‐effects model (LMM) is fitted with a doubly penalized likelihood to induce sparsity for parameter estimation and effect selection. The large‐sample joint properties are established, allowing the dimensions of both fixed and random effects to increase at an exponential rate of the sample size, with a general class of penalty functions. Assuming subjects are drawn from a Gaussian mixture distribution, model effects and cluster labels are estimated via a coordinate descent algorithm nested inside the Expectation‐Maximization (EM) algorithm. The Bayesian Information Criterion (BIC) is used to determine the optimal number of clusters and the values of the tuning parameters. Our numerical studies show that the new method performs satisfactorily and can accommodate complex data with multilevel and/or longitudinal effects.
ISSN: 0006-341X, 1541-0420
DOI: 10.1111/biom.13672
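
Illustrative sketch: The summary describes coordinate descent nested inside EM, with BIC used for model selection. The Python sketch below shows that general structure on a deliberately simplified problem and is not the authors' implementation. It assumes a mixture of ordinary linear regressions with a single lasso (L1) penalty on the coefficients, so it omits the random effects and the doubly penalized LMM likelihood of the paper. The lasso step also ignores the cluster variance when scaling the penalty. All function names (`soft_threshold`, `weighted_lasso_cd`, `fit_penalized_mixture`) and the degrees-of-freedom count used in the BIC are assumptions made for illustration.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: the closed-form lasso update for one coordinate."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def predict(x, beta):
    """Linear predictor x'beta for one observation."""
    return sum(xj * bj for xj, bj in zip(x, beta))


def weighted_lasso_cd(X, y, w, lam, beta, n_sweeps=20):
    """Coordinate descent for 0.5 * sum_i w_i (y_i - x_i'beta)^2 + lam * ||beta||_1."""
    beta = list(beta)
    resid = [yi - predict(xi, beta) for xi, yi in zip(X, y)]
    for _ in range(n_sweeps):
        for j in range(len(beta)):
            denom = sum(wi * xi[j] ** 2 for wi, xi in zip(w, X))
            if denom == 0.0:
                continue
            # rho is computed from the partial residual that excludes coordinate j
            rho = sum(wi * xi[j] * (ri + xi[j] * beta[j])
                      for wi, xi, ri in zip(w, X, resid))
            new_bj = soft_threshold(rho, lam) / denom
            resid = [ri - xi[j] * (new_bj - beta[j]) for xi, ri in zip(X, resid)]
            beta[j] = new_bj
    return beta


def fit_penalized_mixture(X, y, K, lam, n_iter=30, seed=0):
    """EM for a K-component mixture of lasso-penalized linear regressions (simplified)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    # Hard random initial assignment of observations to clusters
    resp = [[0.0] * K for _ in range(n)]
    for row in resp:
        row[rng.randrange(K)] = 1.0
    betas = [[0.0] * p for _ in range(K)]
    sigma2 = [1.0] * K
    pi = [1.0 / K] * K
    loglik = 0.0
    for _ in range(n_iter):
        # M-step: weighted lasso per cluster, then variance and mixing proportion
        for k in range(K):
            w = [row[k] for row in resp]
            nk = max(sum(w), 1e-12)
            betas[k] = weighted_lasso_cd(X, y, w, lam, betas[k])
            sse = sum(wi * (yi - predict(xi, betas[k])) ** 2
                      for wi, xi, yi in zip(w, X, y))
            sigma2[k] = max(sse / nk, 1e-6)
            pi[k] = max(nk / n, 1e-12)
        # E-step: posterior cluster probabilities, computed in log space
        loglik = 0.0
        for i in range(n):
            logd = [math.log(pi[k]) - 0.5 * math.log(2 * math.pi * sigma2[k])
                    - 0.5 * (y[i] - predict(X[i], betas[k])) ** 2 / sigma2[k]
                    for k in range(K)]
            m = max(logd)
            log_mix = m + math.log(sum(math.exp(v - m) for v in logd))
            resp[i] = [math.exp(v - log_mix) for v in logd]
            loglik += log_mix
    # Assumed BIC convention: nonzero coefficients + K variances + (K - 1) proportions
    n_nonzero = sum(1 for b in betas for bj in b if bj != 0.0)
    df = n_nonzero + K + (K - 1)
    bic = -2.0 * loglik + df * math.log(n)
    return betas, resp, bic
```

In a sketch like this, one would call `fit_penalized_mixture` over a grid of `K` and `lam` values and keep the combination with the smallest BIC, mirroring the selection strategy named in the summary.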