Mixture model based multivariate statistical analysis of multiply censored environmental data
Published in: | Advances in water resources 2013-09, Vol.59, p.15-24 |
---|---|
Format: | Article |
Language: | English |
Summary: | • An EM method for multiply censored multivariate data is proposed. • A GMM is used to approximate the underlying distribution of environmental data. • The proposed method handles both censored and uncensored data. • The method is successfully applied to the analysis of real water quality data.
Environmental data are commonly constrained by a detection limit (DL) imposed by the experimental apparatus. In particular, because of changes in experimental units or assay methods, the observed data are often censored at more than one DL. Measurements below the DLs are typically replaced by an arbitrary value, such as zero, half the DL, or the DL itself, for convenience of analysis. However, this substitution approach is widely considered unreliable and prone to bias. In contrast, maximum likelihood estimation (MLE) for censored data has been developed for better performance and statistical justification. However, existing MLE methods seldom address the multivariate context of censored environmental data, especially for water quality. This paper proposes using a mixture model to flexibly approximate the underlying distribution of the observed data, owing to its good approximation capability and generation mechanism. This study focuses mainly on the Gaussian mixture model (GMM). To cope with data censored at multiple DLs, an expectation–maximization (EM) algorithm in a multivariate setting is developed. The proposed statistical analysis approach is verified on both simulated data and real water quality data. |
---|---|
ISSN: | 0309-1708 1872-9657 |
DOI: | 10.1016/j.advwatres.2013.05.001 |
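
The summary describes an EM algorithm for data censored at more than one detection limit. The sketch below illustrates that idea in a much simpler setting than the article's: a single univariate Gaussian rather than a multivariate Gaussian mixture, with left-censoring only. It is not the authors' algorithm. The function names `censored_normal_em` and `simulate_censored`, the DL/2 initialisation, and the simulated detection limits (1.5 and 1.0) are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for its pdf and cdf


def censored_normal_em(values, limits, censored, n_iter=200, tol=1e-8):
    """EM estimate of (mu, sigma) for a left-censored normal sample.

    values[i]   measured value (ignored when censored[i] is True)
    limits[i]   detection limit that applied to observation i
    censored[i] True when all that is known is x_i < limits[i]
    """
    n = len(values)
    # Assumed starting point: DL/2 substitution for the censored entries.
    filled = [lim / 2 if c else v for v, lim, c in zip(values, limits, censored)]
    mu = sum(filled) / n
    sigma = math.sqrt(sum((f - mu) ** 2 for f in filled) / n) or 1.0

    for _ in range(n_iter):
        s1 = 0.0  # running sum of E[x_i]
        s2 = 0.0  # running sum of E[x_i^2]
        for v, lim, c in zip(values, limits, censored):
            if c:
                # E-step: moments of a normal truncated above at the DL.
                alpha = (lim - mu) / sigma
                lam = _STD.pdf(alpha) / max(_STD.cdf(alpha), 1e-300)
                m1 = mu - sigma * lam
                var = sigma ** 2 * (1 - alpha * lam - lam ** 2)
                s1 += m1
                s2 += var + m1 ** 2
            else:
                s1 += v
                s2 += v * v
        # M-step: update mean and standard deviation from expected moments.
        new_mu = s1 / n
        new_sigma = math.sqrt(max(s2 / n - new_mu ** 2, 1e-12))
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


def simulate_censored(seed=0, n=2000, mu=2.0, sigma=1.0):
    """Draw a normal sample; first half uses DL 1.5, second half DL 1.0."""
    rng = random.Random(seed)
    values, limits, censored = [], [], []
    for i in range(n):
        x = rng.gauss(mu, sigma)
        dl = 1.5 if i < n // 2 else 1.0
        values.append(x)
        limits.append(dl)
        censored.append(x < dl)
    return values, limits, censored


if __name__ == "__main__":
    vals, lims, cens = simulate_censored()
    print(censored_normal_em(vals, lims, cens))
```

With no censored observations the routine reduces to the ordinary sample mean and (population) standard deviation. Extending it toward the setting in the summary would require mixture responsibilities and multivariate truncated-normal moments, which this sketch deliberately omits.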