Mixture model based multivariate statistical analysis of multiply censored environmental data

Bibliographic Details
Published in: Advances in water resources 2013-09, Vol.59, p.15-24
Main Author: He, Jianxun
Format: Article
Language: English
Description
Summary:
• An EM method for multiply censored multivariate data is proposed.
• A GMM is used to approximate the underlying distribution of environmental data.
• The proposed method handles both censored and uncensored data.
• The method is successfully applied to the analysis of real water quality data.

Environmental data are commonly constrained by a detection limit (DL) because of the limitations of the experimental apparatus. In particular, because experimental units or assay methods change over time, the observed data are often censored at more than one DL. For convenience of analysis, measurements below the DLs are typically replaced by an arbitrary value such as zero, half of the DL, or the DL itself. However, this substitution approach is widely considered unreliable and prone to bias. In contrast, maximum likelihood estimation (MLE) methods for censored data have been developed to give better performance and statistical justification. The existing MLE methods, however, seldom address the multivariate context of censored environmental data, especially for water quality. This paper proposes using a mixture model to flexibly approximate the underlying distribution of the observed data, owing to its good approximation capability and generation mechanism. The study focuses mainly on the Gaussian mixture model (GMM). To cope with censored data with multiple DLs, an expectation–maximization (EM) algorithm in a multivariate setting is developed. The proposed statistical analysis approach is verified on both simulated data and real water quality data.
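The contrast the summary draws between DL substitution and likelihood-based estimation can be illustrated with a much simpler case than the one the paper treats. The sketch below is not the paper's multivariate GMM algorithm: it assumes a single univariate Gaussian, left-censoring only, and simulated data with two hypothetical DLs (4.5 and 5.0). The function names `simulate` and `em_censored_normal` are illustrative and do not come from the paper.

```python
import math
import random


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate(n, mu, sigma, limits, seed=0):
    """Draw n Gaussian values; alternate between the given DLs (assumed setup).

    A value below its DL is recorded as the DL and flagged as censored.
    """
    rng = random.Random(seed)
    values, censored = [], []
    for i in range(n):
        dl = limits[i % len(limits)]
        x = rng.gauss(mu, sigma)
        values.append(dl if x < dl else x)
        censored.append(x < dl)
    return values, censored


def em_censored_normal(values, censored, n_iter=200):
    """EM for one Gaussian with left-censored observations.

    values[i] is the measurement, or the DL when censored[i] is True.
    """
    n = len(values)
    # Initialise from the DL/2 substitution estimate.
    start = [v / 2.0 if c else v for v, c in zip(values, censored)]
    mu = sum(start) / n
    var = sum((s - mu) ** 2 for s in start) / n
    for _ in range(n_iter):
        sigma = math.sqrt(var)
        s1 = s2 = 0.0
        for v, c in zip(values, censored):
            if c:
                # E-step: moments of a Gaussian truncated above at the DL.
                a = (v - mu) / sigma
                lam = norm_pdf(a) / max(norm_cdf(a), 1e-300)
                s1 += mu - sigma * lam
                s2 += mu * mu + var - sigma * lam * (v + mu)
            else:
                s1 += v
                s2 += v * v
        # M-step: update mean and variance from the expected moments.
        mu = s1 / n
        var = max(s2 / n - mu * mu, 1e-12)
    return mu, math.sqrt(var)


values, censored = simulate(4000, mu=5.0, sigma=1.0, limits=(4.5, 5.0))
mu_sub = sum(v / 2.0 if c else v for v, c in zip(values, censored)) / len(values)
mu_em, sigma_em = em_censored_normal(values, censored)
print("DL/2 substitution mean:", round(mu_sub, 3))
print("EM mean and std:", round(mu_em, 3), round(sigma_em, 3))
```

In this simulated setting the substitution estimate of the mean is pulled well below the true value of 5.0, while the EM estimate stays close to it. Extending the idea to several variables and several mixture components, as the paper does, requires truncated multivariate Gaussian moments and is beyond this sketch.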
ISSN: 0309-1708, 1872-9657
DOI: 10.1016/j.advwatres.2013.05.001