
Learning a Dynamic-Based Representation for Multivariate Biomarker Time Series Classifications

Bibliographic Details
Main Authors: Cao, Xi Hang, Han, Chao, Obradovic, Zoran
Format: Conference Proceeding
Language: English
Description
Summary: Time series in healthcare practice and biomedical research are typically multivariate, i.e., multiple biomarkers are observed simultaneously. However, they tend to be short, noisy, unaligned, irregularly sampled, and partially observed, and only a limited number of samples is available. These imperfections make it challenging to mine information from the data. In this work, we propose to use dynamic-based representations to represent such imperfect multivariate time series. Specifically, we propose an approach that learns a corresponding Linear Dynamical System (LDS) for each multivariate time series example and uses the set of system parameters as the representation of that example. Such a representation captures interactions among different variables and provides a unified view of multivariate time series with different lengths, different missingness mechanisms, and different starting points. Other techniques are then used to mine useful information and perform learning tasks on the new representation; for example, we use support vector machine (SVM) classification models with LDS kernels for time series classification. To evaluate the effectiveness of the proposed approach, we conducted experiments on both synthetic and real-life datasets. The results on synthetic datasets demonstrated that the proposed approach was able to correctly learn the similarities of the underlying linear dynamical systems. Our real-life datasets included human influenza A (H3N2), rhinovirus (HRV), and respiratory syncytial virus (RSV) gene expression time series. The accuracies in the leave-one-out symptomatic/asymptomatic diagnostic tasks showed that our approach outperformed three baseline algorithms. Moreover, in experiments where various levels of imperfections were imposed on the H3N2 dataset, the accuracies of the baseline methods degraded significantly, while the accuracy of our approach remained high.
ISSN: 2575-2634
DOI: 10.1109/ICHI.2018.00026
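The summary describes representing each multivariate series by the parameters of a learned LDS and then comparing examples through a kernel. The sketch below illustrates that idea under simplifying assumptions and is not the authors' method. It assumes a fully observed first-order model x_{t+1} ≈ A x_t (no hidden state or observation matrix) and estimates A by ridge-regularized least squares, using only consecutive time points that are both fully observed. As a stand-in for the paper's LDS kernels, it compares examples with a Gaussian (RBF) kernel on the flattened matrices. The names fit_transition_matrix, simulate_lds, and rbf_kernel_matrix, and the ridge and gamma parameters, are hypothetical.

```python
import numpy as np


def simulate_lds(A, T, noise=0.1, seed=0):
    """Generate a toy series from x_t = A x_{t-1} + w_t with Gaussian noise w_t."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    x = np.empty((T, d))
    x[0] = rng.normal(size=d)
    for t in range(1, T):
        x[t] = A @ x[t - 1] + noise * rng.normal(size=d)
    return x


def fit_transition_matrix(series, ridge=1e-3):
    """Estimate A in x_{t+1} ~= A x_t (simplified, fully observed LDS).

    series: array of shape (T, d); NaN marks a missing measurement.
    Only pairs (x_t, x_{t+1}) with no NaN in either row are used, so series
    of any length and with gaps all map to a d x d matrix.
    """
    x = np.asarray(series, dtype=float)
    prev, nxt = x[:-1], x[1:]
    ok = ~(np.isnan(prev).any(axis=1) | np.isnan(nxt).any(axis=1))
    X, Y = prev[ok], nxt[ok]  # row i of Y is modeled as A @ (row i of X)
    d = x.shape[1]
    # Ridge normal equations: (X^T X + ridge * I) A^T = X^T Y
    At = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ Y)
    return At.T


def rbf_kernel_matrix(params, gamma=1.0):
    """Gaussian kernel on squared Frobenius distances between parameter matrices."""
    flat = np.stack([p.ravel() for p in params])
    sq_dist = ((flat[:, None, :] - flat[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


if __name__ == "__main__":
    A1 = np.array([[0.9, 0.1], [0.0, 0.8]])
    A2 = np.array([[0.2, -0.5], [0.5, 0.2]])
    s1 = simulate_lds(A1, 500, seed=1)
    s1[::7, 0] = np.nan  # impose missing values on one biomarker
    s2 = simulate_lds(A1, 300, seed=2)  # same system, different length
    s3 = simulate_lds(A2, 400, seed=3)  # different system
    params = [fit_transition_matrix(s) for s in (s1, s2, s3)]
    print(np.round(rbf_kernel_matrix(params), 3))
```

Because every example maps to a d × d matrix regardless of its length or missing entries, all examples live in one fixed-size space, echoing the "unified view" described in the summary. A precomputed kernel matrix like this could then be passed to an SVM that accepts one, for example scikit-learn's `SVC(kernel="precomputed")`.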