Loading…

Supervising the Decoder of Variational Autoencoders to Improve Scientific Utility

Probabilistic generative models are attractive for scientific modeling because their inferred parameters can be used to generate hypotheses and design experiments. This requires that the learned model provide an accurate representation of the input data and yield a latent space that effectively pred...

Full description

Saved in:
Bibliographic Details
Published in:arXiv.org 2022-07
Main Authors: Tu, Liyun, Talbot, Austin, Gallagher, Neil, Carlson, David
Format: Article
Language:English
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Probabilistic generative models are attractive for scientific modeling because their inferred parameters can be used to generate hypotheses and design experiments. This requires that the learned model provide an accurate representation of the input data and yield a latent space that effectively predicts outcomes relevant to the scientific question. Supervised Variational Autoencoders (SVAEs) have previously been used for this purpose, where a carefully designed decoder can be used as an interpretable generative model while the supervised objective ensures a predictive latent representation. Unfortunately, the supervised objective forces the encoder to learn a biased approximation to the generative posterior distribution, which renders the generative parameters unreliable when used in scientific models. This issue has remained undetected as reconstruction losses commonly used to evaluate model performance do not detect bias in the encoder. We address this previously-unreported issue by developing a second order supervision framework (SOS-VAE) that influences the decoder to induce a predictive latent representation. This ensures that the associated encoder maintains a reliable generative interpretation. We extend this technique to allow the user to trade-off some bias in the generative parameters for improved predictive performance, acting as an intermediate option between SVAEs and our new SOS-VAE. We also use this methodology to address missing data issues that often arise when combining recordings from multiple scientific experiments. We demonstrate the effectiveness of these developments using synthetic data and electrophysiological recordings with an emphasis on how our learned representations can be used to design scientific experiments.
ISSN:2331-8422
DOI:10.48550/arxiv.2109.04561