Data visualization case studies for high‐dimensional data validation

Bibliographic Details
Published in: Stat (International Statistical Institute) 2021-12, Vol.10 (1), p.n/a
Main Author: Williams, Aaron R.
Format: Article
Language: English
Description
Summary: Microsimulation and synthetic data are often high dimensional, requiring extensive validation and exploration to compare results against certain benchmarks. In both cases, validation is necessary to ensure that the many univariate distributions and multivariate relationships in the new data are similar to the many univariate distributions and multivariate relationships in the underlying data. This article illustrates some data visualization techniques for comparing a generated sample or population against a known reference sample or population. For implementation ease, we also outline an iterative workflow built with R Markdown that can be shared publicly on GitHub or privately with Amazon Web Services S3. The lessons learned from this work apply to any analysis that compares multiple data sets, deals with high‐dimensional data, or involves summarizing iterations of analyses.
ISSN: 2049-1573
DOI: 10.1002/sta4.334