Data visualization case studies for high‐dimensional data validation
Published in: | Stat (International Statistical Institute) 2021-12, Vol.10 (1), p.n/a |
---|---|
Format: | Article |
Language: | English |
Summary: | Microsimulation and synthetic data are often high dimensional, requiring extensive validation and exploration to compare results against certain benchmarks. In both cases, validation is necessary to ensure that the many univariate distributions and multivariate relationships in the new data are similar to the many univariate distributions and multivariate relationships in the underlying data. This article illustrates some data visualization techniques for comparing a generated sample or population against a known reference sample or population. For implementation ease, we also outline an iterative workflow built with R Markdown that can be shared publicly on GitHub or privately with Amazon Web Services S3. The lessons learned from this work apply to any analysis that compares multiple data sets, deals with high‐dimensional data, or involves summarizing iterations of analyses. |
---|---|
ISSN: | 2049-1573 |
DOI: | 10.1002/sta4.334 |