Navigating Tabular Data Synthesis Research: Understanding User Needs and Tool Capabilities
Published in: | SIGMOD Record 2025-01, Vol.53 (4), p.18-35 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Summary: | In an era of rapidly advancing data-driven applications, there is a growing demand for data in both research and practice. Synthetic data have emerged as an alternative when there are not enough real data available or when these data may not be shared (e.g., due to privacy regulations). Synthesizing tabular data presents unique and complex challenges, especially handling (i) missing values, (ii) dataset imbalance, (iii) diverse column types, and (iv) complex data distributions, as well as preserving (v) column correlations, (vi) temporal dependencies, and (vii) integrity constraints (e.g., functional dependencies) present in the original dataset. Although significant progress has been made recently in the development of generative models, there is no one-size-fits-all solution for tabular data today, and choosing the right tool for a particular use case remains a difficult task. In this paper, we survey the state of the art in Tabular Data Synthesis (TDS) and examine user needs by defining a set of functional and non-functional requirements. We also evaluate the reported performance of 37 TDS research tools on these requirements, develop a tool selection guide to help users find a suitable TDS tool for their use case, and identify open challenges in TDS research, especially with respect to data management. |
---|---|
ISSN: | 0163-5808 |
DOI: | 10.1145/3712311.3712315 |