Navigating Tabular Data Synthesis Research: Understanding User Needs and Tool Capabilities

Bibliographic Details
Published in: SIGMOD Record, 2025-01, Vol. 53 (4), p. 18-35
Main Authors: Davila R., Maria F.; Groen, Sven; Panse, Fabian; Wingerath, Wolfram
Format: Article
Language: English
Description
Summary: In an era of rapidly advancing data-driven applications, there is a growing demand for data in both research and practice. Synthetic data have emerged as an alternative when there are not enough real data available or when these data may not be shared (e.g., due to privacy regulations). Synthesizing tabular data presents unique and complex challenges, especially handling (i) missing values, (ii) dataset imbalance, (iii) diverse column types, and (iv) complex data distributions, as well as preserving (v) column correlations, (vi) temporal dependencies, and (vii) integrity constraints (e.g., functional dependencies) present in the original dataset. Although significant progress has been made recently in the development of generative models, there is no one-size-fits-all solution for tabular data today, and choosing the right tool for a particular use case remains a difficult task. In this paper, we survey the state of the art in Tabular Data Synthesis (TDS) and examine user needs by defining a set of functional and non-functional requirements. We also evaluate the reported performance of 37 TDS research tools on these requirements, develop a tool selection guide to help users find a suitable TDS tool for their use case, and identify open challenges in TDS research, especially with respect to data management.
ISSN: 0163-5808
DOI: 10.1145/3712311.3712315