Validating functional redundancy with mixed generative adversarial networks

Bibliographic Details
Published in: Knowledge-Based Systems, 2023-03, Vol. 264, Article 110342
Main Authors: Nguyen, Thanh Tam; Huynh, Thanh Trung; Pham, Minh Tam; Hoang, Thanh Dat; Nguyen, Thanh Thi; Nguyen, Quoc Viet Hung
Format: Article
Language: English
Description
Summary: Data redundancy is one of the most important problems in data-intensive applications such as data mining and machine learning. Removing it brings many benefits: efficient data updating, effective data storage, and error-free query processing. Although the problem has been studied for four decades, existing work on data redundancy mostly focuses on syntactic formulations such as normal forms and functional dependencies, which lead to intractable discovery problems. In this work, we propose a new concept, functional redundancy, that overcomes the limitations of functional dependencies, especially on continuous data. We design and develop efficient algorithms based on generative adversarial networks that validate any functional redundancy without depending heavily on the number of attributes and the number of tuples, unlike approaches based on functional dependencies. The core idea is to use the imputation power of generative adversarial networks to model any semantic dependencies between attributes. Extensive experiments on real-world and synthetic datasets show that our approach outperforms representative baselines, is applicable to first-order and high-order dependencies, and is extensible to different types of data.
Highlights:
• We introduce a new concept of functional redundancy for data redundancy detection.
• We tackle several issues of functional dependency, especially on continuous data.
• Our imputation generative adversarial network captures data redundancy.
• Our approach works with several types of data, e.g. continuous and categorical data.
• Our method outperforms baselines and is robust to different orders of redundancy.
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2023.110342