Heuristically repopulated Bayesian ant colony optimization for treating missing values in large databases
Published in: | Knowledge-based systems 2017-10, Vol.133, p.107-121 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | Incomplete datasets with missing values are unsuitable for strategic decision-making because they lead to biased results. The problem is worse when the dataset is large and collected from many heterogeneous sources. The paper addresses missing-data scenarios that had not previously been handled together. The proposed Dual Repopulated Bayesian Ant Colony Optimization (DPBACO) handles both ignorable and non-ignorable missing values in heterogeneous attributes of large datasets. DPBACO integrates Bayesian principles with the Ant Colony Optimization technique, since both are simple and efficient to implement. After the pheromone update, the solution pool is repopulated by dividing the population into two groups based on fitness and generating new offspring through crossover. The DPBACO algorithm is applied to six large mixed-attribute datasets to impute both kinds of missing values. The empirical and statistical results show that DPBACO performs better than other existing methods at missing rates ranging from 5% to 50%. |
---|---|
ISSN: | 0950-7051 1872-7409 |
DOI: | 10.1016/j.knosys.2017.06.033 |
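
The repopulation step mentioned in the summary (split the solution pool in two by fitness, then create offspring by crossover) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the paper's implementation: the record does not say which parents are paired, which crossover operator is used, or which solutions are replaced. Here a candidate solution is assumed to be a list of imputed values, higher fitness is assumed to be better, one-point crossover is assumed, and the weaker half is assumed to be replaced. The names `repopulate`, `pool`, and `fitness` are hypothetical.

```python
import random


def repopulate(pool, fitness, rng=None):
    """Rebuild a solution pool from its fitter half plus crossover offspring.

    Assumptions (not taken from the paper): higher fitness is better, each
    solution is a list of at least two values, and the weaker half of the
    pool is replaced by one-point crossover offspring.
    """
    if len(pool) < 2:
        raise ValueError("pool must contain at least two solutions")
    rng = rng or random.Random()

    # Divide the population into two groups based on fitness.
    ranked = sorted(pool, key=fitness, reverse=True)
    half = len(ranked) // 2
    fitter, weaker = ranked[:half], ranked[half:]

    # Each offspring takes a prefix from a randomly chosen fitter parent
    # and the remaining suffix from one weaker parent.
    offspring = []
    for weak_parent in weaker:
        strong_parent = rng.choice(fitter)
        cut = rng.randint(1, len(weak_parent) - 1)
        offspring.append(strong_parent[:cut] + weak_parent[cut:])

    return fitter + offspring


# Toy usage: six candidate solutions scored by the sum of their values.
pool = [[i] * 4 for i in range(6)]
new_pool = repopulate(pool, fitness=sum, rng=random.Random(0))
```

The returned pool has the same size as the input: the fitter half is carried over unchanged and each weaker solution is replaced by one offspring. In an ant colony loop, a step like this would run once per iteration after the pheromone update.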