Loading…

Mixing numerical and categorical data in a Self-Organizing Map by means of frequency neurons

[Display omitted] •Self-Organizing Maps (SOMs) are powerful tools with many applications. Nevertheless, they cannot deal directly with categorical variables.•In order to present categorical variables to SOMs, they are usually transformed by binarization. This increases dramatically the dataset dimen...

Full description

Saved in:
Bibliographic Details
Published in:Applied soft computing 2015-11, Vol.36, p.246-254
Main Authors: del Coso, Carmelo, Fustes, Diego, Dafonte, Carlos, Nóvoa, Francisco J., Rodríguez-Pedreira, José M., Arcay, Bernardino
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:[Display omitted] •Self-Organizing Maps (SOMs) are powerful tools with many applications. Nevertheless, they cannot deal directly with categorical variables.•In order to present categorical variables to SOMs, they are usually transformed by binarization. This increases dramatically the dataset dimensionality.•NCSOM has been presented in order to cope with categorical or mixed data. However, it presents some drawbacks: categorical and numerical variables are not equally balanced and the method is not convergent.•A novel SOM variant, called FMSOM, is presented which is able to deal with numerical and categorical variables, giving the same weight to them and ensuring convergence. A scalable implementation of the method is fully described.•FMSOM is applied to a benchmark of well known datasets, composed of categorical and mixed data. The results show the potential of the method to analyze this kind of datasets. Even though Self-Organizing Maps (SOMs) constitute a powerful and essential tool for pattern recognition and data mining, the common SOM algorithm is not apt for processing categorical data, which is present in many real datasets. It is for this reason that the categorical values are commonly converted into a binary code, a solution that unfortunately distorts the network training and the posterior analysis. The present work proposes a SOM architecture that directly processes the categorical values, without the need of any previous transformation. This architecture is also capable of properly mixing numerical and categorical data, in such a manner that all the features adopt the same weight. The proposed implementation is scalable and the corresponding learning algorithm is described in detail. Finally, we demonstrate the effectiveness of the presented algorithm by applying it to several well-known datasets.
ISSN:1568-4946
1872-9681
DOI:10.1016/j.asoc.2015.06.058