A new Unsupervised Spectral Feature Selection Method for mixed data: A filter approach

Bibliographic Details
Published in: Pattern Recognition, 2017-12, Vol. 72, p. 314-326
Main Authors: Solorio-Fernández, Saúl; Martínez-Trinidad, José Fco; Carrasco-Ochoa, J. Ariel
Format: Article
Language:English
Description
Summary:
•A new unsupervised filter feature selection method for mixed data is proposed.
•Spectral feature selection is used for finding relevant features in mixed datasets.
•The most relevant features are placed at the beginning of the ranking.
•Our method outperforms state-of-the-art unsupervised filter feature selection methods.

Most current unsupervised feature selection methods are designed to process only numerical datasets. They therefore cannot be applied directly to practical problems in which the objects under study are described by both numerical and non-numerical features (mixed datasets). In this work, we propose a new unsupervised filter feature selection method that can be used on datasets with both numerical and non-numerical features. The proposed method is inspired by spectral feature selection: it combines a kernel with a new spectrum-based feature evaluation measure to quantify feature relevance. Experiments on synthetic datasets show that, in 99% of the cases where the relevant features are known, our method identifies the most relevant features and ranks them at the beginning of a sorted list. Additionally, we compare our method against state-of-the-art unsupervised filter methods on real datasets; in most cases, our method significantly outperforms them.
ISSN:0031-3203, 1873-5142
DOI:10.1016/j.patcog.2017.07.020
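
Illustrative note: the summary describes ranking features of a mixed dataset by a spectrum-based relevance measure computed from a kernel. The sketch below shows one simple way such an idea could look; it is not the evaluation measure proposed in the article. Everything in it is an assumption made for illustration: the per-feature similarity functions (range-normalized distance for numerical features, exact match for non-numerical ones), the score `1 - λ2` taken from the normalized graph Laplacian of each feature's similarity matrix, the function names, and the toy data.

```python
import numpy as np

def feature_similarity(col, is_numeric):
    """Pairwise similarity matrix built from a single feature (assumed form)."""
    col = np.asarray(col)
    if is_numeric:
        col = col.astype(float)
        span = col.max() - col.min()
        if span == 0:
            # A constant feature makes every pair of objects look identical.
            return np.ones((len(col), len(col)))
        # Range-normalized similarity in [0, 1].
        return 1.0 - np.abs(col[:, None] - col[None, :]) / span
    # Non-numerical feature: 1 if the two values match, 0 otherwise.
    return (col[:, None] == col[None, :]).astype(float)

def spectral_score(S):
    """Hypothetical relevance score: 1 minus the second-smallest eigenvalue
    of the normalized Laplacian of the similarity graph S. A value close
    to 1 means the feature splits the objects into well-separated groups."""
    d = S.sum(axis=1)                      # degrees (>= 1, diagonal of S is 1)
    inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(S)) - inv_sqrt[:, None] * S * inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(L)        # ascending order
    return 1.0 - eigvals[1]

def rank_features(columns, numeric_flags):
    """Return feature indices sorted from most to least relevant, plus scores."""
    scores = [spectral_score(feature_similarity(c, f))
              for c, f in zip(columns, numeric_flags)]
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order, scores

# Toy mixed dataset with six objects in two groups (made-up values).
columns = [
    [1.0, 1.0, 1.0, 5.0, 5.0, 5.0],        # numerical, separates the groups
    ["a", "a", "a", "b", "b", "b"],        # non-numerical, separates the groups
    [0.1, 0.9, 0.5, 0.3, 0.7, 0.2],        # numerical noise
]
numeric_flags = [True, False, True]

order, scores = rank_features(columns, numeric_flags)
print(order, scores)
```

In this toy case the two group-separating features receive scores near 1 and the noise feature is ranked last. Scoring each feature in isolation ignores interactions between features, which a real method for mixed data would need to consider.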