A semi-supervised multi-label classification framework with feature reduction and enrichment

Bibliographic Details
Published in: Journal of information and telecommunication (Print) 2017-04, Vol.1 (2), p.141-154
Main Authors: Pham, Thi-Ngan; Nguyen, Van-Quang; Tran, Van-Hien; Nguyen, Tri-Thanh; Ha, Quang-Thuy
Format: Article
Language: English
Description
Summary: Multi-label classification (MLC) has drawn much attention thanks to its usefulness and omnipresence in real-world applications, in which objects may be characterized by more than one label rather than by a single label as in the traditional approach. Obtaining multi-label examples is costly and time-consuming; therefore, a semi-supervised learning approach should be considered to take advantage of both labelled and unlabelled data. In this work, we propose a semi-supervised MLC algorithm that exploits the specific features of the prominent class label(s), chosen by a greedy approach, as an extension of the LIFT algorithm, together with the unlabelled-data consumption mechanism from the TExt classification using Semi-supervised Clustering algorithm. We also build a semi-supervised MLC application framework for Vietnamese texts with several feature enrichment steps, including (a) a stage that enriches features by adding hidden topic features and (b) a dimensionality reduction stage that removes irrelevant features. Experimental results on a data set of hotel reviews (for tourism) indicate that a reasonable amount of unlabelled data helps to increase the F1 score. Interestingly, with a small amount of labelled data, our algorithm can reach performance comparable to that obtained with a larger amount of labelled data.
ISSN: 2475-1839, 2475-1847
DOI: 10.1080/24751839.2017.1323486