Combining binary classifiers in different dichotomy spaces for text categorization

Bibliographic Details
Published in: Applied Soft Computing, 2019-03, Vol. 76, pp. 564–574
Main Authors: Pinheiro, Roberto H.W.; Cavalcanti, George D.C.; Tsang, Ing Ren
Format: Article
Language:English
Description
Summary: Several supervised machine learning applications are commonly represented as multi-class problems, but distinguishing among several classes is harder than distinguishing between just two. In contrast to the one-against-all and all-pairs approaches, which transform a multi-class problem into a set of binary problems, Dichotomy Transformation (DT) converts a multi-class problem into a different problem whose goal is to verify whether a pair of documents belongs to the same class. To perform this task, DT generates a dichotomy set by combining pairs of documents, each pair belonging to either a positive class (the documents in the pair have the same class) or a negative class (the documents in the pair come from different classes). The definition of this dichotomy set plays an important role in the overall accuracy of the system. An alternative to searching for the single best dichotomy set is to use a multiple classifier system: many different dichotomy sets can be generated, each one used to train a separate binary classifier, instead of relying on only one dichotomy set. Herein we propose Combined Dichotomy Transformations (CoDiT), a text categorization system that combines binary classifiers trained with different dichotomy sets produced by DT. Because DT pairs up training documents, the number of training examples grows far beyond the size of the original training set. This is a desirable property because each classifier can be trained with different data without reducing the number of examples or features, making it possible to compose an ensemble of diverse and strong classifiers. Experiments on 14 databases show that CoDiT achieves statistically better results than SVM, Bagging, Random Subspace, BoosTexter, and Random Forest.
Highlights:
• Combined Dichotomy Transformations (CoDiT) method is proposed.
• CoDiT is a multiple classifier system for text categorization.
• CoDiT uses Dichotomy Transformation to transform a multi-class problem.
• Each classifier in CoDiT is trained using a different transformed set.
• CoDiT obtains better results than multi-classifier systems from the literature.
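The core idea of Dichotomy Transformation, as described in the abstract, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the pairing into positive (same-class) and negative (different-class) examples follows the abstract, while the choice of the feature-difference encoding and all names are assumptions for demonstration.

```python
# Sketch of Dichotomy Transformation (DT): pairs of training examples
# become new binary examples labeled "same class" (positive, 1) or
# "different classes" (negative, 0). Encoding and names are illustrative.
from itertools import combinations

def dichotomy_transform(X, y):
    """Turn a multi-class set (X, y) into a binary dichotomy set of pairs."""
    pairs = []
    for (xi, yi), (xj, yj) in combinations(zip(X, y), 2):
        # One common pairwise representation is the per-feature absolute
        # difference; the paper's exact encoding may differ.
        features = [abs(a - b) for a, b in zip(xi, xj)]
        label = 1 if yi == yj else 0
        pairs.append((features, label))
    return pairs

# Toy example: four one-feature "documents" from two classes.
X = [[0.1], [0.2], [0.9], [1.0]]
y = ["sports", "sports", "politics", "politics"]
dt = dichotomy_transform(X, y)
print(len(dt))  # 4 documents yield n*(n-1)/2 = 6 pairs
```

Training each ensemble member on a differently sampled dichotomy set, as CoDiT does, exploits the fact that the pool of pairs is much larger than the original training set, so classifiers can see different data without losing examples or features.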
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2018.12.023