MDLText: An efficient and lightweight text classifier

Bibliographic Details
Published in: Knowledge-Based Systems, 2017-02, Vol. 118, p. 152-164
Main Authors: Silva, Renato M., Almeida, Tiago A., Yamakami, Akebo
Format: Article
Language: English
Description
Summary:
•A novel multinomial text classification method based on the minimum description length principle is proposed.
•The proposed approach is efficient, lightweight, scalable, multiclass, and sufficiently robust to prevent overfitting.
•Experiments were performed using forty-five text corpora, in batch learning and online learning contexts.
•The results indicate that the proposed approach outperformed the best-known benchmark text classification techniques.

In many areas, the volume of text information is increasing rapidly, thereby demanding efficient text classification approaches. Several methods are available at present, but most exhibit declining performance as the dimensionality of the problem increases, or they incur high computational costs for training, which limits their application in real scenarios. Thus, it is necessary to develop a method that can process high-dimensional data rapidly. In this study, we propose MDLText, an efficient, lightweight, scalable, and fast multinomial text classifier based on the minimum description length principle. MDLText learns incrementally and quickly, and it is sufficiently robust to prevent overfitting; both are desirable features in real-world applications, large-scale problems, and online scenarios. Our experiments were carefully designed to ensure statistically sound results, which demonstrated that the proposed approach achieves a good balance between predictive power and computational efficiency.
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2016.11.018