
Deep Learning-Based Algorithm for Classification of News Text

Bibliographic Details
Published in: IEEE Access, 2024, Vol. 12, pp. 159086-159098
Main Authors: Yu Li, Xiao, Han, Ling Bo, Feng Jiang, Zheng
Format: Article
Language:English
Description
Summary: As the volume of online news grows exponentially, hotspot classification is becoming increasingly important. Traditional machine learning-based text classification methods, such as naive Bayes, support vector machines (SVMs), and classification trees, offer a degree of interpretability, but they often fail to capture complex semantic relations, which frequently leads to poor classification accuracy. To address this issue, we introduce a novel deep learning-based text classification (TC) method that combines a convolutional neural network (CNN), long short-term memory (LSTM), and an attention mechanism. The method predicts news popularity by combining the feature extraction ability of the CNN, the sequence modeling ability of the LSTM, and the weighted summation of the attention mechanism. Experimental results show that, compared with other deep learning models, the proposed method achieves higher accuracy, better accounts for the context of the text, and thereby mitigates the poor accuracy of traditional methods. Besides model selection, feature engineering is key to improving accuracy. Accordingly, we developed a naive Bayes TC model based on feature extraction, using word embeddings to convert text into richer vector representations. We then combined this model with different naive Bayes distributions and found that multinomial naive Bayes was the most suitable for TC. Finally, we added a feature-word classification expressiveness index to improve term frequency-inverse document frequency (TF-IDF) feature extraction, which yielded a classification accuracy of 96%. This demonstrates that the improved model is better at understanding and classifying text.
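The summary credits the CNN with feature extraction and the attention mechanism with weighted summation, but it does not give layer sizes, layer ordering, or the attention scoring function. The following is therefore only a minimal pure-Python sketch under stated assumptions. It applies a single valid 1-D convolution with ReLU over word embeddings, omits the LSTM layer entirely, and pools the resulting feature sequence with dot-product attention against a context vector `u`. The names `conv1d`, `attention_pool`, and `u` are hypothetical illustrations and do not come from the authors' model.

```python
import math


def conv1d(embeddings, filters, width):
    """Valid 1-D convolution (as computed by CNN layers) followed by ReLU.

    embeddings: T word vectors, each of length d.
    filters: F filters, each a list of `width` rows of length d.
    Returns T - width + 1 feature vectors, each of length F.
    """
    out = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        feats = []
        for f in filters:
            # Element-wise product of filter and window, summed over all cells.
            s = sum(w * x for frow, xrow in zip(f, window) for w, x in zip(frow, xrow))
            feats.append(max(0.0, s))  # ReLU
        out.append(feats)
    return out


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, u):
    """Weighted summation of a sequence of vectors (assumed dot-product scoring).

    states: T vectors of length d (in the paper's model, LSTM outputs;
            here, conv features because the LSTM is omitted).
    u: length-d context vector (learned in a real model, fixed here).
    Returns (pooled vector of length d, attention weights of length T).
    """
    scores = [sum(h_k * u_k for h_k, u_k in zip(h, u)) for h in states]
    weights = softmax(scores)
    pooled = [sum(w * h[k] for w, h in zip(weights, states)) for k in range(len(u))]
    return pooled, weights


embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filters = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
features = conv1d(embeddings, filters, width=2)
pooled, weights = attention_pool(features, u=[0.0, 0.0])
```

With a zero context vector, every position scores equally, so the weights are uniform and the pooled vector is simply the mean of the features. A nonzero `u` shifts the weight toward positions whose features align with it. The softmax is what makes the result a weighted summation: the weights are non-negative and sum to 1.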
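The summary reports that multinomial naive Bayes over TF-IDF features worked best. The following pure-Python sketch shows how such a classifier could fit together. It uses a common practice that is assumed here and not taken from the paper: TF-IDF weights are treated as fractional term counts in a multinomial model with Laplace smoothing. It does not reproduce the authors' feature-word classification expressiveness index, which the summary does not define. The tokenizer is whitespace-only, and the names `tfidf_vectors`, `transform`, and `TfidfMultinomialNB` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical whitespace tokenizer; real news text needs proper segmentation.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [transform_tokens(toks, idf) for toks in tokenized]
    return vectors, idf


def transform_tokens(toks, idf):
    toks = [t for t in toks if t in idf]  # ignore terms unseen in training
    tf = Counter(toks)
    return {t: (c / len(toks)) * idf[t] for t, c in tf.items()}


def transform(text, idf):
    return transform_tokens(tokenize(text), idf)


class TfidfMultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        vocab = {t for v in vectors for t in v}
        self.log_prior, self.log_likelihood = {}, {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(labels))
            weight = defaultdict(float)  # summed tf-idf weight per term in class c
            for v in rows:
                for t, w in v.items():
                    weight[t] += w
            denom = sum(weight.values()) + self.alpha * len(vocab)
            self.log_likelihood[c] = {
                t: math.log((weight[t] + self.alpha) / denom) for t in vocab
            }
        return self

    def predict(self, vector):
        def score(c):
            ll = self.log_likelihood[c]
            return self.log_prior[c] + sum(w * ll[t] for t, w in vector.items() if t in ll)
        return max(self.classes, key=score)


docs = ["stock market rally lifts shares", "team wins championship final match",
        "shares fall as market slumps", "striker scores in final match"]
labels = ["finance", "sports", "finance", "sports"]
vecs, idf = tfidf_vectors(docs)
clf = TfidfMultinomialNB().fit(vecs, labels)
print(clf.predict(transform("market shares rally", idf)))  # finance
```

The multinomial variant fits this setting because it models how much weight each term contributes to a class. A Bernoulli model, by contrast, only records whether a term is present. An index that reweights discriminative feature words, like the one described in the summary, would plug in at the TF-IDF step before `fit`.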
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3487311