Deep Learning-Based Algorithm for Classification of News Text
Published in: | IEEE Access, 2024, Vol. 12, pp. 159086-159098 |
---|---|
Format: | Article |
Language: | English |
Summary: | As online news grows exponentially, hotspot classification is becoming increasingly important. Traditional machine learning-based text classification methods, such as naive Bayes, support vector machines (SVMs), and classification trees, offer a degree of interpretability, but they often cannot handle complex semantic relations, which frequently results in poor classification accuracy. To address this issue, this study introduces a deep learning-based text classification (TC) method that combines a convolutional neural network (CNN), long short-term memory (LSTM), and an attention mechanism. The method accurately predicts news popularity by combining the feature extraction ability of the CNN, the sequence modeling ability of the LSTM, and the weighted summation ability of the attention mechanism. Experimental results showed that, compared with other deep learning models, the proposed method achieved higher accuracy and accounted more effectively for the context of the text data. Besides model selection, feature engineering was also key to improving accuracy. Accordingly, we developed a naive Bayes TC model based on feature extraction, using word embeddings to convert text into richer vector representations. We then compared the model under different naive Bayes distributions and found that multinomial naive Bayes was the most suitable for TC. Finally, we added a feature word classification expressiveness index to improve term frequency-inverse document frequency (TF-IDF) feature extraction, which produced a classification accuracy of 96%. This demonstrates that the improved model is superior at understanding and classifying text. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2024.3487311 |
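The summary credits the attention mechanism with a "weighted summation ability" over the sequence model's outputs. The record does not give the paper's exact formulation, so the following is only a minimal sketch under stated assumptions: dot-product scoring against a single query vector, softmax normalization, and toy values standing in for LSTM hidden states. The names `attention_pool`, `hidden_states`, and `query` are hypothetical and not taken from the article.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Score each time step against `query` (dot product, an assumption),
    normalize the scores with softmax, and return the weights together
    with the weighted sum of the hidden states."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy stand-ins for LSTM outputs of a 3-token headline (hidden size 2).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]  # hypothetical learned query vector

weights, context = attention_pool(hidden_states, query)
print(weights)
print(context)
```

In this toy input the third time step has the largest dot product with the query, so it receives the largest weight and contributes most to the pooled vector that a classifier layer would consume.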