Loading…

Chinese text classification method based on sentence information enhancement and feature fusion

Text classification involves annotating text data with specific labels and is a crucial research task in the field of natural language processing. Chinese text classification presents significant challenges due to the complex semantics of the language, difficulties in semantic feature extraction, an...

Full description

Saved in:
Bibliographic Details
Published in:Heliyon 2024-09, Vol.10 (17), p.e36861, Article e36861
Main Authors: Zhu, Binglin, Pan, Wei
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Text classification involves annotating text data with specific labels and is a crucial research task in the field of natural language processing. Chinese text classification presents significant challenges due to the complex semantics of the language, difficulties in semantic feature extraction, and the interleaving and irregularity of lexical features. Traditional methods often struggle to manage the relationships between words and sentences in Chinese, hindering the model's ability to capture deep semantic information and resulting in poor classification performance. To address these issues, a Chinese text classification method based on utterance information enhancement and feature fusion is proposed. This method first embeds the text into a unified space and obtains feature representations of word vectors and sentence vectors using the BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model. Subsequently, an utterance information enhancement module is constructed to perform syntactic enhancement and feature extraction on the sentence information within the text. Additionally, a feature fusion strategy is introduced to combine the enhanced sentence-level information features with the word-level features extracted by the Bi-GRU (Bidirectional Gated Recurrent Unit network), culminating in the classification output. This approach effectively enhances the feature representation of Chinese text and significantly filters out irrelevant and noisy information. Evaluations on several Chinese datasets demonstrate that the proposed method surpasses existing mainstream classification models in terms of classification accuracy and F1 value, validating its effectiveness and feasibility.
ISSN:2405-8440
2405-8440
DOI:10.1016/j.heliyon.2024.e36861