Cigarette tasting Chinese text classification for low-resource scenarios

Bibliographic Details
Published in: Journal of intelligent & fuzzy systems 2024-03, p.1-15
Main Authors: Diao, Xiu-Li; Zhang, Hao-Ran; Zeng, Qing-Tian; Song, Zheng-Guo; Zhao, Hua
Format: Article
Language: English
cited_by
cites cdi_FETCH-crossref_primary_10_3233_JIFS_2378163
container_end_page 15
container_issue
container_start_page 1
container_title Journal of intelligent & fuzzy systems
container_volume
creator Diao, Xiu-Li
Zhang, Hao-Ran
Zeng, Qing-Tian
Song, Zheng-Guo
Zhao, Hua
description Chinese text classification currently faces low-resource challenges such as data scarcity and annotation difficulty. Moreover, cigarette tasting texts tend to be colloquial, which makes valuable, high-quality tasting texts difficult to obtain. Therefore, in this paper, we construct a cigarette tasting dataset (CT2023) and propose a novel Chinese text classification method based on ERNIE and Comparative Learning for Low-Resource scenarios (ECLLR). First, to address the limited vocabulary diversity and sparse features of cigarette tasting text, we use Term Frequency-Inverse Document Frequency (TF-IDF) to extract key terms that supplement the discriminative features of the original text. Second, ERNIE is employed to obtain sentence-level vector embeddings of the text. Finally, a contrastive learning model is used to further refine the text after the keyword features are fused, thereby enhancing the performance of the proposed text classification model. Experiments on the CT2023 dataset show that the proposed method reaches an accuracy of 96.33%, surpassing the baseline model by at least 11 percentage points. The proposed approach can thus provide recommendations and decision support for cigarette production processes in tobacco companies.
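The TF-IDF keyword step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tfidf_keywords` and `augment`, the smoothed IDF formula, the `top_k` parameter, and the toy English tokens are all assumptions made for illustration, and real Chinese text would first need word segmentation, which is not shown here.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=2):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document.

    Hypothetical helper: docs is a list of token lists (already segmented).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            # tf: relative frequency within the document.
            # idf: smoothed log inverse document frequency (an assumed variant).
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


def augment(doc, keywords):
    """Append the extracted key terms to the original token sequence."""
    return doc + keywords


# Made-up toy reviews, not taken from CT2023.
docs = [
    ["smoke", "smooth", "aroma", "sweet"],
    ["smoke", "harsh", "throat", "harsh"],
    ["smoke", "aroma", "light", "aroma"],
]
keywords = tfidf_keywords(docs, top_k=2)
augmented = [augment(doc, kw) for doc, kw in zip(docs, keywords)]
```

In this toy corpus, "smoke" occurs in every review and therefore scores lowest, while terms that are frequent in one review but rare elsewhere rank first. The augmented token sequences would then be passed to an encoder such as ERNIE; that stage and the contrastive learning stage are outside this sketch.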
doi_str_mv 10.3233/JIFS-237816
format article
fulltext fulltext
identifier ISSN: 1064-1246
ispartof Journal of intelligent & fuzzy systems, 2024-03, p.1-15
issn 1064-1246
1875-8967
language eng
recordid cdi_crossref_primary_10_3233_JIFS_237816
source EBSCOhost Business Source Ultimate; SAGE:Jisc Collections:SAGE Journals Read and Publish 2023-2024:2025 extension (reading list)
title Cigarette tasting Chinese text classification for low-resource scenarios
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T11%3A47%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cigarette%20tasting%20Chinese%20text%20classification%20for%20low-resource%20scenarios&rft.jtitle=Journal%20of%20intelligent%20&%20fuzzy%20systems&rft.au=Diao,%20Xiu-Li&rft.date=2024-03-28&rft.spage=1&rft.epage=15&rft.pages=1-15&rft.issn=1064-1246&rft.eissn=1875-8967&rft_id=info:doi/10.3233/JIFS-237816&rft_dat=%3Ccrossref%3E10_3233_JIFS_237816%3C/crossref%3E%3Cgrp_id%3Ecdi_FETCH-crossref_primary_10_3233_JIFS_2378163%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true