
Tackling Concept Shift in Text Classification using Entailment-style Modeling

Pre-trained language models (PLMs) have seen tremendous success in text classification (TC) problems in the context of Natural Language Processing (NLP). In many real-world text classification tasks, the class definitions being learned do not remain constant but rather change with time - this is known as Concept Shift. Most techniques for handling concept shift rely on retraining the old classifiers with the newly labelled data. However, given the amount of training data required to fine-tune large DL models for the new concepts, the associated labelling costs can be prohibitively expensive and time-consuming. In this work, we propose a reformulation, converting vanilla classification into an entailment-style problem that requires significantly less data to re-train the text classifier to adapt to new concepts. We demonstrate the effectiveness of our proposed method on both real-world and synthetic datasets, achieving absolute F1 gains of up to 7% and 40%, respectively, in few-shot settings. Further, upon deployment, our solution also helped save 75% of labelling costs overall.
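The reformulation the abstract describes can be sketched as a data transformation: instead of mapping a text to one of several class labels, each text (premise) is paired with every class definition (hypothesis), and the model learns to predict whether the premise entails the hypothesis. The class names and definitions below are hypothetical illustrations, not taken from the paper's datasets.

```python
# Sketch of an entailment-style reformulation of vanilla classification.
# When a class definition changes (concept shift), only the hypothesis
# text changes, which is why far less labelled data is needed to adapt.

CLASS_DEFINITIONS = {
    "complaint": "This text expresses dissatisfaction with a product or service.",
    "inquiry": "This text asks a question about a product or service.",
}

def to_entailment_examples(text, true_label, class_definitions):
    """Convert one (text, label) pair into entailment-style training pairs."""
    examples = []
    for label, definition in class_definitions.items():
        examples.append({
            "premise": text,
            "hypothesis": definition,
            # The pair matching the gold label is "entailment"; all others
            # become "not_entailment" negatives.
            "label": "entailment" if label == true_label else "not_entailment",
        })
    return examples

examples = to_entailment_examples(
    "The package arrived broken and support never replied.",
    "complaint",
    CLASS_DEFINITIONS,
)
for ex in examples:
    print(ex["label"], "|", ex["hypothesis"])
```

At inference time the class whose definition is predicted as entailed with the highest score would be chosen; the paper's actual model architecture and training procedure are not specified in this record.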

Bibliographic Details
Published in: arXiv.org, 2023-11
Main Authors: Roychowdhury, Sumegh; Gupta, Karan; Siva Rajesh Kasa; Murthy, Prasanna Srinivasa; Chandra, Alok
Format: Article
Language: English
EISSN: 2331-8422
Publisher: Ithaca: Cornell University Library, arXiv.org
Subjects: Classification; Classifiers; Labeling; Natural language processing; Synthetic data; Text categorization