Hybrid approach for text categorization: A case study with Bangla news article

Bibliographic Details
Published in: Journal of Information Science, June 2023, Vol. 49, No. 3, pp. 762-777
Main Authors: Dhar, Ankita; Mukherjee, Himadri; Roy, Kaushik; Santosh, KC; Dash, Niladri Sekhar
Format: Article
Language: English
Abstract: The enormous expansion of online text driven by the Internet has intensified and revived interest in sorting, managing and categorizing documents into their respective domains, which creates a pressing need for automatic text categorization systems that assign each document to its appropriate domain. This article demonstrates the effectiveness of a hybrid approach that combines text-based and graph-based features. The hybrid approach was applied to 14,373 Bangla articles (5,722,569 tokens) collected from various online news corpora covering nine categories. The article also applies each feature set on its own to show how it performs individually. For classification, the feature sets were passed to Bayesian classifiers, which yielded satisfactory results, with Naïve Bayes Multinomial (NBM) reaching 98.73% accuracy. To test the robustness and language independence of the system, the experiments were also performed on two popular English datasets.
DOI: 10.1177/01655515211027770
Publisher: SAGE Publications, London, England
Copyright: The Author(s) 2021
ISSN: 0165-5515; EISSN: 1741-6485
Subjects: Classification; Documents; Domains; News; Text categorization
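The abstract does not say which text-based or graph-based features the authors used, so the following is only a minimal sketch of the general idea under stated assumptions. It shows one way a hybrid count vector could be fed to a Multinomial Naïve Bayes classifier with Laplace smoothing. Here, unigram counts stand in for the text-based features, and counts of adjacent-word edges from a word co-occurrence graph are a hypothetical stand-in for the graph-based ones. The names `features` and `HybridMultinomialNB` and the toy English data are illustrative inventions, not the authors' implementation or dataset.

```python
import math
from collections import Counter


def features(tokens):
    """Build one hybrid count vector for a tokenised document.

    Text-based part: unigram counts ("w:" prefix).
    Graph-based part (hypothetical stand-in): counts of directed edges
    between adjacent words in a word co-occurrence graph ("e:" prefix).
    """
    bag = Counter(f"w:{t}" for t in tokens)
    bag.update(f"e:{a}->{b}" for a, b in zip(tokens, tokens[1:]))
    return bag


class HybridMultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant added to every feature count

    def fit(self, docs, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for tokens, y in zip(docs, labels):
            f = features(tokens)
            self.counts[y].update(f)
            self.vocab.update(f)  # adds the feature names (Counter keys)
        n = len(labels)
        # Log prior: fraction of training documents in each class.
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        # Total feature count per class (denominator of the likelihood).
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        f = features(tokens)
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            denom = self.totals[c] + self.alpha * v
            for feat, k in f.items():
                if feat not in self.vocab:
                    continue  # features never seen in training are ignored
                score += k * math.log((self.counts[c][feat] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


if __name__ == "__main__":
    docs = [
        "team won the cricket match".split(),
        "the striker scored a goal in the match".split(),
        "the central bank raised interest rates".split(),
        "stock market shares fell on inflation".split(),
    ]
    labels = ["sports", "sports", "economy", "economy"]
    model = HybridMultinomialNB().fit(docs, labels)
    print(model.predict("the match was won by the team".split()))  # sports
```

Because both parts of the vector are non-negative counts, they can share a single vocabulary and a single multinomial likelihood, which is one reason a multinomial model suits this kind of feature concatenation. For real experiments, scikit-learn's `MultinomialNB` offers a maintained implementation of the same classifier.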