Visualization and performance measure to determine number of topics in twitter data clustering using hybrid topic modeling

Bibliographic Details
Published in: Journal of Intelligent & Fuzzy Systems, 2021-01, Vol. 41 (1), pp. 803-817
Main Authors: Noorullah, R.M.; Mohammed, Moulana
Format: Article
Language: English
Subjects: Clustering; Visualization
container_end_page 817
container_issue 1
container_start_page 803
container_title Journal of intelligent & fuzzy systems
container_volume 41
creator Noorullah, R.M.
Mohammed, Moulana
description Topic models have been widely used to build clusters of documents for more than a decade, yet choosing the optimal number of topics remains problematic. The main problem is the lack of a stable metric for the quality of the topics obtained when constructing topic models. Analyzing previous work, the authors found that most models used to determine the number of topics are non-parametric and assess topic quality with perplexity and coherence measures, and concluded that these approaches are not applicable to solving this problem. In this paper, we used a parametric method, an extension of the traditional topic model with visual access tendency, to visualize the number of topics (clusters), complement clustering, and choose the optimal number of topics based on the results of cluster validity indices. The developed hybrid topic models are demonstrated on different Twitter datasets covering various topics, both for obtaining the optimal number of topics and for measuring the quality of the clusters. The experimental results showed that the Visual Non-negative Matrix Factorization (VNMF) topic model performs well in determining the optimal number of topics with interactive visualization and in measuring the quality of clusters with validity indices.
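The visual component described in this abstract resembles the classic VAT (visual assessment of cluster tendency) algorithm, which reorders a dissimilarity matrix so that clusters appear as dark blocks along its diagonal; the abstract then picks the number of clusters with validity indices. The sketch below is not the authors' VNMF implementation. It is a minimal, stdlib-only illustration under stated assumptions: it uses the Prim-style VAT ordering, cuts that ordering at its largest linking distances to form candidate partitions, and scores each candidate with the silhouette index (one common validity index; the abstract does not say which indices the paper uses). All function names (`vat_order`, `vat_partition`, `silhouette`, `choose_k`) are hypothetical, and in a topic-modeling pipeline `D` would, as an assumption, hold distances between documents' topic vectors, such as rows of an NMF document-topic matrix.

```python
"""Hypothetical sketch: VAT-style ordering plus a silhouette-based choice of k.

Not the VNMF method from the paper; an illustration of the general idea only.
"""


def vat_order(D):
    """Return (order, link) for a symmetric dissimilarity matrix D.

    Classic VAT ordering: start from a row holding the largest dissimilarity,
    then repeatedly append the unvisited object closest to the visited set
    (Prim's minimum spanning tree order). link[p] is the distance at which
    order[p] was attached; link[0] is 0.0 by convention.
    """
    n = len(D)
    start = max(range(n), key=lambda i: max(D[i]))
    order, link = [start], [0.0]
    remaining = set(range(n)) - {start}
    # best[q]: smallest dissimilarity from q to any already-ordered object
    best = {q: D[start][q] for q in remaining}
    while remaining:
        j = min(remaining, key=lambda q: (best[q], q))  # ties broken by index
        order.append(j)
        link.append(best[j])
        remaining.remove(j)
        for q in remaining:
            best[q] = min(best[q], D[j][q])
    return order, link


def vat_partition(order, link, k):
    """Cut the VAT order at its k-1 largest linking distances into k blocks."""
    positions = sorted(range(1, len(order)), key=lambda p: link[p], reverse=True)
    cuts = set(positions[: k - 1])
    labels = [0] * len(order)
    cluster = 0
    for pos, idx in enumerate(order):
        if pos in cuts:
            cluster += 1  # a large linking distance starts a new block
        labels[idx] = cluster
    return labels


def silhouette(D, labels):
    """Mean silhouette width; singletons contribute 0 by convention."""
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue
        # a: mean distance to the rest of i's own cluster
        a = sum(D[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to any other cluster
        b = min(
            sum(D[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != c
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def choose_k(D, k_max):
    """Score k = 2..k_max and return (best_k, scores); ties keep the smaller k."""
    order, link = vat_order(D)
    scores = {k: silhouette(D, vat_partition(order, link, k)) for k in range(2, k_max + 1)}
    best_k = max(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    # Toy 1-D data with three visually obvious groups.
    points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 12.0, 12.1]
    D = [[abs(x - y) for y in points] for x in points]
    order, link = vat_order(D)
    best_k, scores = choose_k(D, k_max=5)
    print(order)   # [0, 1, 2, 3, 4, 5, 6, 7]
    print(best_k)  # 3
```

Rendering `[[D[i][j] for j in order] for i in order]` as a grayscale image gives the VAT picture. Cutting at the largest linking distances is a single-linkage-style heuristic chosen here for brevity; other validity indices, such as Dunn or Davies-Bouldin, could replace the silhouette score.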
doi_str_mv 10.3233/JIFS-202707
format article
publisher IOS Press BV, Amsterdam
rights Copyright IOS Press BV 2021
identifier ISSN: 1064-1246
ispartof Journal of intelligent & fuzzy systems, 2021-01, Vol.41 (1), p.803-817
issn 1064-1246
1875-8967
language eng
recordid cdi_proquest_journals_2560910634
source BSC - Ebsco (Business Source Ultimate)
subjects Clustering
Visualization
title Visualization and performance measure to determine number of topics in twitter data clustering using hybrid topic modeling
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T15%3A40%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Visualization%20and%20performance%20measure%20to%20determine%20number%20of%20topics%20in%20twitter%20data%20clustering%20using%20hybrid%20topic%20modeling&rft.jtitle=Journal%20of%20intelligent%20&%20fuzzy%20systems&rft.au=Noorullah,%20R.M.&rft.date=2021-01-01&rft.volume=41&rft.issue=1&rft.spage=803&rft.epage=817&rft.pages=803-817&rft.issn=1064-1246&rft.eissn=1875-8967&rft_id=info:doi/10.3233/JIFS-202707&rft_dat=%3Cproquest_cross%3E2560910634%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c261t-a5d1a703213f16821c9440e5ce040d8921e5d1f0d658d76a7551fbc9d5365e8c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2560910634&rft_id=info:pmid/&rfr_iscdi=true