A Multi-Criteria Document Clustering Method Based on Topic Modeling and Pseudoclosure Function

Bibliographic Details
Published in:	Informatica (Ljubljana) 2016-06, Vol.40 (2), p.169-169
Main Authors:	Bui, Quang Vu; Sayadi, Karim; Bui, Marc
Format:	Article
Language:	English
Subjects:	Clustering; Clusters; Computation; Criteria; Mathematical analysis; Semantics; Structural analysis; Vectors (mathematics)

container_end_page	169
container_issue	2
container_start_page	169
container_title	Informatica (Ljubljana)
container_volume	40
creator	Bui, Quang Vu; Sayadi, Karim; Bui, Marc
description	In this work we address the problem of document clustering. We propose a novel unsupervised clustering method based on a structural analysis of the latent semantic space, in which each document is a vector of probabilities representing a distribution over topics. A document's membership in a cluster is computed from two criteria: the major topic of the document (a qualitative criterion) and the distance between the probability vectors (a quantitative criterion). We perform the structural analysis of the latent semantic space using pretopology theory, which allows us to investigate how the number of clusters and the choice of centroids affect the similarity between the computed clusters. We applied the method to Twitter data and report the accuracy of our results compared with those obtained using a randomly chosen number of clusters.
format	article
publisher	Slovenian Society Informatika / Slovensko drustvo Informatika
rights	Copyright Slovenian Society Informatika / Slovensko drustvo Informatika Jun 2016
toplevel	peer_reviewed
oa	free_for_read
fulltext	fulltext
identifier	ISSN: 0350-5596
ispartof	Informatica (Ljubljana), 2016-06, Vol.40 (2), p.169-169
issn	0350-5596 (print); 1854-3871 (online)
language	eng
recordid	cdi_proquest_miscellaneous_1835655020
source	Publicly Available Content Database (Proquest) (PQ_SDU_P3)
subjects	Clustering; Clusters; Computation; Criteria; Mathematical analysis; Semantics; Structural analysis; Vectors (mathematics)
title	A Multi-Criteria Document Clustering Method Based on Topic Modeling and Pseudoclosure Function
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T11%3A32%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Multi-Criteria%20Document%20Clustering%20Method%20Based%20on%20Topic%20Modeling%20and%20Pseudoclosure%20Function&rft.jtitle=Informatica%20(Ljubljana)&rft.au=Bui,%20Quang%20Vu&rft.date=2016-06-01&rft.volume=40&rft.issue=2&rft.spage=169&rft.epage=169&rft.pages=169-169&rft.issn=0350-5596&rft.eissn=1854-3871&rft_id=info:doi/&rft_dat=%3Cproquest%3E4134683571%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-p216t-234db69c5c13ce8d37cefb834ca9392204165a6b0101bf3f1be931346e5bf5563%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1808031225&rft_id=info:pmid/&rfr_iscdi=true
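The description above names two membership criteria and a pseudoclosure-based structural analysis but gives no formulas. What follows is a minimal sketch under stated assumptions, not the authors' algorithm. Documents are topic-probability vectors (for example, LDA output). The qualitative criterion is taken to be the most probable topic. The quantitative criterion is assumed to be the Hellinger distance with a hypothetical threshold `eps`. The pseudoclosure a(A) is assumed to add every document that shares its major topic with some member of A and lies within `eps` of that member. Iterating a(·) until the set stops growing gives the closure of a seed set, which can be read as a cluster grown from a chosen centroid. The function names `major_topic`, `hellinger`, `pseudoclosure` and `closure` are illustrative only.

```python
import math
from typing import List, Set

# Assumption: each document is a topic-probability vector, e.g. the
# per-document topic distribution of an LDA model (values sum to 1).
Doc = List[float]


def major_topic(theta: Doc) -> int:
    """Qualitative criterion: index of the most probable topic."""
    return max(range(len(theta)), key=lambda k: theta[k])


def hellinger(p: Doc, q: Doc) -> float:
    """Quantitative criterion (assumed choice): Hellinger distance in [0, 1]."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)


def pseudoclosure(docs: List[Doc], A: Set[int], eps: float) -> Set[int]:
    """Hypothetical a(A): A plus every document that shares its major topic
    with some member y of A and lies within distance eps of that same y."""
    result = set(A)
    for x, theta_x in enumerate(docs):
        if x in result:
            continue
        for y in A:
            if (major_topic(theta_x) == major_topic(docs[y])
                    and hellinger(theta_x, docs[y]) <= eps):
                result.add(x)
                break
    return result


def closure(docs: List[Doc], A: Set[int], eps: float) -> Set[int]:
    """Apply the pseudoclosure repeatedly until the set stops growing.
    Terminates because each step only adds documents from a finite set."""
    current = set(A)
    while True:
        grown = pseudoclosure(docs, current, eps)
        if grown == current:
            return current
        current = grown


# Toy data (hypothetical): documents 0-2 are dominated by topic 0,
# documents 3-4 by topic 1.
docs = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
]

print(sorted(pseudoclosure(docs, {0}, 0.15)))  # [0, 1]
print(sorted(closure(docs, {0}, 0.15)))        # [0, 1, 2]
print(sorted(closure(docs, {3}, 0.15)))        # [3, 4]
```

A single pseudoclosure step from seed 0 reaches only document 1, because document 2 is too far from document 0; the closure then reaches document 2 through document 1. Changing `eps` or the seed set changes which clusters emerge, which loosely mirrors the sensitivity to the number of clusters and the choice of centroids that the description says the pretopological analysis examines.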