
A Multi-Criteria Document Clustering Method Based on Topic Modeling and Pseudoclosure Function

Bibliographic Details
Published in: Informatica (Ljubljana) 2016-06, Vol.40 (2), p.169-169
Main Authors: Bui, Quang Vu; Sayadi, Karim; Bui, Marc
Format: Article
Language: English
container_end_page 169
container_issue 2
container_start_page 169
container_title Informatica (Ljubljana)
container_volume 40
creator Bui, Quang Vu
Sayadi, Karim
Bui, Marc
description This work addresses the problem of document clustering. We propose a novel unsupervised clustering method based on the structural analysis of the latent semantic space. Each document in the space is a vector of probabilities that represents a distribution over topics. A document's membership in a cluster is computed from two criteria: the major topic in the document (qualitative criterion) and the distance between the vectors of probabilities (quantitative criterion). We perform a structural analysis of the latent semantic space using pretopology theory, which allows us to investigate how the number of clusters and the chosen centroids affect the similarity between the computed clusters. We applied our method to Twitter data and showed the accuracy of our results compared with a randomly chosen number of clusters.
format article
publisher Ljubljana: Slovenian Society Informatika / Slovensko drustvo Informatika
rights Copyright Slovenian Society Informatika / Slovensko drustvo Informatika Jun 2016
date 2016-06-01
toplevel peer_reviewed
oa free_for_read
fulltext fulltext
identifier ISSN: 0350-5596
ispartof Informatica (Ljubljana), 2016-06, Vol.40 (2), p.169-169
issn 0350-5596
1854-3871
language eng
recordid cdi_proquest_miscellaneous_1835655020
source Publicly Available Content Database (Proquest) (PQ_SDU_P3)
subjects Clustering
Clusters
Computation
Criteria
Mathematical analysis
Semantics
Structural analysis
Vectors (mathematics)
title A Multi-Criteria Document Clustering Method Based on Topic Modeling and Pseudoclosure Function
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T11%3A32%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Multi-Criteria%20Document%20Clustering%20Method%20Based%20on%20Topic%20Modeling%20and%20Pseudoclosure%20Function&rft.jtitle=Informatica%20(Ljubljana)&rft.au=Bui,%20Quang%20Vu&rft.date=2016-06-01&rft.volume=40&rft.issue=2&rft.spage=169&rft.epage=169&rft.pages=169-169&rft.issn=0350-5596&rft.eissn=1854-3871&rft_id=info:doi/&rft_dat=%3Cproquest%3E4134683571%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-p216t-234db69c5c13ce8d37cefb834ca9392204165a6b0101bf3f1be931346e5bf5563%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1808031225&rft_id=info:pmid/&rfr_iscdi=true
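
The two membership criteria named in the description (major topic and distance between topic-probability vectors) can be illustrated with a small sketch. This is not the authors' pseudoclosure-based method: the Hellinger distance, the `assign` rule, and the toy centroids and documents below are all assumptions chosen only to show how a qualitative and a quantitative criterion might be combined.

```python
import math

def major_topic(dist):
    """Index of the highest-probability topic (qualitative criterion)."""
    return max(range(len(dist)), key=lambda k: dist[k])

def hellinger(p, q):
    """Hellinger distance between two discrete distributions.
    Assumed metric: the record does not say which distance is used."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)

def assign(doc, centroids):
    """Hypothetical rule: prefer centroids that share the document's major
    topic, then pick the closest remaining candidate by distance."""
    same_topic = [i for i, c in enumerate(centroids)
                  if major_topic(c) == major_topic(doc)]
    candidates = same_topic or list(range(len(centroids)))
    return min(candidates, key=lambda i: hellinger(doc, centroids[i]))

# Toy data (made up): each row is a document-topic distribution summing to 1.
centroids = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
docs = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.35, 0.25, 0.4]]
labels = [assign(d, centroids) for d in docs]
print(labels)
```

The third toy document has a major topic that no centroid shares, so the sketch falls back to distance alone; the published method instead handles such structure through its pretopological analysis.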