A Multi-Criteria Document Clustering Method Based on Topic Modeling and Pseudoclosure Function
We address in this work the problem of document clustering. Our contribution proposes a novel unsupervised clustering method based on the structural analysis of the latent semantic space. Each document in the space is a vector of probabilities that represents a distribution of topics. The document m...
Published in: | Informatica (Ljubljana) 2016-06, Vol.40 (2), p.169-169 |
---|---|
Main Authors: | Bui, Quang Vu; Sayadi, Karim; Bui, Marc |
Format: | Article |
Language: | English |
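Subjects: | Clustering; Clusters; Computation; Criteria; Mathematical analysis; Semantics; Structural analysis; Vectors (mathematics) |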
Online Access: | Get full text |
cited_by | |
---|---|
cites | |
container_end_page | 169 |
container_issue | 2 |
container_start_page | 169 |
container_title | Informatica (Ljubljana) |
container_volume | 40 |
creator | Bui, Quang Vu ; Sayadi, Karim ; Bui, Marc |
description | We address in this work the problem of document clustering. Our contribution proposes a novel unsupervised clustering method based on the structural analysis of the latent semantic space. Each document in the space is a vector of probabilities that represents a distribution of topics. The document membership to a cluster is computed taking into account two criteria: the major topic in the document (qualitative criterion) and the distance measure between the vectors of probabilities (quantitative criterion). We perform a structural analysis on the latent semantic space using the Pretopology theory that allows us to investigate the role of the number of clusters and the chosen centroids, in the similarity between the computed clusters. We have applied our method to Twitter data and showed the accuracy of our results compared to a random choice number of clusters. |
format | article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_miscellaneous_1835655020</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>4134683571</sourcerecordid><originalsourceid>FETCH-LOGICAL-p216t-234db69c5c13ce8d37cefb834ca9392204165a6b0101bf3f1be931346e5bf5563</originalsourceid><addsrcrecordid>eNpdjsFKAzEURYMoWKv_EHDjZiDJm6SZZR2tCh10UbeWTPJGp6RJnST_b4uuXF0493C5Z2TGtawr0At-TmYMJKukbNQluUppx1gNXIsZ-VjSrvg8Vu00ZpxGQx-iLXsMmba-pBMKn7TD_BUdvTcJHY2BbuJhtLSLDv2pNsHRt4TFRetjKhPSVQk2jzFck4vB-IQ3fzkn76vHTftcrV-fXtrlujoIrnIloHa9aqy0HCxqBwuLQ6-htqaBRghWcyWN6hlnvB9g4D02wKFWKPtBSgVzcve7e5jid8GUt_sxWfTeBIwlbbkGqaRkgh3V23_qLpYpHN8dLaYZcCEk_ADCrV5W</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1808031225</pqid></control><display><type>article</type><title>A Multi-Criteria Document Clustering Method Based on Topic Modeling and Pseudoclosure Function</title><source>Publicly Available Content Database (Proquest) (PQ_SDU_P3)</source><creator>Bui, Quang Vu ; Sayadi, Karim ; Bui, Marc</creator><creatorcontrib>Bui, Quang Vu ; Sayadi, Karim ; Bui, Marc</creatorcontrib><description>We address in this work the problem of document clustering. Our contribution proposes a novel unsupervised clustering method based on the structural analysis of the latent semantic space. Each document in the space is a vector of probabilities that represents a distribution of topics. The document membership to a cluster is computed taking into account two criteria: the major topic in the document (qualitative criterion) and the distance measure between the vectors of probabilities (quantitative criterion). We perform a structural analysis on the latent semantic space using the Pretopology theory that allows us to investigate the role of the number of clusters and the chosen centroids, in the similarity between the computed clusters. 
We have applied our method to Twitter data and showed the accuracy of our results compared to a random choice number of clusters.</description><identifier>ISSN: 0350-5596</identifier><identifier>EISSN: 1854-3871</identifier><language>eng</language><publisher>Ljubljana: Slovenian Society Informatika / Slovensko drustvo Informatika</publisher><subject>Clustering ; Clusters ; Computation ; Criteria ; Mathematical analysis ; Semantics ; Structural analysis ; Vectors (mathematics)</subject><ispartof>Informatica (Ljubljana), 2016-06, Vol.40 (2), p.169-169</ispartof><rights>Copyright Slovenian Society Informatika / Slovensko drustvo Informatika Jun 2016</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/1808031225/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/1808031225?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,25732,36991,36992,44569,74872</link.rule.ids></links><search><creatorcontrib>Bui, Quang Vu</creatorcontrib><creatorcontrib>Sayadi, Karim</creatorcontrib><creatorcontrib>Bui, Marc</creatorcontrib><title>A Multi-Criteria Document Clustering Method Based on Topic Modeling and Pseudoclosure Function</title><title>Informatica (Ljubljana)</title><description>We address in this work the problem of document clustering. Our contribution proposes a novel unsupervised clustering method based on the structural analysis of the latent semantic space. Each document in the space is a vector of probabilities that represents a distribution of topics. 
The document membership to a cluster is computed taking into account two criteria: the major topic in the document (qualitative criterion) and the distance measure between the vectors of probabilities (quantitative criterion). We perform a structural analysis on the latent semantic space using the Pretopology theory that allows us to investigate the role of the number of clusters and the chosen centroids, in the similarity between the computed clusters. We have applied our method to Twitter data and showed the accuracy of our results compared to a random choice number of clusters.</description><subject>Clustering</subject><subject>Clusters</subject><subject>Computation</subject><subject>Criteria</subject><subject>Mathematical analysis</subject><subject>Semantics</subject><subject>Structural analysis</subject><subject>Vectors (mathematics)</subject><issn>0350-5596</issn><issn>1854-3871</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNpdjsFKAzEURYMoWKv_EHDjZiDJm6SZZR2tCh10UbeWTPJGp6RJnST_b4uuXF0493C5Z2TGtawr0At-TmYMJKukbNQluUppx1gNXIsZ-VjSrvg8Vu00ZpxGQx-iLXsMmba-pBMKn7TD_BUdvTcJHY2BbuJhtLSLDv2pNsHRt4TFRetjKhPSVQk2jzFck4vB-IQ3fzkn76vHTftcrV-fXtrlujoIrnIloHa9aqy0HCxqBwuLQ6-htqaBRghWcyWN6hlnvB9g4D02wKFWKPtBSgVzcve7e5jid8GUt_sxWfTeBIwlbbkGqaRkgh3V23_qLpYpHN8dLaYZcCEk_ADCrV5W</recordid><startdate>20160601</startdate><enddate>20160601</enddate><creator>Bui, Quang Vu</creator><creator>Sayadi, Karim</creator><creator>Bui, Marc</creator><general>Slovenian Society Informatika / Slovensko drustvo 
Informatika</general><scope>3V.</scope><scope>7SC</scope><scope>7XB</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BYOGL</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope></search><sort><creationdate>20160601</creationdate><title>A Multi-Criteria Document Clustering Method Based on Topic Modeling and Pseudoclosure Function</title><author>Bui, Quang Vu ; Sayadi, Karim ; Bui, Marc</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-p216t-234db69c5c13ce8d37cefb834ca9392204165a6b0101bf3f1be931346e5bf5563</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Clustering</topic><topic>Clusters</topic><topic>Computation</topic><topic>Criteria</topic><topic>Mathematical analysis</topic><topic>Semantics</topic><topic>Structural analysis</topic><topic>Vectors (mathematics)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bui, Quang Vu</creatorcontrib><creatorcontrib>Sayadi, Karim</creatorcontrib><creatorcontrib>Bui, Marc</creatorcontrib><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest 
Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies &amp; Aerospace Database (1962 - current)</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>East Europe, Central Europe Database</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Publicly Available Content Database (Proquest) (PQ_SDU_P3)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>Informatica (Ljubljana)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Bui, Quang Vu</au><au>Sayadi, Karim</au><au>Bui, Marc</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Multi-Criteria Document Clustering Method Based on Topic Modeling and Pseudoclosure Function</atitle><jtitle>Informatica 
(Ljubljana)</jtitle><date>2016-06-01</date><risdate>2016</risdate><volume>40</volume><issue>2</issue><spage>169</spage><epage>169</epage><pages>169-169</pages><issn>0350-5596</issn><eissn>1854-3871</eissn><abstract>We address in this work the problem of document clustering. Our contribution proposes a novel unsupervised clustering method based on the structural analysis of the latent semantic space. Each document in the space is a vector of probabilities that represents a distribution of topics. The document membership to a cluster is computed taking into account two criteria: the major topic in the document (qualitative criterion) and the distance measure between the vectors of probabilities (quantitative criterion). We perform a structural analysis on the latent semantic space using the Pretopology theory that allows us to investigate the role of the number of clusters and the chosen centroids, in the similarity between the computed clusters. We have applied our method to Twitter data and showed the accuracy of our results compared to a random choice number of clusters.</abstract><cop>Ljubljana</cop><pub>Slovenian Society Informatika / Slovensko drustvo Informatika</pub><tpages>1</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0350-5596 |
ispartof | Informatica (Ljubljana), 2016-06, Vol.40 (2), p.169-169 |
issn | 0350-5596 ; 1854-3871 |
language | eng |
recordid | cdi_proquest_miscellaneous_1835655020 |
source | Publicly Available Content Database (Proquest) (PQ_SDU_P3) |
subjects | Clustering ; Clusters ; Computation ; Criteria ; Mathematical analysis ; Semantics ; Structural analysis ; Vectors (mathematics) |
title | A Multi-Criteria Document Clustering Method Based on Topic Modeling and Pseudoclosure Function |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T11%3A32%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Multi-Criteria%20Document%20Clustering%20Method%20Based%20on%20Topic%20Modeling%20and%20Pseudoclosure%20Function&rft.jtitle=Informatica%20(Ljubljana)&rft.au=Bui,%20Quang%20Vu&rft.date=2016-06-01&rft.volume=40&rft.issue=2&rft.spage=169&rft.epage=169&rft.pages=169-169&rft.issn=0350-5596&rft.eissn=1854-3871&rft_id=info:doi/&rft_dat=%3Cproquest%3E4134683571%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-p216t-234db69c5c13ce8d37cefb834ca9392204165a6b0101bf3f1be931346e5bf5563%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1808031225&rft_id=info:pmid/&rfr_iscdi=true |