
A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses

Summary: We propose a distributed parallel algorithm for inferring the hierarchical groups present in a large‐scale text corpus. The algorithm is designed to deal with corpuses that typically do not fit into the main memory of a workstation computer. The key contribution of this paper lies in its pro...

Bibliographic Details
Published in: Concurrency and computation 2018-06, Vol.30 (11), p.n/a
Main Authors: Seshadri, Karthick; S. Mercy, Shalinie; Manohar, Sidharth
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c2934-c898efaf6f8a4558a1079200af5eb3386b981f7b0fc543e32d0ec2d25492e2223
cites cdi_FETCH-LOGICAL-c2934-c898efaf6f8a4558a1079200af5eb3386b981f7b0fc543e32d0ec2d25492e2223
container_end_page n/a
container_issue 11
container_start_page
container_title Concurrency and computation
container_volume 30
creator Seshadri, Karthick
S. Mercy, Shalinie
Manohar, Sidharth
description Summary: We propose a distributed parallel algorithm for inferring the hierarchical groups present in a large‐scale text corpus. The algorithm is designed to deal with corpuses that typically do not fit into the main memory of a workstation computer. The key contribution of this paper lies in its proposal and verification of a parallel distributed algorithm that exploits the advantages of two complementary techniques based on (i) localized modularity optimization and (ii) spectral clustering. Based on our experimental observations, these are complementary in the sense that the former excels at finding coarse groups in a large‐scale network, while the latter demands a heavy memory footprint but is effective in inferring tightly knit fine‐grained groups. Empirical evaluation of the distributed implementation scheme shows that the algorithm exhibits a significant speed‐up when compared to existing algorithms like Louvain and, at the same time, produces better quality clusters than either Louvain or spectral clustering algorithms in terms of the F‐score and Rand index.
doi_str_mv 10.1002/cpe.4404
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2033713881</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2033713881</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2934-c898efaf6f8a4558a1079200af5eb3386b981f7b0fc543e32d0ec2d25492e2223</originalsourceid><addsrcrecordid>eNp10M1Kw0AQB_AgCtYq-AgLXryk7lfSzbGU-gEFPeg5bDaz6ZZtN84maG8-gs_ok5ha8eZpBubHf-CfJJeMThil_Ma0MJGSyqNkxDLBU5oLefy38_w0OYtxTSljVLBRYmakdrFDV_Ud1KTVqL0HT7RvArputSE2IHFbC4hu25CVA9RoVs5oTxoMfRuJxbAhXmMDXx-fcTgA6eC9IyZg20eI58mJ1T7Cxe8cJy-3i-f5fbp8vHuYz5ap4YWQqVGFAqttbpWWWaY0o9OCU6ptBpUQKq8Kxey0otZkUoDgNQXDa57JggPnXIyTq0Nui-G1h9iV69DjdnhZcirElAml2KCuD8pgiBHBli26jcZdyWi5r7AcKiz3FQ40PdA352H3ryvnT4sf_w0ODHPx</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2033713881</pqid></control><display><type>article</type><title>A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses</title><source>Wiley-Blackwell Read &amp; Publish Collection</source><creator>Seshadri, Karthick ; S. Mercy, Shalinie ; Manohar, Sidharth</creator><creatorcontrib>Seshadri, Karthick ; S. Mercy, Shalinie ; Manohar, Sidharth</creatorcontrib><description>Summary We propose a distributed parallel algorithm for inferring the hierarchical groups present in a large‐scale text corpus. The algorithm is designed to deal with corpuses that typically do not fit into the main memory of a workstation computer. The key contribution of this paper lies in its proposal and verification of a parallel distributed algorithm that exploits the advantages of two complementary techniques based on (i) localized modularity optimization and (ii) spectral clustering. 
Based on our experimental observations, these are complementary in the sense that the former excels at finding coarse groups in a large‐scale network, while the latter demands a heavy memory footprint but is effective in inferring tightly knit fine‐grained groups. Empirical evaluation of the distributed implementation scheme shows that the algorithm exhibits a significant speed‐up when compared to existing algorithms like Louvain and, at the same time, produces better quality clusters than either Louvain or spectral clustering algorithms in terms of the F‐score and Rand index.</description><identifier>ISSN: 1532-0626</identifier><identifier>EISSN: 1532-0634</identifier><identifier>DOI: 10.1002/cpe.4404</identifier><language>eng</language><publisher>Hoboken: Wiley Subscription Services, Inc</publisher><subject>Algorithms ; Clustering ; Computer memory ; distributed algorithm ; hierarchical clustering ; large‐scale clustering ; message passing interface ; Modularity ; spectral clustering ; text clustering</subject><ispartof>Concurrency and computation, 2018-06, Vol.30 (11), p.n/a</ispartof><rights>Copyright © 2017 John Wiley &amp; Sons, Ltd.</rights><rights>Copyright © 2018 John Wiley &amp; Sons, Ltd.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c2934-c898efaf6f8a4558a1079200af5eb3386b981f7b0fc543e32d0ec2d25492e2223</citedby><cites>FETCH-LOGICAL-c2934-c898efaf6f8a4558a1079200af5eb3386b981f7b0fc543e32d0ec2d25492e2223</cites><orcidid>0000-0002-5658-141X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Seshadri, Karthick</creatorcontrib><creatorcontrib>S. 
Mercy, Shalinie</creatorcontrib><creatorcontrib>Manohar, Sidharth</creatorcontrib><title>A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses</title><title>Concurrency and computation</title><description>Summary We propose a distributed parallel algorithm for inferring the hierarchical groups present in a large‐scale text corpus. The algorithm is designed to deal with corpuses that typically do not fit into the main memory of a workstation computer. The key contribution of this paper lies in its proposal and verification of a parallel distributed algorithm that exploits the advantages of two complementary techniques based on (i) localized modularity optimization and (ii) spectral clustering. Based on our experimental observations, these are complementary in the sense that the former excels at finding coarse groups in a large‐scale network, while the latter demands a heavy memory footprint but is effective in inferring tightly knit fine‐grained groups. 
Empirical evaluation of the distributed implementation scheme shows that the algorithm exhibits a significant speed‐up when compared to existing algorithms like Louvain and, at the same time, produces better quality clusters than either Louvain or spectral clustering algorithms in terms of the F‐score and Rand index.</description><subject>Algorithms</subject><subject>Clustering</subject><subject>Computer memory</subject><subject>distributed algorithm</subject><subject>hierarchical clustering</subject><subject>large‐scale clustering</subject><subject>message passing interface</subject><subject>Modularity</subject><subject>spectral clustering</subject><subject>text clustering</subject><issn>1532-0626</issn><issn>1532-0634</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><recordid>eNp10M1Kw0AQB_AgCtYq-AgLXryk7lfSzbGU-gEFPeg5bDaz6ZZtN84maG8-gs_ok5ha8eZpBubHf-CfJJeMThil_Ma0MJGSyqNkxDLBU5oLefy38_w0OYtxTSljVLBRYmakdrFDV_Ud1KTVqL0HT7RvArputSE2IHFbC4hu25CVA9RoVs5oTxoMfRuJxbAhXmMDXx-fcTgA6eC9IyZg20eI58mJ1T7Cxe8cJy-3i-f5fbp8vHuYz5ap4YWQqVGFAqttbpWWWaY0o9OCU6ptBpUQKq8Kxey0otZkUoDgNQXDa57JggPnXIyTq0Nui-G1h9iV69DjdnhZcirElAml2KCuD8pgiBHBli26jcZdyWi5r7AcKiz3FQ40PdA352H3ryvnT4sf_w0ODHPx</recordid><startdate>20180610</startdate><enddate>20180610</enddate><creator>Seshadri, Karthick</creator><creator>S. Mercy, Shalinie</creator><creator>Manohar, Sidharth</creator><general>Wiley Subscription Services, Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-5658-141X</orcidid></search><sort><creationdate>20180610</creationdate><title>A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses</title><author>Seshadri, Karthick ; S. 
Mercy, Shalinie ; Manohar, Sidharth</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2934-c898efaf6f8a4558a1079200af5eb3386b981f7b0fc543e32d0ec2d25492e2223</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Algorithms</topic><topic>Clustering</topic><topic>Computer memory</topic><topic>distributed algorithm</topic><topic>hierarchical clustering</topic><topic>large‐scale clustering</topic><topic>message passing interface</topic><topic>Modularity</topic><topic>spectral clustering</topic><topic>text clustering</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Seshadri, Karthick</creatorcontrib><creatorcontrib>S. Mercy, Shalinie</creatorcontrib><creatorcontrib>Manohar, Sidharth</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Concurrency and computation</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Seshadri, Karthick</au><au>S. 
Mercy, Shalinie</au><au>Manohar, Sidharth</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses</atitle><jtitle>Concurrency and computation</jtitle><date>2018-06-10</date><risdate>2018</risdate><volume>30</volume><issue>11</issue><epage>n/a</epage><issn>1532-0626</issn><eissn>1532-0634</eissn><abstract>Summary We propose a distributed parallel algorithm for inferring the hierarchical groups present in a large‐scale text corpus. The algorithm is designed to deal with corpuses that typically do not fit into the main memory of a workstation computer. The key contribution of this paper lies in its proposal and verification of a parallel distributed algorithm that exploits the advantages of two complementary techniques based on (i) localized modularity optimization and (ii) spectral clustering. Based on our experimental observations, these are complementary in the sense that the former excels at finding coarse groups in a large‐scale network, while the latter demands a heavy memory footprint but is effective in inferring tightly knit fine‐grained groups. Empirical evaluation of the distributed implementation scheme shows that the algorithm exhibits a significant speed‐up when compared to existing algorithms like Louvain and, at the same time, produces better quality clusters than either Louvain or spectral clustering algorithms in terms of the F‐score and Rand index.</abstract><cop>Hoboken</cop><pub>Wiley Subscription Services, Inc</pub><doi>10.1002/cpe.4404</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-5658-141X</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1532-0626
ispartof Concurrency and computation, 2018-06, Vol.30 (11), p.n/a
issn 1532-0626
1532-0634
language eng
recordid cdi_proquest_journals_2033713881
source Wiley-Blackwell Read & Publish Collection
subjects Algorithms
Clustering
Computer memory
distributed algorithm
hierarchical clustering
large‐scale clustering
message passing interface
Modularity
spectral clustering
text clustering
title A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T22%3A27%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20distributed%20parallel%20algorithm%20for%20inferring%20hierarchical%20groups%20from%20large%E2%80%90scale%20text%20corpuses&rft.jtitle=Concurrency%20and%20computation&rft.au=Seshadri,%20Karthick&rft.date=2018-06-10&rft.volume=30&rft.issue=11&rft.epage=n/a&rft.issn=1532-0626&rft.eissn=1532-0634&rft_id=info:doi/10.1002/cpe.4404&rft_dat=%3Cproquest_cross%3E2033713881%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c2934-c898efaf6f8a4558a1079200af5eb3386b981f7b0fc543e32d0ec2d25492e2223%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2033713881&rft_id=info:pmid/&rfr_iscdi=true