A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses
Summary We propose a distributed parallel algorithm for inferring the hierarchical groups present in a large‐scale text corpus. The algorithm is designed to deal with corpuses that typically do not fit into the main memory of a workstation computer. The key contribution of this paper lies in its pro...
Published in: | Concurrency and computation 2018-06, Vol.30 (11), p.n/a |
---|---|
Main Authors: | Seshadri, Karthick; S. Mercy, Shalinie; Manohar, Sidharth |
Format: | Article |
Language: | English |
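Subjects: | Algorithms; Clustering; Computer memory; distributed algorithm; hierarchical clustering; large‐scale clustering; message passing interface; Modularity; spectral clustering; text clustering |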
cited_by | cdi_FETCH-LOGICAL-c2934-c898efaf6f8a4558a1079200af5eb3386b981f7b0fc543e32d0ec2d25492e2223 |
---|---|
cites | cdi_FETCH-LOGICAL-c2934-c898efaf6f8a4558a1079200af5eb3386b981f7b0fc543e32d0ec2d25492e2223 |
container_end_page | n/a |
container_issue | 11 |
container_start_page | |
container_title | Concurrency and computation |
container_volume | 30 |
creator | Seshadri, Karthick; S. Mercy, Shalinie; Manohar, Sidharth |
description | Summary
We propose a distributed parallel algorithm for inferring the hierarchical groups present in a large‐scale text corpus. The algorithm is designed to deal with corpuses that typically do not fit into the main memory of a workstation computer. The key contribution of this paper lies in its proposal and verification of a parallel distributed algorithm that exploits the advantages of two complementary techniques based on (i) localized modularity optimization and (ii) spectral clustering. Based on our experimental observations, these are complementary in the sense that the former excels at finding coarse groups in a large‐scale network, while the latter demands a heavy memory footprint but is effective in inferring tightly knit fine‐grained groups. Empirical evaluation of the distributed implementation scheme shows that the algorithm exhibits a significant speed‐up when compared to existing algorithms like Louvain and, at the same time, produces better quality clusters than either Louvain or spectral clustering algorithms in terms of the F‐score and Rand index. |
doi_str_mv | 10.1002/cpe.4404 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2033713881</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2033713881</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2934-c898efaf6f8a4558a1079200af5eb3386b981f7b0fc543e32d0ec2d25492e2223</originalsourceid><addsrcrecordid>eNp10M1Kw0AQB_AgCtYq-AgLXryk7lfSzbGU-gEFPeg5bDaz6ZZtN84maG8-gs_ok5ha8eZpBubHf-CfJJeMThil_Ma0MJGSyqNkxDLBU5oLefy38_w0OYtxTSljVLBRYmakdrFDV_Ud1KTVqL0HT7RvArputSE2IHFbC4hu25CVA9RoVs5oTxoMfRuJxbAhXmMDXx-fcTgA6eC9IyZg20eI58mJ1T7Cxe8cJy-3i-f5fbp8vHuYz5ap4YWQqVGFAqttbpWWWaY0o9OCU6ptBpUQKq8Kxey0otZkUoDgNQXDa57JggPnXIyTq0Nui-G1h9iV69DjdnhZcirElAml2KCuD8pgiBHBli26jcZdyWi5r7AcKiz3FQ40PdA352H3ryvnT4sf_w0ODHPx</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2033713881</pqid></control><display><type>article</type><title>A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses</title><source>Wiley-Blackwell Read & Publish Collection</source><creator>Seshadri, Karthick ; S. Mercy, Shalinie ; Manohar, Sidharth</creator><creatorcontrib>Seshadri, Karthick ; S. Mercy, Shalinie ; Manohar, Sidharth</creatorcontrib><description>Summary
We propose a distributed parallel algorithm for inferring the hierarchical groups present in a large‐scale text corpus. The algorithm is designed to deal with corpuses that typically do not fit into the main memory of a workstation computer. The key contribution of this paper lies in its proposal and verification of a parallel distributed algorithm that exploits the advantages of two complementary techniques based on (i) localized modularity optimization and (ii) spectral clustering. Based on our experimental observations, these are complementary in the sense that the former excels at finding coarse groups in a large‐scale network, while the latter demands a heavy memory footprint but is effective in inferring tightly knit fine‐grained groups. Empirical evaluation of the distributed implementation scheme shows that the algorithm exhibits a significant speed‐up when compared to existing algorithms like Louvain and, at the same time, produces better quality clusters than either Louvain or spectral clustering algorithms in terms of the F‐score and Rand index.</description><identifier>ISSN: 1532-0626</identifier><identifier>EISSN: 1532-0634</identifier><identifier>DOI: 10.1002/cpe.4404</identifier><language>eng</language><publisher>Hoboken: Wiley Subscription Services, Inc</publisher><subject>Algorithms ; Clustering ; Computer memory ; distributed algorithm ; hierarchical clustering ; large‐scale clustering ; message passing interface ; Modularity ; spectral clustering ; text clustering</subject><ispartof>Concurrency and computation, 2018-06, Vol.30 (11), p.n/a</ispartof><rights>Copyright © 2017 John Wiley & Sons, Ltd.</rights><rights>Copyright © 2018 John Wiley & Sons, 
Ltd.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c2934-c898efaf6f8a4558a1079200af5eb3386b981f7b0fc543e32d0ec2d25492e2223</citedby><cites>FETCH-LOGICAL-c2934-c898efaf6f8a4558a1079200af5eb3386b981f7b0fc543e32d0ec2d25492e2223</cites><orcidid>0000-0002-5658-141X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Seshadri, Karthick</creatorcontrib><creatorcontrib>S. Mercy, Shalinie</creatorcontrib><creatorcontrib>Manohar, Sidharth</creatorcontrib><title>A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses</title><title>Concurrency and computation</title><description>Summary
We propose a distributed parallel algorithm for inferring the hierarchical groups present in a large‐scale text corpus. The algorithm is designed to deal with corpuses that typically do not fit into the main memory of a workstation computer. The key contribution of this paper lies in its proposal and verification of a parallel distributed algorithm that exploits the advantages of two complementary techniques based on (i) localized modularity optimization and (ii) spectral clustering. Based on our experimental observations, these are complementary in the sense that the former excels at finding coarse groups in a large‐scale network, while the latter demands a heavy memory footprint but is effective in inferring tightly knit fine‐grained groups. Empirical evaluation of the distributed implementation scheme shows that the algorithm exhibits a significant speed‐up when compared to existing algorithms like Louvain and, at the same time, produces better quality clusters than either Louvain or spectral clustering algorithms in terms of the F‐score and Rand index.</description><subject>Algorithms</subject><subject>Clustering</subject><subject>Computer memory</subject><subject>distributed algorithm</subject><subject>hierarchical clustering</subject><subject>large‐scale clustering</subject><subject>message passing interface</subject><subject>Modularity</subject><subject>spectral clustering</subject><subject>text 
clustering</subject><issn>1532-0626</issn><issn>1532-0634</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><recordid>eNp10M1Kw0AQB_AgCtYq-AgLXryk7lfSzbGU-gEFPeg5bDaz6ZZtN84maG8-gs_ok5ha8eZpBubHf-CfJJeMThil_Ma0MJGSyqNkxDLBU5oLefy38_w0OYtxTSljVLBRYmakdrFDV_Ud1KTVqL0HT7RvArputSE2IHFbC4hu25CVA9RoVs5oTxoMfRuJxbAhXmMDXx-fcTgA6eC9IyZg20eI58mJ1T7Cxe8cJy-3i-f5fbp8vHuYz5ap4YWQqVGFAqttbpWWWaY0o9OCU6ptBpUQKq8Kxey0otZkUoDgNQXDa57JggPnXIyTq0Nui-G1h9iV69DjdnhZcirElAml2KCuD8pgiBHBli26jcZdyWi5r7AcKiz3FQ40PdA352H3ryvnT4sf_w0ODHPx</recordid><startdate>20180610</startdate><enddate>20180610</enddate><creator>Seshadri, Karthick</creator><creator>S. Mercy, Shalinie</creator><creator>Manohar, Sidharth</creator><general>Wiley Subscription Services, Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-5658-141X</orcidid></search><sort><creationdate>20180610</creationdate><title>A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses</title><author>Seshadri, Karthick ; S. Mercy, Shalinie ; Manohar, Sidharth</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2934-c898efaf6f8a4558a1079200af5eb3386b981f7b0fc543e32d0ec2d25492e2223</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Algorithms</topic><topic>Clustering</topic><topic>Computer memory</topic><topic>distributed algorithm</topic><topic>hierarchical clustering</topic><topic>large‐scale clustering</topic><topic>message passing interface</topic><topic>Modularity</topic><topic>spectral clustering</topic><topic>text clustering</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Seshadri, Karthick</creatorcontrib><creatorcontrib>S. 
Mercy, Shalinie</creatorcontrib><creatorcontrib>Manohar, Sidharth</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Concurrency and computation</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Seshadri, Karthick</au><au>S. Mercy, Shalinie</au><au>Manohar, Sidharth</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses</atitle><jtitle>Concurrency and computation</jtitle><date>2018-06-10</date><risdate>2018</risdate><volume>30</volume><issue>11</issue><epage>n/a</epage><issn>1532-0626</issn><eissn>1532-0634</eissn><abstract>Summary
We propose a distributed parallel algorithm for inferring the hierarchical groups present in a large‐scale text corpus. The algorithm is designed to deal with corpuses that typically do not fit into the main memory of a workstation computer. The key contribution of this paper lies in its proposal and verification of a parallel distributed algorithm that exploits the advantages of two complementary techniques based on (i) localized modularity optimization and (ii) spectral clustering. Based on our experimental observations, these are complementary in the sense that the former excels at finding coarse groups in a large‐scale network, while the latter demands a heavy memory footprint but is effective in inferring tightly knit fine‐grained groups. Empirical evaluation of the distributed implementation scheme shows that the algorithm exhibits a significant speed‐up when compared to existing algorithms like Louvain and, at the same time, produces better quality clusters than either Louvain or spectral clustering algorithms in terms of the F‐score and Rand index.</abstract><cop>Hoboken</cop><pub>Wiley Subscription Services, Inc</pub><doi>10.1002/cpe.4404</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-5658-141X</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1532-0626 |
ispartof | Concurrency and computation, 2018-06, Vol.30 (11), p.n/a |
issn | ISSN: 1532-0626; EISSN: 1532-0634 |
language | eng |
recordid | cdi_proquest_journals_2033713881 |
source | Wiley-Blackwell Read & Publish Collection |
subjects | Algorithms; Clustering; Computer memory; distributed algorithm; hierarchical clustering; large‐scale clustering; message passing interface; Modularity; spectral clustering; text clustering |
title | A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T22%3A27%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20distributed%20parallel%20algorithm%20for%20inferring%20hierarchical%20groups%20from%20large%E2%80%90scale%20text%20corpuses&rft.jtitle=Concurrency%20and%20computation&rft.au=Seshadri,%20Karthick&rft.date=2018-06-10&rft.volume=30&rft.issue=11&rft.epage=n/a&rft.issn=1532-0626&rft.eissn=1532-0634&rft_id=info:doi/10.1002/cpe.4404&rft_dat=%3Cproquest_cross%3E2033713881%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c2934-c898efaf6f8a4558a1079200af5eb3386b981f7b0fc543e32d0ec2d25492e2223%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2033713881&rft_id=info:pmid/&rfr_iscdi=true |