Loading…
A distributed vertex-centric approach for pattern matching in massive graphs
Graph pattern matching is fundamentally important to many applications such as analyzing hyper-links in the World Wide Web, mining associations in online social networks, and substructure search in biochemistry. Most existing graph pattern matching algorithms are highly computation intensive, and do...
Saved in:
Main Authors: | , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
cited_by | |
---|---|
cites | |
container_end_page | 411 |
container_issue | |
container_start_page | 403 |
container_title | |
container_volume | |
creator | Fard, Arash Nisar, M. Usman Ramaswamy, Lakshmish Miller, John A. Saltz, Matthew |
description | Graph pattern matching is fundamentally important to many applications such as analyzing hyper-links in the World Wide Web, mining associations in online social networks, and substructure search in biochemistry. Most existing graph pattern matching algorithms are highly computation intensive, and do not scale to extremely large graphs that characterize many emerging applications. In recent years, graph processing frameworks such as Pregel have sought to harness shared nothing clusters for processing massive graphs through a vertex-centric, Bulk Synchronous Parallel (BSP) programming model. However, developing scalable and efficient BSP-based algorithms for pattern matching is very challenging because this problem does not naturally align with a vertex-centric programming paradigm. This paper presents novel distributed algorithms based on the vertex-centric programming paradigm for a set of pattern matching models, namely, graph simulation, dual simulation and strong simulation. Our algorithms are fine-tuned to consider the challenges of pattern matching on massive data graphs. Furthermore, we introduce a new pattern matching model, called strict simulation, which outperforms strong simulation in terms of scalability while preserving its important properties. We investigate potential performance bottlenecks and propose several techniques to mitigate them. This paper also presents an extensive set of experiments involving massive graphs (millions of vertices and billions of edges) to study the effects of various parameters on the scalability and performance of the proposed algorithms. The results demonstrate that our techniques are highly effective in alleviating performance bottlenecks and yield significant scalability benefits. |
doi_str_mv | 10.1109/BigData.2013.6691601 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_6691601</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6691601</ieee_id><sourcerecordid>6691601</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-2bba218495c0646b1712756fe32f2cba3683ccea93f928f325c26554c24c372c3</originalsourceid><addsrcrecordid>eNotj71OwzAURs2ABJQ-AQx-gQRf_8ZjKVCQIrGAxFY5tzepEU0j21Tw9oDo9B2d4UgfY9cgagDhb27jcBdKqKUAVVvrwQo4YRegnfcgvXo7Y_Oc34UQ4JzRTp-zdsE3MZcUu89CG36gVOirQhp_FfIwTWkfcMv7feJTKIXSyHeh4DaOA49_nHM8EB9SmLb5kp324SPT_Lgz9vpw_7J8rNrn1dNy0VYRnCmV7LogodHeoLDaduBAOmN7UrKX2AVlG4VIwavey6ZX0qC0xmiUGpWTqGbs6r8biWg9pbgL6Xt9PKx-ALmpTX0</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>A distributed vertex-centric approach for pattern matching in massive graphs</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Fard, Arash ; Nisar, M. Usman ; Ramaswamy, Lakshmish ; Miller, John A. ; Saltz, Matthew</creator><creatorcontrib>Fard, Arash ; Nisar, M. Usman ; Ramaswamy, Lakshmish ; Miller, John A. ; Saltz, Matthew</creatorcontrib><description>Graph pattern matching is fundamentally important to many applications such as analyzing hyper-links in the World Wide Web, mining associations in online social networks, and substructure search in biochemistry. Most existing graph pattern matching algorithms are highly computation intensive, and do not scale to extremely large graphs that characterize many emerging applications. In recent years, graph processing frameworks such as Pregel have sought to harness shared nothing clusters for processing massive graphs through a vertex-centric, Bulk Synchronous Parallel (BSP) programming model. However, developing scalable and efficient BSP-based algorithms for pattern matching is very challenging because this problem does not naturally align with a vertex-centric programming paradigm. This paper presents novel distributed algorithms based on the vertex-centric programming paradigm for a set of pattern matching models, namely, graph simulation, dual simulation and strong simulation. Our algorithms are fine-tuned to consider the challenges of pattern matching on massive data graphs. Furthermore, we introduce a new pattern matching model, called strict simulation, which outperforms strong simulation in terms of scalability while preserving its important properties. We investigate potential performance bottlenecks and propose several techniques to mitigate them. This paper also presents an extensive set of experiments involving massive graphs (millions of vertices and billions of edges) to study the effects of various parameters on the scalability and performance of the proposed algorithms. The results demonstrate that our techniques are highly effective in alleviating performance bottlenecks and yield significant scalability benefits.</description><identifier>EISBN: 147991293X</identifier><identifier>EISBN: 9781479912933</identifier><identifier>DOI: 10.1109/BigData.2013.6691601</identifier><language>eng</language><publisher>IEEE</publisher><subject>Algorithm design and analysis ; Clustering algorithms ; Computational modeling ; Data models ; data-intensive computing ; Distributed algorithms ; graph simulation ; Pattern matching ; Programming ; query graphs ; subgraph isomorphism</subject><ispartof>2013 IEEE International Conference on Big Data, 2013, p.403-411</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6691601$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6691601$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Fard, Arash</creatorcontrib><creatorcontrib>Nisar, M. Usman</creatorcontrib><creatorcontrib>Ramaswamy, Lakshmish</creatorcontrib><creatorcontrib>Miller, John A.</creatorcontrib><creatorcontrib>Saltz, Matthew</creatorcontrib><title>A distributed vertex-centric approach for pattern matching in massive graphs</title><title>2013 IEEE International Conference on Big Data</title><addtitle>BigData</addtitle><description>Graph pattern matching is fundamentally important to many applications such as analyzing hyper-links in the World Wide Web, mining associations in online social networks, and substructure search in biochemistry. Most existing graph pattern matching algorithms are highly computation intensive, and do not scale to extremely large graphs that characterize many emerging applications. In recent years, graph processing frameworks such as Pregel have sought to harness shared nothing clusters for processing massive graphs through a vertex-centric, Bulk Synchronous Parallel (BSP) programming model. However, developing scalable and efficient BSP-based algorithms for pattern matching is very challenging because this problem does not naturally align with a vertex-centric programming paradigm. This paper presents novel distributed algorithms based on the vertex-centric programming paradigm for a set of pattern matching models, namely, graph simulation, dual simulation and strong simulation. Our algorithms are fine-tuned to consider the challenges of pattern matching on massive data graphs. Furthermore, we introduce a new pattern matching model, called strict simulation, which outperforms strong simulation in terms of scalability while preserving its important properties. We investigate potential performance bottlenecks and propose several techniques to mitigate them. This paper also presents an extensive set of experiments involving massive graphs (millions of vertices and billions of edges) to study the effects of various parameters on the scalability and performance of the proposed algorithms. The results demonstrate that our techniques are highly effective in alleviating performance bottlenecks and yield significant scalability benefits.</description><subject>Algorithm design and analysis</subject><subject>Clustering algorithms</subject><subject>Computational modeling</subject><subject>Data models</subject><subject>data-intensive computing</subject><subject>Distributed algorithms</subject><subject>graph simulation</subject><subject>Pattern matching</subject><subject>Programming</subject><subject>query graphs</subject><subject>subgraph isomorphism</subject><isbn>147991293X</isbn><isbn>9781479912933</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2013</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotj71OwzAURs2ABJQ-AQx-gQRf_8ZjKVCQIrGAxFY5tzepEU0j21Tw9oDo9B2d4UgfY9cgagDhb27jcBdKqKUAVVvrwQo4YRegnfcgvXo7Y_Oc34UQ4JzRTp-zdsE3MZcUu89CG36gVOirQhp_FfIwTWkfcMv7feJTKIXSyHeh4DaOA49_nHM8EB9SmLb5kp324SPT_Lgz9vpw_7J8rNrn1dNy0VYRnCmV7LogodHeoLDaduBAOmN7UrKX2AVlG4VIwavey6ZX0qC0xmiUGpWTqGbs6r8biWg9pbgL6Xt9PKx-ALmpTX0</recordid><startdate>201310</startdate><enddate>201310</enddate><creator>Fard, Arash</creator><creator>Nisar, M. Usman</creator><creator>Ramaswamy, Lakshmish</creator><creator>Miller, John A.</creator><creator>Saltz, Matthew</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201310</creationdate><title>A distributed vertex-centric approach for pattern matching in massive graphs</title><author>Fard, Arash ; Nisar, M. Usman ; Ramaswamy, Lakshmish ; Miller, John A. ; Saltz, Matthew</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-2bba218495c0646b1712756fe32f2cba3683ccea93f928f325c26554c24c372c3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Algorithm design and analysis</topic><topic>Clustering algorithms</topic><topic>Computational modeling</topic><topic>Data models</topic><topic>data-intensive computing</topic><topic>Distributed algorithms</topic><topic>graph simulation</topic><topic>Pattern matching</topic><topic>Programming</topic><topic>query graphs</topic><topic>subgraph isomorphism</topic><toplevel>online_resources</toplevel><creatorcontrib>Fard, Arash</creatorcontrib><creatorcontrib>Nisar, M. Usman</creatorcontrib><creatorcontrib>Ramaswamy, Lakshmish</creatorcontrib><creatorcontrib>Miller, John A.</creatorcontrib><creatorcontrib>Saltz, Matthew</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Fard, Arash</au><au>Nisar, M. Usman</au><au>Ramaswamy, Lakshmish</au><au>Miller, John A.</au><au>Saltz, Matthew</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>A distributed vertex-centric approach for pattern matching in massive graphs</atitle><btitle>2013 IEEE International Conference on Big Data</btitle><stitle>BigData</stitle><date>2013-10</date><risdate>2013</risdate><spage>403</spage><epage>411</epage><pages>403-411</pages><eisbn>147991293X</eisbn><eisbn>9781479912933</eisbn><abstract>Graph pattern matching is fundamentally important to many applications such as analyzing hyper-links in the World Wide Web, mining associations in online social networks, and substructure search in biochemistry. Most existing graph pattern matching algorithms are highly computation intensive, and do not scale to extremely large graphs that characterize many emerging applications. In recent years, graph processing frameworks such as Pregel have sought to harness shared nothing clusters for processing massive graphs through a vertex-centric, Bulk Synchronous Parallel (BSP) programming model. However, developing scalable and efficient BSP-based algorithms for pattern matching is very challenging because this problem does not naturally align with a vertex-centric programming paradigm. This paper presents novel distributed algorithms based on the vertex-centric programming paradigm for a set of pattern matching models, namely, graph simulation, dual simulation and strong simulation. Our algorithms are fine-tuned to consider the challenges of pattern matching on massive data graphs. Furthermore, we introduce a new pattern matching model, called strict simulation, which outperforms strong simulation in terms of scalability while preserving its important properties. We investigate potential performance bottlenecks and propose several techniques to mitigate them. This paper also presents an extensive set of experiments involving massive graphs (millions of vertices and billions of edges) to study the effects of various parameters on the scalability and performance of the proposed algorithms. The results demonstrate that our techniques are highly effective in alleviating performance bottlenecks and yield significant scalability benefits.</abstract><pub>IEEE</pub><doi>10.1109/BigData.2013.6691601</doi><tpages>9</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | EISBN: 147991293X |
ispartof | 2013 IEEE International Conference on Big Data, 2013, p.403-411 |
issn | |
language | eng |
recordid | cdi_ieee_primary_6691601 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Algorithm design and analysis Clustering algorithms Computational modeling Data models data-intensive computing Distributed algorithms graph simulation Pattern matching Programming query graphs subgraph isomorphism |
title | A distributed vertex-centric approach for pattern matching in massive graphs |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T04%3A00%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20distributed%20vertex-centric%20approach%20for%20pattern%20matching%20in%20massive%20graphs&rft.btitle=2013%20IEEE%20International%20Conference%20on%20Big%20Data&rft.au=Fard,%20Arash&rft.date=2013-10&rft.spage=403&rft.epage=411&rft.pages=403-411&rft_id=info:doi/10.1109/BigData.2013.6691601&rft.eisbn=147991293X&rft.eisbn_list=9781479912933&rft_dat=%3Cieee_6IE%3E6691601%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-2bba218495c0646b1712756fe32f2cba3683ccea93f928f325c26554c24c372c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6691601&rfr_iscdi=true |