Loading…

A distributed vertex-centric approach for pattern matching in massive graphs

Graph pattern matching is fundamentally important to many applications such as analyzing hyper-links in the World Wide Web, mining associations in online social networks, and substructure search in biochemistry. Most existing graph pattern matching algorithms are highly computation intensive, and do...

Full description

Saved in:
Bibliographic Details
Main Authors: Fard, Arash, Nisar, M. Usman, Ramaswamy, Lakshmish, Miller, John A., Saltz, Matthew
Format: Conference Proceeding
Language:English
Subjects:
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
cited_by
cites
container_end_page 411
container_issue
container_start_page 403
container_title
container_volume
creator Fard, Arash
Nisar, M. Usman
Ramaswamy, Lakshmish
Miller, John A.
Saltz, Matthew
description Graph pattern matching is fundamentally important to many applications such as analyzing hyper-links in the World Wide Web, mining associations in online social networks, and substructure search in biochemistry. Most existing graph pattern matching algorithms are highly computation intensive, and do not scale to extremely large graphs that characterize many emerging applications. In recent years, graph processing frameworks such as Pregel have sought to harness shared nothing clusters for processing massive graphs through a vertex-centric, Bulk Synchronous Parallel (BSP) programming model. However, developing scalable and efficient BSP-based algorithms for pattern matching is very challenging because this problem does not naturally align with a vertex-centric programming paradigm. This paper presents novel distributed algorithms based on the vertex-centric programming paradigm for a set of pattern matching models, namely, graph simulation, dual simulation and strong simulation. Our algorithms are fine-tuned to consider the challenges of pattern matching on massive data graphs. Furthermore, we introduce a new pattern matching model, called strict simulation, which outperforms strong simulation in terms of scalability while preserving its important properties. We investigate potential performance bottlenecks and propose several techniques to mitigate them. This paper also presents an extensive set of experiments involving massive graphs (millions of vertices and billions of edges) to study the effects of various parameters on the scalability and performance of the proposed algorithms. The results demonstrate that our techniques are highly effective in alleviating performance bottlenecks and yield significant scalability benefits.
doi_str_mv 10.1109/BigData.2013.6691601
format conference_proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_6691601</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6691601</ieee_id><sourcerecordid>6691601</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-2bba218495c0646b1712756fe32f2cba3683ccea93f928f325c26554c24c372c3</originalsourceid><addsrcrecordid>eNotj71OwzAURs2ABJQ-AQx-gQRf_8ZjKVCQIrGAxFY5tzepEU0j21Tw9oDo9B2d4UgfY9cgagDhb27jcBdKqKUAVVvrwQo4YRegnfcgvXo7Y_Oc34UQ4JzRTp-zdsE3MZcUu89CG36gVOirQhp_FfIwTWkfcMv7feJTKIXSyHeh4DaOA49_nHM8EB9SmLb5kp324SPT_Lgz9vpw_7J8rNrn1dNy0VYRnCmV7LogodHeoLDaduBAOmN7UrKX2AVlG4VIwavey6ZX0qC0xmiUGpWTqGbs6r8biWg9pbgL6Xt9PKx-ALmpTX0</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>A distributed vertex-centric approach for pattern matching in massive graphs</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Fard, Arash ; Nisar, M. Usman ; Ramaswamy, Lakshmish ; Miller, John A. ; Saltz, Matthew</creator><creatorcontrib>Fard, Arash ; Nisar, M. Usman ; Ramaswamy, Lakshmish ; Miller, John A. ; Saltz, Matthew</creatorcontrib><description>Graph pattern matching is fundamentally important to many applications such as analyzing hyper-links in the World Wide Web, mining associations in online social networks, and substructure search in biochemistry. Most existing graph pattern matching algorithms are highly computation intensive, and do not scale to extremely large graphs that characterize many emerging applications. In recent years, graph processing frameworks such as Pregel have sought to harness shared nothing clusters for processing massive graphs through a vertex-centric, Bulk Synchronous Parallel (BSP) programming model. However, developing scalable and efficient BSP-based algorithms for pattern matching is very challenging because this problem does not naturally align with a vertex-centric programming paradigm. This paper presents novel distributed algorithms based on the vertex-centric programming paradigm for a set of pattern matching models, namely, graph simulation, dual simulation and strong simulation. Our algorithms are fine-tuned to consider the challenges of pattern matching on massive data graphs. Furthermore, we introduce a new pattern matching model, called strict simulation, which outperforms strong simulation in terms of scalability while preserving its important properties. We investigate potential performance bottlenecks and propose several techniques to mitigate them. This paper also presents an extensive set of experiments involving massive graphs (millions of vertices and billions of edges) to study the effects of various parameters on the scalability and performance of the proposed algorithms. The results demonstrate that our techniques are highly effective in alleviating performance bottlenecks and yield significant scalability benefits.</description><identifier>EISBN: 147991293X</identifier><identifier>EISBN: 9781479912933</identifier><identifier>DOI: 10.1109/BigData.2013.6691601</identifier><language>eng</language><publisher>IEEE</publisher><subject>Algorithm design and analysis ; Clustering algorithms ; Computational modeling ; Data models ; data-intensive computing ; Distributed algorithms ; graph simulation ; Pattern matching ; Programming ; query graphs ; subgraph isomorphism</subject><ispartof>2013 IEEE International Conference on Big Data, 2013, p.403-411</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6691601$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6691601$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Fard, Arash</creatorcontrib><creatorcontrib>Nisar, M. Usman</creatorcontrib><creatorcontrib>Ramaswamy, Lakshmish</creatorcontrib><creatorcontrib>Miller, John A.</creatorcontrib><creatorcontrib>Saltz, Matthew</creatorcontrib><title>A distributed vertex-centric approach for pattern matching in massive graphs</title><title>2013 IEEE International Conference on Big Data</title><addtitle>BigData</addtitle><description>Graph pattern matching is fundamentally important to many applications such as analyzing hyper-links in the World Wide Web, mining associations in online social networks, and substructure search in biochemistry. Most existing graph pattern matching algorithms are highly computation intensive, and do not scale to extremely large graphs that characterize many emerging applications. In recent years, graph processing frameworks such as Pregel have sought to harness shared nothing clusters for processing massive graphs through a vertex-centric, Bulk Synchronous Parallel (BSP) programming model. However, developing scalable and efficient BSP-based algorithms for pattern matching is very challenging because this problem does not naturally align with a vertex-centric programming paradigm. This paper presents novel distributed algorithms based on the vertex-centric programming paradigm for a set of pattern matching models, namely, graph simulation, dual simulation and strong simulation. Our algorithms are fine-tuned to consider the challenges of pattern matching on massive data graphs. Furthermore, we introduce a new pattern matching model, called strict simulation, which outperforms strong simulation in terms of scalability while preserving its important properties. We investigate potential performance bottlenecks and propose several techniques to mitigate them. This paper also presents an extensive set of experiments involving massive graphs (millions of vertices and billions of edges) to study the effects of various parameters on the scalability and performance of the proposed algorithms. The results demonstrate that our techniques are highly effective in alleviating performance bottlenecks and yield significant scalability benefits.</description><subject>Algorithm design and analysis</subject><subject>Clustering algorithms</subject><subject>Computational modeling</subject><subject>Data models</subject><subject>data-intensive computing</subject><subject>Distributed algorithms</subject><subject>graph simulation</subject><subject>Pattern matching</subject><subject>Programming</subject><subject>query graphs</subject><subject>subgraph isomorphism</subject><isbn>147991293X</isbn><isbn>9781479912933</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2013</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotj71OwzAURs2ABJQ-AQx-gQRf_8ZjKVCQIrGAxFY5tzepEU0j21Tw9oDo9B2d4UgfY9cgagDhb27jcBdKqKUAVVvrwQo4YRegnfcgvXo7Y_Oc34UQ4JzRTp-zdsE3MZcUu89CG36gVOirQhp_FfIwTWkfcMv7feJTKIXSyHeh4DaOA49_nHM8EB9SmLb5kp324SPT_Lgz9vpw_7J8rNrn1dNy0VYRnCmV7LogodHeoLDaduBAOmN7UrKX2AVlG4VIwavey6ZX0qC0xmiUGpWTqGbs6r8biWg9pbgL6Xt9PKx-ALmpTX0</recordid><startdate>201310</startdate><enddate>201310</enddate><creator>Fard, Arash</creator><creator>Nisar, M. Usman</creator><creator>Ramaswamy, Lakshmish</creator><creator>Miller, John A.</creator><creator>Saltz, Matthew</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201310</creationdate><title>A distributed vertex-centric approach for pattern matching in massive graphs</title><author>Fard, Arash ; Nisar, M. Usman ; Ramaswamy, Lakshmish ; Miller, John A. ; Saltz, Matthew</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-2bba218495c0646b1712756fe32f2cba3683ccea93f928f325c26554c24c372c3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Algorithm design and analysis</topic><topic>Clustering algorithms</topic><topic>Computational modeling</topic><topic>Data models</topic><topic>data-intensive computing</topic><topic>Distributed algorithms</topic><topic>graph simulation</topic><topic>Pattern matching</topic><topic>Programming</topic><topic>query graphs</topic><topic>subgraph isomorphism</topic><toplevel>online_resources</toplevel><creatorcontrib>Fard, Arash</creatorcontrib><creatorcontrib>Nisar, M. Usman</creatorcontrib><creatorcontrib>Ramaswamy, Lakshmish</creatorcontrib><creatorcontrib>Miller, John A.</creatorcontrib><creatorcontrib>Saltz, Matthew</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Fard, Arash</au><au>Nisar, M. Usman</au><au>Ramaswamy, Lakshmish</au><au>Miller, John A.</au><au>Saltz, Matthew</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>A distributed vertex-centric approach for pattern matching in massive graphs</atitle><btitle>2013 IEEE International Conference on Big Data</btitle><stitle>BigData</stitle><date>2013-10</date><risdate>2013</risdate><spage>403</spage><epage>411</epage><pages>403-411</pages><eisbn>147991293X</eisbn><eisbn>9781479912933</eisbn><abstract>Graph pattern matching is fundamentally important to many applications such as analyzing hyper-links in the World Wide Web, mining associations in online social networks, and substructure search in biochemistry. Most existing graph pattern matching algorithms are highly computation intensive, and do not scale to extremely large graphs that characterize many emerging applications. In recent years, graph processing frameworks such as Pregel have sought to harness shared nothing clusters for processing massive graphs through a vertex-centric, Bulk Synchronous Parallel (BSP) programming model. However, developing scalable and efficient BSP-based algorithms for pattern matching is very challenging because this problem does not naturally align with a vertex-centric programming paradigm. This paper presents novel distributed algorithms based on the vertex-centric programming paradigm for a set of pattern matching models, namely, graph simulation, dual simulation and strong simulation. Our algorithms are fine-tuned to consider the challenges of pattern matching on massive data graphs. Furthermore, we introduce a new pattern matching model, called strict simulation, which outperforms strong simulation in terms of scalability while preserving its important properties. We investigate potential performance bottlenecks and propose several techniques to mitigate them. This paper also presents an extensive set of experiments involving massive graphs (millions of vertices and billions of edges) to study the effects of various parameters on the scalability and performance of the proposed algorithms. The results demonstrate that our techniques are highly effective in alleviating performance bottlenecks and yield significant scalability benefits.</abstract><pub>IEEE</pub><doi>10.1109/BigData.2013.6691601</doi><tpages>9</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier EISBN: 147991293X
ispartof 2013 IEEE International Conference on Big Data, 2013, p.403-411
issn
language eng
recordid cdi_ieee_primary_6691601
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Algorithm design and analysis
Clustering algorithms
Computational modeling
Data models
data-intensive computing
Distributed algorithms
graph simulation
Pattern matching
Programming
query graphs
subgraph isomorphism
title A distributed vertex-centric approach for pattern matching in massive graphs
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T04%3A00%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20distributed%20vertex-centric%20approach%20for%20pattern%20matching%20in%20massive%20graphs&rft.btitle=2013%20IEEE%20International%20Conference%20on%20Big%20Data&rft.au=Fard,%20Arash&rft.date=2013-10&rft.spage=403&rft.epage=411&rft.pages=403-411&rft_id=info:doi/10.1109/BigData.2013.6691601&rft.eisbn=147991293X&rft.eisbn_list=9781479912933&rft_dat=%3Cieee_6IE%3E6691601%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-2bba218495c0646b1712756fe32f2cba3683ccea93f928f325c26554c24c372c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6691601&rfr_iscdi=true