Loading…

Efficient Graph Similarity Search in External Memory

Many real-world applications, such as bioinformatics, data mining, pattern recognition, and social network analysis, benefit from efficient solutions for the graph similarity search problem. Existing methods have limited scalability when they handle the large graph databases, for example, those with...

Full description

Saved in:
Bibliographic Details
Published in:IEEE access 2017, Vol.5, p.4551-4560
Main Authors: Xiaoyang Chen, Hongwei Huo, Jun Huan, Vitter, Jeffrey Scott
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
cited_by cdi_FETCH-LOGICAL-c408t-a2b3fb51c697019306ee890ecd09a8ea4e48bd0a45f90d5a89ac6056f6cfb6cf3
cites cdi_FETCH-LOGICAL-c408t-a2b3fb51c697019306ee890ecd09a8ea4e48bd0a45f90d5a89ac6056f6cfb6cf3
container_end_page 4560
container_issue
container_start_page 4551
container_title IEEE access
container_volume 5
creator Xiaoyang Chen
Hongwei Huo
Jun Huan
Vitter, Jeffrey Scott
description Many real-world applications, such as bioinformatics, data mining, pattern recognition, and social network analysis, benefit from efficient solutions for the graph similarity search problem. Existing methods have limited scalability when they handle the large graph databases, for example, those with millions or billions of graphs that cannot fit in main memory. In this paper, we study the problem of graph similarity search under the graph edit distance constraint in external memory. We present an efficient framework for arbitrary q-gram-based representations of a graph. Specifically, we propose a q-gram matrix index stored in hybrid layout in external memory to achieve efficient query processing, by converting the q-gram counting filter into a sparse matrix-vector multiplication problem. Furthermore, we also boost the query performance by transforming the global filter to a 2-D query rectangle, which allows us to perform a query in a reduced region, significantly reducing the number of query I/Os in practice. Extensive experiments on real data sets confirm that 1) our method can compete with the state-of-the-art in-memory methods in index size and filtering ability, and outperform them on scalability of coping with the PubChem data set including 25 million chemical structure graphs and 2) compared with the popular q-gram-based external inverted index, our external index structure needs much fewer number of query I/Os on the PubChem data set.
doi_str_mv 10.1109/ACCESS.2017.2682107
format article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_7878573</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7878573</ieee_id><doaj_id>oai_doaj_org_article_78ec6d808ff641bc8dff58742d070dc1</doaj_id><sourcerecordid>2455940754</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-a2b3fb51c697019306ee890ecd09a8ea4e48bd0a45f90d5a89ac6056f6cfb6cf3</originalsourceid><addsrcrecordid>eNpNUE1Lw0AQDaJgqf0FvQQ8p84m-3ksIdZCxUP0vGw2s3ZL2tRNCvbfmxopDgwzPN57w7womhNYEALqaZnnRVkuUiBikXKZEhA30SQlXCUZy_jtv_0-mnXdDoaSA8TEJKKFc956PPTxKpjjNi793jcm-P4cl2iC3cb-EBffPYaDaeJX3Lfh_BDdOdN0OPub0-jjuXjPX5LN22qdLzeJpSD7xKRV5ipGLFcCiMqAI0oFaGtQRqKhSGVVg6HMKaiZkcpYDow7bl01dDaN1qNv3ZqdPga_N-GsW-P1L9CGT21C722DWki0vJYgneOUVFbWzjEpaFqDgNqSwetx9DqG9uuEXa937enyU6dTypiiIBgdWNnIsqHtuoDuepWAvqStx7T1JW39l_agmo8qj4hXhZBCMpFlP1yDekM</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2455940754</pqid></control><display><type>article</type><title>Efficient Graph Similarity Search in External Memory</title><source>IEEE Xplore Open Access Journals</source><creator>Xiaoyang Chen ; Hongwei Huo ; Jun Huan ; Vitter, Jeffrey Scott</creator><creatorcontrib>Xiaoyang Chen ; Hongwei Huo ; Jun Huan ; Vitter, Jeffrey Scott</creatorcontrib><description>Many real-world applications, such as bioinformatics, data mining, pattern recognition, and social network analysis, benefit from efficient solutions for the graph similarity search problem. Existing methods have limited scalability when they handle the large graph databases, for example, those with millions or billions of graphs that cannot fit in main memory. In this paper, we study the problem of graph similarity search under the graph edit distance constraint in external memory. We present an efficient framework for arbitrary q-gram-based representations of a graph. Specifically, we propose a q-gram matrix index stored in hybrid layout in external memory to achieve efficient query processing, by converting the q-gram counting filter into a sparse matrix-vector multiplication problem. Furthermore, we also boost the query performance by transforming the global filter to a 2-D query rectangle, which allows us to perform a query in a reduced region, significantly reducing the number of query I/Os in practice. Extensive experiments on real data sets confirm that 1) our method can compete with the state-of-the-art in-memory methods in index size and filtering ability, and outperform them on scalability of coping with the PubChem data set including 25 million chemical structure graphs and 2) compared with the popular q-gram-based external inverted index, our external index structure needs much fewer number of query I/Os on the PubChem data set.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2017.2682107</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Bioinformatics ; Data mining ; Datasets ; external memory ; Graph similarity search ; Graphical representations ; Graphs ; Indexes ; Mathematical analysis ; Matrix algebra ; matrix index ; Matrix methods ; Multiplication ; Network analysis ; Pattern recognition ; Queries ; Query processing ; Search problems ; Searching ; Similarity ; Social networks ; Sparse matrices ; Transforms</subject><ispartof>IEEE access, 2017, Vol.5, p.4551-4560</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2017</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-a2b3fb51c697019306ee890ecd09a8ea4e48bd0a45f90d5a89ac6056f6cfb6cf3</citedby><cites>FETCH-LOGICAL-c408t-a2b3fb51c697019306ee890ecd09a8ea4e48bd0a45f90d5a89ac6056f6cfb6cf3</cites><orcidid>0000-0002-5436-1851</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7878573$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,4010,27610,27900,27901,27902,54908</link.rule.ids></links><search><creatorcontrib>Xiaoyang Chen</creatorcontrib><creatorcontrib>Hongwei Huo</creatorcontrib><creatorcontrib>Jun Huan</creatorcontrib><creatorcontrib>Vitter, Jeffrey Scott</creatorcontrib><title>Efficient Graph Similarity Search in External Memory</title><title>IEEE access</title><addtitle>Access</addtitle><description>Many real-world applications, such as bioinformatics, data mining, pattern recognition, and social network analysis, benefit from efficient solutions for the graph similarity search problem. Existing methods have limited scalability when they handle the large graph databases, for example, those with millions or billions of graphs that cannot fit in main memory. In this paper, we study the problem of graph similarity search under the graph edit distance constraint in external memory. We present an efficient framework for arbitrary q-gram-based representations of a graph. Specifically, we propose a q-gram matrix index stored in hybrid layout in external memory to achieve efficient query processing, by converting the q-gram counting filter into a sparse matrix-vector multiplication problem. Furthermore, we also boost the query performance by transforming the global filter to a 2-D query rectangle, which allows us to perform a query in a reduced region, significantly reducing the number of query I/Os in practice. Extensive experiments on real data sets confirm that 1) our method can compete with the state-of-the-art in-memory methods in index size and filtering ability, and outperform them on scalability of coping with the PubChem data set including 25 million chemical structure graphs and 2) compared with the popular q-gram-based external inverted index, our external index structure needs much fewer number of query I/Os on the PubChem data set.</description><subject>Bioinformatics</subject><subject>Data mining</subject><subject>Datasets</subject><subject>external memory</subject><subject>Graph similarity search</subject><subject>Graphical representations</subject><subject>Graphs</subject><subject>Indexes</subject><subject>Mathematical analysis</subject><subject>Matrix algebra</subject><subject>matrix index</subject><subject>Matrix methods</subject><subject>Multiplication</subject><subject>Network analysis</subject><subject>Pattern recognition</subject><subject>Queries</subject><subject>Query processing</subject><subject>Search problems</subject><subject>Searching</subject><subject>Similarity</subject><subject>Social networks</subject><subject>Sparse matrices</subject><subject>Transforms</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>DOA</sourceid><recordid>eNpNUE1Lw0AQDaJgqf0FvQQ8p84m-3ksIdZCxUP0vGw2s3ZL2tRNCvbfmxopDgwzPN57w7womhNYEALqaZnnRVkuUiBikXKZEhA30SQlXCUZy_jtv_0-mnXdDoaSA8TEJKKFc956PPTxKpjjNi793jcm-P4cl2iC3cb-EBffPYaDaeJX3Lfh_BDdOdN0OPub0-jjuXjPX5LN22qdLzeJpSD7xKRV5ipGLFcCiMqAI0oFaGtQRqKhSGVVg6HMKaiZkcpYDow7bl01dDaN1qNv3ZqdPga_N-GsW-P1L9CGT21C722DWki0vJYgneOUVFbWzjEpaFqDgNqSwetx9DqG9uuEXa937enyU6dTypiiIBgdWNnIsqHtuoDuepWAvqStx7T1JW39l_agmo8qj4hXhZBCMpFlP1yDekM</recordid><startdate>2017</startdate><enddate>2017</enddate><creator>Xiaoyang Chen</creator><creator>Hongwei Huo</creator><creator>Jun Huan</creator><creator>Vitter, Jeffrey Scott</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-5436-1851</orcidid></search><sort><creationdate>2017</creationdate><title>Efficient Graph Similarity Search in External Memory</title><author>Xiaoyang Chen ; Hongwei Huo ; Jun Huan ; Vitter, Jeffrey Scott</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-a2b3fb51c697019306ee890ecd09a8ea4e48bd0a45f90d5a89ac6056f6cfb6cf3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Bioinformatics</topic><topic>Data mining</topic><topic>Datasets</topic><topic>external memory</topic><topic>Graph similarity search</topic><topic>Graphical representations</topic><topic>Graphs</topic><topic>Indexes</topic><topic>Mathematical analysis</topic><topic>Matrix algebra</topic><topic>matrix index</topic><topic>Matrix methods</topic><topic>Multiplication</topic><topic>Network analysis</topic><topic>Pattern recognition</topic><topic>Queries</topic><topic>Query processing</topic><topic>Search problems</topic><topic>Searching</topic><topic>Similarity</topic><topic>Social networks</topic><topic>Sparse matrices</topic><topic>Transforms</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Xiaoyang Chen</creatorcontrib><creatorcontrib>Hongwei Huo</creatorcontrib><creatorcontrib>Jun Huan</creatorcontrib><creatorcontrib>Vitter, Jeffrey Scott</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Xplore Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Explore</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Xiaoyang Chen</au><au>Hongwei Huo</au><au>Jun Huan</au><au>Vitter, Jeffrey Scott</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Efficient Graph Similarity Search in External Memory</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2017</date><risdate>2017</risdate><volume>5</volume><spage>4551</spage><epage>4560</epage><pages>4551-4560</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Many real-world applications, such as bioinformatics, data mining, pattern recognition, and social network analysis, benefit from efficient solutions for the graph similarity search problem. Existing methods have limited scalability when they handle the large graph databases, for example, those with millions or billions of graphs that cannot fit in main memory. In this paper, we study the problem of graph similarity search under the graph edit distance constraint in external memory. We present an efficient framework for arbitrary q-gram-based representations of a graph. Specifically, we propose a q-gram matrix index stored in hybrid layout in external memory to achieve efficient query processing, by converting the q-gram counting filter into a sparse matrix-vector multiplication problem. Furthermore, we also boost the query performance by transforming the global filter to a 2-D query rectangle, which allows us to perform a query in a reduced region, significantly reducing the number of query I/Os in practice. Extensive experiments on real data sets confirm that 1) our method can compete with the state-of-the-art in-memory methods in index size and filtering ability, and outperform them on scalability of coping with the PubChem data set including 25 million chemical structure graphs and 2) compared with the popular q-gram-based external inverted index, our external index structure needs much fewer number of query I/Os on the PubChem data set.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2017.2682107</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0002-5436-1851</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2017, Vol.5, p.4551-4560
issn 2169-3536
2169-3536
language eng
recordid cdi_ieee_primary_7878573
source IEEE Xplore Open Access Journals
subjects Bioinformatics
Data mining
Datasets
external memory
Graph similarity search
Graphical representations
Graphs
Indexes
Mathematical analysis
Matrix algebra
matrix index
Matrix methods
Multiplication
Network analysis
Pattern recognition
Queries
Query processing
Search problems
Searching
Similarity
Social networks
Sparse matrices
Transforms
title Efficient Graph Similarity Search in External Memory
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T20%3A26%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficient%20Graph%20Similarity%20Search%20in%20External%20Memory&rft.jtitle=IEEE%20access&rft.au=Xiaoyang%20Chen&rft.date=2017&rft.volume=5&rft.spage=4551&rft.epage=4560&rft.pages=4551-4560&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2017.2682107&rft_dat=%3Cproquest_ieee_%3E2455940754%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c408t-a2b3fb51c697019306ee890ecd09a8ea4e48bd0a45f90d5a89ac6056f6cfb6cf3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2455940754&rft_id=info:pmid/&rft_ieee_id=7878573&rfr_iscdi=true