Loading…
Efficient Graph Similarity Search in External Memory
Many real-world applications, such as bioinformatics, data mining, pattern recognition, and social network analysis, benefit from efficient solutions for the graph similarity search problem. Existing methods have limited scalability when they handle the large graph databases, for example, those with...
Saved in:
Published in: | IEEE access 2017, Vol.5, p.4551-4560 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
cited_by | cdi_FETCH-LOGICAL-c408t-a2b3fb51c697019306ee890ecd09a8ea4e48bd0a45f90d5a89ac6056f6cfb6cf3 |
---|---|
cites | cdi_FETCH-LOGICAL-c408t-a2b3fb51c697019306ee890ecd09a8ea4e48bd0a45f90d5a89ac6056f6cfb6cf3 |
container_end_page | 4560 |
container_issue | |
container_start_page | 4551 |
container_title | IEEE access |
container_volume | 5 |
creator | Xiaoyang Chen Hongwei Huo Jun Huan Vitter, Jeffrey Scott |
description | Many real-world applications, such as bioinformatics, data mining, pattern recognition, and social network analysis, benefit from efficient solutions for the graph similarity search problem. Existing methods have limited scalability when they handle the large graph databases, for example, those with millions or billions of graphs that cannot fit in main memory. In this paper, we study the problem of graph similarity search under the graph edit distance constraint in external memory. We present an efficient framework for arbitrary q-gram-based representations of a graph. Specifically, we propose a q-gram matrix index stored in hybrid layout in external memory to achieve efficient query processing, by converting the q-gram counting filter into a sparse matrix-vector multiplication problem. Furthermore, we also boost the query performance by transforming the global filter to a 2-D query rectangle, which allows us to perform a query in a reduced region, significantly reducing the number of query I/Os in practice. Extensive experiments on real data sets confirm that 1) our method can compete with the state-of-the-art in-memory methods in index size and filtering ability, and outperform them on scalability of coping with the PubChem data set including 25 million chemical structure graphs and 2) compared with the popular q-gram-based external inverted index, our external index structure needs much fewer number of query I/Os on the PubChem data set. |
doi_str_mv | 10.1109/ACCESS.2017.2682107 |
format | article |
fullrecord | <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_7878573</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7878573</ieee_id><doaj_id>oai_doaj_org_article_78ec6d808ff641bc8dff58742d070dc1</doaj_id><sourcerecordid>2455940754</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-a2b3fb51c697019306ee890ecd09a8ea4e48bd0a45f90d5a89ac6056f6cfb6cf3</originalsourceid><addsrcrecordid>eNpNUE1Lw0AQDaJgqf0FvQQ8p84m-3ksIdZCxUP0vGw2s3ZL2tRNCvbfmxopDgwzPN57w7womhNYEALqaZnnRVkuUiBikXKZEhA30SQlXCUZy_jtv_0-mnXdDoaSA8TEJKKFc956PPTxKpjjNi793jcm-P4cl2iC3cb-EBffPYaDaeJX3Lfh_BDdOdN0OPub0-jjuXjPX5LN22qdLzeJpSD7xKRV5ipGLFcCiMqAI0oFaGtQRqKhSGVVg6HMKaiZkcpYDow7bl01dDaN1qNv3ZqdPga_N-GsW-P1L9CGT21C722DWki0vJYgneOUVFbWzjEpaFqDgNqSwetx9DqG9uuEXa937enyU6dTypiiIBgdWNnIsqHtuoDuepWAvqStx7T1JW39l_agmo8qj4hXhZBCMpFlP1yDekM</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2455940754</pqid></control><display><type>article</type><title>Efficient Graph Similarity Search in External Memory</title><source>IEEE Xplore Open Access Journals</source><creator>Xiaoyang Chen ; Hongwei Huo ; Jun Huan ; Vitter, Jeffrey Scott</creator><creatorcontrib>Xiaoyang Chen ; Hongwei Huo ; Jun Huan ; Vitter, Jeffrey Scott</creatorcontrib><description>Many real-world applications, such as bioinformatics, data mining, pattern recognition, and social network analysis, benefit from efficient solutions for the graph similarity search problem. Existing methods have limited scalability when they handle the large graph databases, for example, those with millions or billions of graphs that cannot fit in main memory. In this paper, we study the problem of graph similarity search under the graph edit distance constraint in external memory. We present an efficient framework for arbitrary q-gram-based representations of a graph. Specifically, we propose a q-gram matrix index stored in hybrid layout in external memory to achieve efficient query processing, by converting the q-gram counting filter into a sparse matrix-vector multiplication problem. Furthermore, we also boost the query performance by transforming the global filter to a 2-D query rectangle, which allows us to perform a query in a reduced region, significantly reducing the number of query I/Os in practice. Extensive experiments on real data sets confirm that 1) our method can compete with the state-of-the-art in-memory methods in index size and filtering ability, and outperform them on scalability of coping with the PubChem data set including 25 million chemical structure graphs and 2) compared with the popular q-gram-based external inverted index, our external index structure needs much fewer number of query I/Os on the PubChem data set.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2017.2682107</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Bioinformatics ; Data mining ; Datasets ; external memory ; Graph similarity search ; Graphical representations ; Graphs ; Indexes ; Mathematical analysis ; Matrix algebra ; matrix index ; Matrix methods ; Multiplication ; Network analysis ; Pattern recognition ; Queries ; Query processing ; Search problems ; Searching ; Similarity ; Social networks ; Sparse matrices ; Transforms</subject><ispartof>IEEE access, 2017, Vol.5, p.4551-4560</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2017</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-a2b3fb51c697019306ee890ecd09a8ea4e48bd0a45f90d5a89ac6056f6cfb6cf3</citedby><cites>FETCH-LOGICAL-c408t-a2b3fb51c697019306ee890ecd09a8ea4e48bd0a45f90d5a89ac6056f6cfb6cf3</cites><orcidid>0000-0002-5436-1851</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7878573$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,4010,27610,27900,27901,27902,54908</link.rule.ids></links><search><creatorcontrib>Xiaoyang Chen</creatorcontrib><creatorcontrib>Hongwei Huo</creatorcontrib><creatorcontrib>Jun Huan</creatorcontrib><creatorcontrib>Vitter, Jeffrey Scott</creatorcontrib><title>Efficient Graph Similarity Search in External Memory</title><title>IEEE access</title><addtitle>Access</addtitle><description>Many real-world applications, such as bioinformatics, data mining, pattern recognition, and social network analysis, benefit from efficient solutions for the graph similarity search problem. Existing methods have limited scalability when they handle the large graph databases, for example, those with millions or billions of graphs that cannot fit in main memory. In this paper, we study the problem of graph similarity search under the graph edit distance constraint in external memory. We present an efficient framework for arbitrary q-gram-based representations of a graph. Specifically, we propose a q-gram matrix index stored in hybrid layout in external memory to achieve efficient query processing, by converting the q-gram counting filter into a sparse matrix-vector multiplication problem. Furthermore, we also boost the query performance by transforming the global filter to a 2-D query rectangle, which allows us to perform a query in a reduced region, significantly reducing the number of query I/Os in practice. Extensive experiments on real data sets confirm that 1) our method can compete with the state-of-the-art in-memory methods in index size and filtering ability, and outperform them on scalability of coping with the PubChem data set including 25 million chemical structure graphs and 2) compared with the popular q-gram-based external inverted index, our external index structure needs much fewer number of query I/Os on the PubChem data set.</description><subject>Bioinformatics</subject><subject>Data mining</subject><subject>Datasets</subject><subject>external memory</subject><subject>Graph similarity search</subject><subject>Graphical representations</subject><subject>Graphs</subject><subject>Indexes</subject><subject>Mathematical analysis</subject><subject>Matrix algebra</subject><subject>matrix index</subject><subject>Matrix methods</subject><subject>Multiplication</subject><subject>Network analysis</subject><subject>Pattern recognition</subject><subject>Queries</subject><subject>Query processing</subject><subject>Search problems</subject><subject>Searching</subject><subject>Similarity</subject><subject>Social networks</subject><subject>Sparse matrices</subject><subject>Transforms</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>DOA</sourceid><recordid>eNpNUE1Lw0AQDaJgqf0FvQQ8p84m-3ksIdZCxUP0vGw2s3ZL2tRNCvbfmxopDgwzPN57w7womhNYEALqaZnnRVkuUiBikXKZEhA30SQlXCUZy_jtv_0-mnXdDoaSA8TEJKKFc956PPTxKpjjNi793jcm-P4cl2iC3cb-EBffPYaDaeJX3Lfh_BDdOdN0OPub0-jjuXjPX5LN22qdLzeJpSD7xKRV5ipGLFcCiMqAI0oFaGtQRqKhSGVVg6HMKaiZkcpYDow7bl01dDaN1qNv3ZqdPga_N-GsW-P1L9CGT21C722DWki0vJYgneOUVFbWzjEpaFqDgNqSwetx9DqG9uuEXa937enyU6dTypiiIBgdWNnIsqHtuoDuepWAvqStx7T1JW39l_agmo8qj4hXhZBCMpFlP1yDekM</recordid><startdate>2017</startdate><enddate>2017</enddate><creator>Xiaoyang Chen</creator><creator>Hongwei Huo</creator><creator>Jun Huan</creator><creator>Vitter, Jeffrey Scott</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-5436-1851</orcidid></search><sort><creationdate>2017</creationdate><title>Efficient Graph Similarity Search in External Memory</title><author>Xiaoyang Chen ; Hongwei Huo ; Jun Huan ; Vitter, Jeffrey Scott</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-a2b3fb51c697019306ee890ecd09a8ea4e48bd0a45f90d5a89ac6056f6cfb6cf3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Bioinformatics</topic><topic>Data mining</topic><topic>Datasets</topic><topic>external memory</topic><topic>Graph similarity search</topic><topic>Graphical representations</topic><topic>Graphs</topic><topic>Indexes</topic><topic>Mathematical analysis</topic><topic>Matrix algebra</topic><topic>matrix index</topic><topic>Matrix methods</topic><topic>Multiplication</topic><topic>Network analysis</topic><topic>Pattern recognition</topic><topic>Queries</topic><topic>Query processing</topic><topic>Search problems</topic><topic>Searching</topic><topic>Similarity</topic><topic>Social networks</topic><topic>Sparse matrices</topic><topic>Transforms</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Xiaoyang Chen</creatorcontrib><creatorcontrib>Hongwei Huo</creatorcontrib><creatorcontrib>Jun Huan</creatorcontrib><creatorcontrib>Vitter, Jeffrey Scott</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Xplore Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Explore</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Xiaoyang Chen</au><au>Hongwei Huo</au><au>Jun Huan</au><au>Vitter, Jeffrey Scott</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Efficient Graph Similarity Search in External Memory</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2017</date><risdate>2017</risdate><volume>5</volume><spage>4551</spage><epage>4560</epage><pages>4551-4560</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Many real-world applications, such as bioinformatics, data mining, pattern recognition, and social network analysis, benefit from efficient solutions for the graph similarity search problem. Existing methods have limited scalability when they handle the large graph databases, for example, those with millions or billions of graphs that cannot fit in main memory. In this paper, we study the problem of graph similarity search under the graph edit distance constraint in external memory. We present an efficient framework for arbitrary q-gram-based representations of a graph. Specifically, we propose a q-gram matrix index stored in hybrid layout in external memory to achieve efficient query processing, by converting the q-gram counting filter into a sparse matrix-vector multiplication problem. Furthermore, we also boost the query performance by transforming the global filter to a 2-D query rectangle, which allows us to perform a query in a reduced region, significantly reducing the number of query I/Os in practice. Extensive experiments on real data sets confirm that 1) our method can compete with the state-of-the-art in-memory methods in index size and filtering ability, and outperform them on scalability of coping with the PubChem data set including 25 million chemical structure graphs and 2) compared with the popular q-gram-based external inverted index, our external index structure needs much fewer number of query I/Os on the PubChem data set.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2017.2682107</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0002-5436-1851</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2017, Vol.5, p.4551-4560 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_ieee_primary_7878573 |
source | IEEE Xplore Open Access Journals |
subjects | Bioinformatics Data mining Datasets external memory Graph similarity search Graphical representations Graphs Indexes Mathematical analysis Matrix algebra matrix index Matrix methods Multiplication Network analysis Pattern recognition Queries Query processing Search problems Searching Similarity Social networks Sparse matrices Transforms |
title | Efficient Graph Similarity Search in External Memory |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T20%3A26%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficient%20Graph%20Similarity%20Search%20in%20External%20Memory&rft.jtitle=IEEE%20access&rft.au=Xiaoyang%20Chen&rft.date=2017&rft.volume=5&rft.spage=4551&rft.epage=4560&rft.pages=4551-4560&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2017.2682107&rft_dat=%3Cproquest_ieee_%3E2455940754%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c408t-a2b3fb51c697019306ee890ecd09a8ea4e48bd0a45f90d5a89ac6056f6cfb6cf3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2455940754&rft_id=info:pmid/&rft_ieee_id=7878573&rfr_iscdi=true |