Loading…

Graph threshold algorithm

Recently more and more information sources are connected together and become a sort of complex graphs that can be exploited not only as a structured and semi-structured data such as rdb or xml, RDF or NoSQL, but also as many kinds of unstructured data such as web, bioinformatics, genometrics, patent...

Full description

Saved in:
Bibliographic Details
Published in:The Journal of supercomputing 2021-09, Vol.77 (9), p.9827-9847
Main Authors: Lee, Wookey, Song, Justin JongSu, Lee, Charles Cheolgi, Jo, Tae-Chang, Lee, James J. H.
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Recently more and more information sources are connected together and become a sort of complex graphs that can be exploited not only as a structured and semi-structured data such as rdb or xml, RDF or NoSQL, but also as many kinds of unstructured data such as web, bioinformatics, genometrics, patents, social media, knowlege graphs, IoT, hidden graph from deep learning results. State of the art studies have suggested methods of presenting data as hyper-graphs in search queries and finding search results in subgraphs or graph embeddings rather than a list of individual node results. We study the problem of retrieving top- k graph results with the query relevances; that is, given a set of query keywords Q on a graph G , we aim to find a subgraph g of G such that g is highly related to Q and closely linked under g . In order to consider the relevant graph results and the connectivity simultaneously, we present an effective algorithm graph threshold algorithm (GTA) based on a threshold algorithm (TA) which works efficiently in non-graph structure. We show that GTA does not need unnecessary searches under given objective functions, and prove the existence of an upper bound of the size of subgraph for top- k results, called hop max , which makes it efficient to find the combined results. Finally, we conduct the performance studies on real and synthetic graphs, which demonstrate that our algorithm significantly outperforms conventional approaches with respect to time complexity and cost consumption.
ISSN:0920-8542
1573-0484
DOI:10.1007/s11227-021-03665-z