Loading…

Effective and efficient community search over large heterogeneous information networks

Recently, the topic of community search (CS) has gained plenty of attention. Given a query vertex, CS looks for a dense subgraph that contains it. Existing studies mainly focus on homogeneous graphs in which vertices are of the same type, and cannot be directly applied to heterogeneous information n...

Full description

Saved in:
Bibliographic Details
Published in:Proceedings of the VLDB Endowment 2020-02, Vol.13 (6), p.854-867
Main Authors: Fang, Yixiang, Yang, Yixing, Zhang, Wenjie, Lin, Xuemin, Cao, Xin
Format: Article
Language:English
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Recently, the topic of community search (CS) has gained plenty of attention. Given a query vertex, CS looks for a dense subgraph that contains it. Existing studies mainly focus on homogeneous graphs in which vertices are of the same type, and cannot be directly applied to heterogeneous information networks (HINs) that consist of multi-typed, interconnected objects, such as the bibliographic networks and knowledge graphs. In this paper, we study the problem of community search over large HINs; that is, given a query vertex q , find a community from an HIN containing q , in which all the vertices are with the same type of q and have close relationships. To model the relationship between two vertices of the same type, we adopt the well-known concept of meta-path , which is a sequence of relations defined between different types of vertices. We then measure the cohesiveness of the community by extending the classic minimum degree metric with a meta-path. We further propose efficient query algorithms for finding communities using these cohesiveness metrics. We have performed extensive experiments on five real large HINs, and the results show that the proposed solutions are effective for searching communities. Moreover, they are much faster than the baseline solutions.
ISSN:2150-8097
2150-8097
DOI:10.14778/3380750.3380756