Loading…
ExactSim: benchmarking single-source SimRank algorithms with high-precision ground truths
SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top- k SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fun...
Saved in:
Published in: | The VLDB journal 2021-11, Vol.30 (6), p.989-1015 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | SimRank
is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-
k
SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than
10
6
nodes. Consequently, no existing work has evaluated the actual accuracy of various single-source and top-
k
SimRank algorithms on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-
k
SimRank results on large graphs. This algorithm produces ground truths with precision up to 7 decimal places with high probability. With the ground truths computed by ExactSim, we present the first experimental study of the accuracy/cost trade-offs of existing approximate SimRank algorithms on large real-world graphs and synthetic graphs. Finally, we use the ground truths to exploit various properties of SimRank distributions on large graphs. |
---|---|
ISSN: | 1066-8888 0949-877X |
DOI: | 10.1007/s00778-021-00672-7 |