Mis-categorized entities detection
Published in: | The VLDB Journal, 2021-07, Vol. 30 (4), pp. 515-536 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Entity categorization, the process of categorizing entities into groups, is an important problem with many applications. However, in practice many entities are mis-categorized, for example in Google Scholar and among Amazon products. In this paper, we study the problem of discovering mis-categorized entities from a given group of categorized entities. This problem is inherently hard: all entities within the same group have been "well" categorized by state-of-the-art solutions, so differentiating them is nontrivial. We propose a novel rule-based framework to solve this problem. It first uses positive rules to compute disjoint partitions of entities, where the largest partition is taken as the correctly categorized partition, namely the pivot partition. It then uses negative rules to identify mis-categorized entities in the other partitions that are dissimilar to the entities in the pivot partition. We describe optimizations for applying these rules and discuss how to generate positive and negative rules. In addition, we propose novel strategies to resolve inconsistent rules. Extensive experimental results on real-world datasets show the effectiveness of our solution. |
ISSN: | 1066-8888 (print); 0949-877X (electronic) |
DOI: | 10.1007/s00778-021-00653-w |
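The summary describes the framework only at a high level. The following Python sketch illustrates its two phases under assumptions that are not taken from the paper. Rules are modeled as hypothetical pairwise predicates. Partitions are the connected components of the graph in which a positive rule links two entities. An entity outside the pivot partition is flagged when negative rules fire against more than a hypothetical `threshold` fraction of pivot entities. This is not the authors' algorithm, and it omits their optimizations, rule generation, and handling of inconsistent rules.

```python
from typing import Callable, Dict, List, Sequence

Entity = Dict[str, str]
# Hypothetical rule model: a predicate over a pair of entities.
PairRule = Callable[[Entity, Entity], bool]


def partition(entities: Sequence[Entity], positive_rules: Sequence[PairRule]) -> List[List[int]]:
    """Split entity indices into disjoint partitions.

    Two entities are linked when any positive rule holds for them; partitions
    are the connected components of that link graph, found with union-find.
    """
    parent = list(range(len(entities)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            if any(rule(entities[i], entities[j]) for rule in positive_rules):
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(entities)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def detect_miscategorized(
    entities: Sequence[Entity],
    positive_rules: Sequence[PairRule],
    negative_rules: Sequence[PairRule],
    threshold: float = 0.5,  # hypothetical parameter, not from the paper
) -> List[int]:
    """Return indices of entities flagged as mis-categorized."""
    parts = partition(entities, positive_rules)
    # Pivot partition: the largest one (max() keeps the first on ties; an assumption).
    pivot = max(parts, key=len)
    flagged: List[int] = []
    for part in parts:
        if part is pivot:
            continue
        for i in part:
            # Count pivot entities that some negative rule marks as dissimilar to i.
            hits = sum(
                1 for p in pivot
                if any(rule(entities[i], entities[p]) for rule in negative_rules)
            )
            if hits / len(pivot) > threshold:
                flagged.append(i)
    return sorted(flagged)


# Toy "Books" group with made-up data: one accessory slipped in.
products = [
    {"title": "Database Systems", "kind": "book", "publisher": "P1"},
    {"title": "SQL Basics", "kind": "book", "publisher": "P1"},
    {"title": "Query Tuning", "kind": "book", "publisher": "P1"},
    {"title": "USB Cable", "kind": "accessory", "publisher": "X"},
    {"title": "Data Mining", "kind": "book", "publisher": "P2"},
]
same_publisher: PairRule = lambda a, b: a["publisher"] == b["publisher"]
different_kind: PairRule = lambda a, b: a["kind"] != b["kind"]

flagged = detect_miscategorized(products, [same_publisher], [different_kind])
print(flagged)  # [3]
```

In this toy run, the positive rule yields partitions `[[0, 1, 2], [3], [4]]`, and the first of these is the pivot. Entity 4 also lies outside the pivot but is not flagged, because no negative rule separates it from the pivot books. This illustrates why the sketch applies a negative-rule check instead of flagging every entity outside the pivot.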