
Mis-categorized entities detection


Bibliographic Details
Published in: The VLDB Journal, 2021-07, Vol. 30 (4), p. 515-536
Main Authors: Hao, Shuang; Tang, Nan; Li, Guoliang; Feng, Jianhua; Wang, Ning
Format: Article
Language: English
Description
Summary: Entity categorization, the process of categorizing entities into groups, is an important problem with many applications. However, in practice, many entities are mis-categorized, for example on Google Scholar and among Amazon products. In this paper, we study the problem of discovering mis-categorized entities from a given group of categorized entities. This problem is inherently hard: all entities within the same group have been “well” categorized by the state-of-the-art solutions, so it is nontrivial to differentiate them. We propose a novel rule-based framework to solve this problem. It first uses positive rules to compute disjoint partitions of entities, where the partition with the largest size is taken as the correctly categorized partition, namely the pivot partition. It then uses negative rules to identify mis-categorized entities in other partitions that are dissimilar to the entities in the pivot partition. We describe optimizations on applying these rules and discuss how to generate positive/negative rules. In addition, we propose novel strategies to resolve inconsistent rules. Extensive experimental results on real-world datasets show the effectiveness of our solution.
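
The two-phase idea in the summary (positive rules to build partitions, the largest partition as pivot, negative rules to flag dissimilar entities) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: entities are modeled as token sets, both rule types are plain Jaccard-similarity thresholds, and the names `jaccard`, `detect_miscategorized`, `pos_threshold`, and `neg_threshold` are hypothetical. In particular, the sketch flags an entity only when it is dissimilar to every pivot entity, which is one possible reading of the summary.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed similarity function)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def detect_miscategorized(entities, pos_threshold=0.5, neg_threshold=0.1):
    """entities maps a name to a set of tokens.

    Returns (pivot, flagged): the largest partition and the entities
    outside it that the assumed negative rule marks as mis-categorized.
    """
    parent = {name: name for name in entities}

    # Assumed positive rule: similarity >= pos_threshold puts two
    # entities into the same partition (merged transitively).
    for a, b in combinations(entities, 2):
        if jaccard(entities[a], entities[b]) >= pos_threshold:
            parent[find(parent, a)] = find(parent, b)

    partitions = {}
    for name in entities:
        partitions.setdefault(find(parent, name), []).append(name)

    # The largest partition is taken as the pivot partition.
    pivot = max(partitions.values(), key=len)
    pivot_set = set(pivot)

    # Assumed negative rule: flag an entity only if its similarity to
    # every pivot entity is <= neg_threshold.
    flagged = [
        name
        for name in entities
        if name not in pivot_set
        and all(jaccard(entities[name], entities[p]) <= neg_threshold for p in pivot)
    ]
    return pivot, flagged


if __name__ == "__main__":
    # Toy data: tokens stand in for author names and keywords.
    papers = {
        "p1": {"hao", "tang", "li", "database"},
        "p2": {"hao", "tang", "li", "cleaning"},
        "p3": {"hao", "li", "database", "cleaning"},
        "p4": {"hao", "chemistry", "polymer"},
        "p5": {"smith", "polymer", "chemistry"},
    }
    pivot, flagged = detect_miscategorized(papers)
    print("pivot:", pivot)
    print("flagged:", flagged)
```

With this toy data, p1 to p3 form the pivot partition and only p5 is flagged; p4 shares one token with the pivot entities, so the assumed negative rule leaves it undecided rather than marking it as mis-categorized.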
ISSN: 1066-8888, 0949-877X
DOI: 10.1007/s00778-021-00653-w