Loading…
Domain-based Latent Personal Analysis and its use for impersonation detection in social media
Zipf’s law defines an inverse proportion between a word’s ranking in a given corpus and its frequency in it, roughly dividing the vocabulary into frequent words and infrequent ones. Here, we stipulate that within a domain an author’s signature can be derived from, in loose terms, the author’s missin...
Saved in:
Published in: | User modeling and user-adapted interaction 2021-09, Vol.31 (4), p.785-828 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Zipf’s law defines an inverse proportion between a word’s ranking in a given corpus and its frequency in it, roughly dividing the vocabulary into frequent words and infrequent ones. Here, we stipulate that
within a domain
an author’s signature can be derived from, in loose terms, the author’s missing popular words and frequently used infrequent words. We devise a method, termed Latent Personal Analysis (LPA), for finding domain-based attributes for entities in a domain: their distance from the domain and their signature, which determines how they most differ from a domain. We identify the most suitable distance metric for the method among several and construct the distances and personal signatures for authors, the domain’s entities. The signature consists of both over-used terms (compared to the average) and
missing
popular terms. We validate the correctness and power of the signatures in identifying users and set existence conditions. We test LPA in several domains, both textual and non-textual. We then demonstrate the use of the method in explainable authorship attribution: we define algorithms that utilize LPA to identify two types of impersonation in social media: (1) authors with sockpuppets (multiple) accounts and (2)
front-users
accounts, operated by several authors. We validate the algorithms and employ them over a large-scale dataset obtained from a social media site with over 4000 users. We corroborate these results using temporal rate analysis. LPA can further be used to devise personal attributes in a wide range of scientific domains in which the constituents have a long-tail distribution of elements. |
---|---|
ISSN: | 0924-1868 1573-1391 |
DOI: | 10.1007/s11257-021-09295-7 |