SQSE: A Measure to Assess Sample Quality of Authorial Style as a Cognitive Biometric Trait

Bibliographic Details
Published in: IEEE Transactions on Biometrics, Behavior, and Identity Science, 2021-10, Vol. 3 (4), p. 583-596
Main Authors: Wilson, Ronald; Bhandarkar, Avanti; Lyons, Princess; Woodard, Damon L.
Format: Article
Language: English
Description
Summary: Stylistic analysis of text is a widely researched topic in both cognitive biometrics and linguistics. Often referred to as Authorship Attribution (AA), the scope of this problem has expanded from a few hundred authors with similar data characteristics to large-scale corpora having several thousand authors and cross-domain samples. Even though AA algorithms have evolved to keep up with the requirements of the community, the process for choosing an appropriate text sample with good style characteristics has remained poorly defined. This paper, for the first time, formalizes the sample selection process using a style quality evaluation measure for AA, called Sample Quality for Style Extraction (SQSE). Furthermore, we demonstrate the utility of the measure on multiple large-scale cross-domain corpora with over 6,500 authors and 250,000 text samples. The SQSE measure, supported by over 200 experiments and 4 million comparisons, exhibits a strong positive correlation with matching performance across a wide variety of AA algorithms, with a Pearson correlation coefficient of 0.87, and positively identifies samples of good stylometric quality.
ISSN: 2637-6407
DOI: 10.1109/TBIOM.2021.3120985