Semi-random subspace method for writeprint identification

Bibliographic Details
Published in: Neurocomputing (Amsterdam), 2013-05, Vol. 108, p. 93-102
Main Authors: Liu, Zhi; Yang, Zongkai; Liu, Sanya; Shi, Yinghui
Format: Article
Language: English
Description
Summary: The anonymous nature of online message distribution raises a series of moral and legal issues. By analyzing the identity cues people leave behind in their texts, i.e., their writeprint, potential authors can be identified individually. Writeprint identification is a difficult learning task, however, because of the high redundancy in the stylistic feature set and the high similarity of some authors' writing styles. In this paper, we propose a novel method, called semi-random subspace (Semi-RS), that addresses both problems simultaneously. Unlike the conventional random subspace method (RSM), which samples features from the whole feature set in a completely random way, the proposed Semi-RS randomly samples features from each individual-author feature set (IAFS) partitioned from the whole feature set. More specifically, we first divide the whole feature set into several IAFSs in a deterministic way, then construct a set of base classifiers on different randomly sampled feature sets from each IAFS, and finally combine all base classifiers for the final decision. Experimental results on the benchmark dataset demonstrate the effectiveness of the proposed method, which improves on previously reported results. In addition, a diversity analysis of the algorithm reveals that Semi-RS constructs more diverse base classifiers than conventional RSMs.
► The first study to combine the mechanism of the individual-author feature set with the random subspace method in writeprint identification.
► The study developed a novel method to calculate the individual writeprint of each author and to mine unique stylistic features for each author.
► The novel semi-random subspace method greatly improved the diversity among base classifiers as well as the effectiveness of the ensemble.
► The novel semi-random subspace method makes full use of the distribution of each author's local discriminative information.
► The proposed method improved on the previously reported accuracy on the benchmark dataset.
ISSN: 0925-2312, 1872-8286
DOI:10.1016/j.neucom.2012.11.015
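
The three steps named in the summary (deterministic partition into IAFSs, random sampling within each IAFS, and combination of the base classifiers) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not say how the IAFSs are built, which base classifier is used, or how votes are combined. The sketch therefore assumes a hypothetical scoring rule (features whose per-author mean deviates most from the global mean), a nearest-centroid base classifier, and majority voting. All function names and parameters (`partition_into_iafs`, `top_k`, `subspace_size`, `n_per_iafs`) are illustrative.

```python
import random
from collections import Counter


def partition_into_iafs(X, y, top_k):
    """Assumed deterministic partition: for each author, keep the top_k
    features whose per-author mean deviates most from the global mean."""
    n_features = len(X[0])
    global_mean = [sum(row[j] for row in X) / len(X) for j in range(n_features)]
    iafs = {}
    for author in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == author]
        score = [
            abs(sum(r[j] for r in rows) / len(rows) - global_mean[j])
            for j in range(n_features)
        ]
        # Ties are broken by feature index so the partition stays deterministic.
        ranked = sorted(range(n_features), key=lambda j: (-score[j], j))
        iafs[author] = ranked[:top_k]
    return iafs


def fit_centroids(X, y, features):
    """Assumed base classifier: one centroid per author on the given features."""
    centroids = {}
    for author in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == author]
        centroids[author] = [sum(r[j] for r in rows) / len(rows) for j in features]
    return centroids


def predict_one(centroids, features, row):
    """Return the author whose centroid is closest to the projected row."""
    point = [row[j] for j in features]

    def dist(author):
        return sum((p - c) ** 2 for p, c in zip(point, centroids[author]))

    return min(sorted(centroids), key=dist)


def train_semi_rs(X, y, top_k=3, subspace_size=2, n_per_iafs=5, seed=0):
    """Train n_per_iafs base classifiers on random subsets of each IAFS."""
    rng = random.Random(seed)
    ensemble = []
    for author, feats in partition_into_iafs(X, y, top_k).items():
        for _ in range(n_per_iafs):
            subset = sorted(rng.sample(feats, subspace_size))
            ensemble.append((subset, fit_centroids(X, y, subset)))
    return ensemble


def predict(ensemble, row):
    """Combine the base classifiers by majority vote (an assumed rule)."""
    votes = Counter(predict_one(c, f, row) for f, c in ensemble)
    return votes.most_common(1)[0][0]
```

In this sketch the randomness is confined to the choice of features inside each IAFS, whereas a conventional RSM would draw `subspace_size` features from all columns of `X`. Calling `train_semi_rs(X, y)` on a list of numeric feature rows `X` and author labels `y` returns the ensemble, and `predict(ensemble, row)` returns the voted author for a new row.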