Semi-random subspace method for writeprint identification


Bibliographic Details
Published in:	Neurocomputing (Amsterdam) 2013-05, Vol.108, p.93-102
Main Authors:	Liu, Zhi; Yang, Zongkai; Liu, Sanya; Shi, Yinghui
Format:	Article
Language:	English
Subjects:	Algorithms; Applied sciences; Artificial intelligence; Classifiers; Computer science; control theory; systems; Data processing. List processing. Character string processing; Diversity; Exact sciences and technology; Individual-author feature set (IAFS); Leaves; Memory and file management (including protection and security); Memory organisation. Data processing; Pattern recognition. Digital image processing. Computational geometry; Principal component analysis (PCA); Random subspace method (RSM); Redundancy; Similarity; Software; Subspace methods; Subspaces; Texts; Writeprint

Description
Summary:	The anonymous nature of online message distribution raises a series of moral and legal issues. By analyzing the identity cues that people leave behind in their texts, i.e., their writeprint, potential authors can be identified individually. Writeprint identification is nevertheless a difficult learning task because of the high redundancy in the stylistic feature set and the high similarity between some authors' writing styles. In this paper, we propose a novel method, called semi-random subspace (Semi-RS), to address both problems simultaneously. Unlike the conventional random subspace method (RSM), which samples features from the whole feature set completely at random, the proposed Semi-RS randomly samples features from each individual-author feature set (IAFS) partitioned from the whole feature set. More specifically, we first divide the whole feature set into several IAFSs in a deterministic way, then construct a set of base classifiers on different randomly sampled feature sets from each IAFS, and finally combine all base classifiers for the final decision. Experimental results on the benchmark dataset demonstrate the effectiveness of the proposed method, which improves on previously reported results. In addition, an analysis of the algorithm's diversity reveals that Semi-RS constructs more diverse base classifiers than conventional RSMs.
► The first study to combine the mechanism of the individual-author feature set with the random subspace method in writeprint identification.
► The study developed a novel method to calculate the individual writeprint of each author and to mine the unique stylistic features of each author.
► The novel semi-random subspace method greatly improved the diversity among base classifiers as well as the effectiveness of the ensemble.
► The novel semi-random subspace method makes full use of the distribution of each author's local discriminative information.
► The proposed method improved on the previously reported accuracy on the benchmark dataset.
ISSN:	0925-2312 1872-8286
DOI:	10.1016/j.neucom.2012.11.015
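
Illustrative sketch: the three steps named in the summary (deterministic partition into IAFSs, random sampling within each IAFS, combination of the base classifiers) can be outlined roughly as below. This is not the authors' implementation. The record does not say how an IAFS is derived, which base classifier is used, or how votes are combined, so the ranking criterion (deviation of an author's mean from the global mean), the nearest-centroid base classifier, the majority vote, all function names, and the toy data are assumptions made only for illustration.

```python
import random
from collections import Counter


def author_mean(X, y, author, feats):
    """Mean of the given features over one author's samples."""
    rows = [row for row, label in zip(X, y) if label == author]
    return [sum(r[j] for r in rows) / len(rows) for j in feats]


def partition_iafs(X, y, size):
    """Step 1 (deterministic). ASSUMED criterion: for each author keep the
    `size` features whose author mean deviates most from the global mean."""
    all_feats = list(range(len(X[0])))
    global_mean = [sum(row[j] for row in X) / len(X) for j in all_feats]
    iafs = {}
    for author in sorted(set(y)):
        mean = author_mean(X, y, author, all_feats)
        dev = [abs(m - g) for m, g in zip(mean, global_mean)]
        ranked = sorted(all_feats, key=lambda j: (-dev[j], j))
        iafs[author] = ranked[:size]
    return iafs


def fit_semi_rs(X, y, iafs_size, subspace_size, per_iafs, seed=0):
    """Step 2. Train `per_iafs` base classifiers per IAFS, each on a random
    feature subset drawn only from that IAFS (not from the whole set)."""
    rng = random.Random(seed)
    ensemble = []
    for feats in partition_iafs(X, y, iafs_size).values():
        for _ in range(per_iafs):
            sub = sorted(rng.sample(feats, subspace_size))
            # ASSUMED base classifier: nearest centroid on the subspace.
            cents = {a: author_mean(X, y, a, sub) for a in sorted(set(y))}
            ensemble.append((sub, cents))
    return ensemble


def predict_semi_rs(ensemble, x):
    """Step 3. ASSUMED combination rule: simple majority vote."""
    votes = Counter()
    for sub, cents in ensemble:
        def dist(a):
            return sum((x[j] - c) ** 2 for j, c in zip(sub, cents[a]))
        votes[min(cents, key=dist)] += 1
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): three authors, three stylistic features that
# each favour one author, plus one constant, uninformative feature.
X = [[5, 0, 0, 1], [4, 0, 0, 1],
     [0, 5, 0, 1], [0, 4, 0, 1],
     [0, 0, 5, 1], [0, 0, 4, 1]]
y = ["A", "A", "B", "B", "C", "C"]

ensemble = fit_semi_rs(X, y, iafs_size=3, subspace_size=2, per_iafs=2)
print(predict_semi_rs(ensemble, [0, 4.5, 0, 1]))
```

For comparison, a conventional RSM under the same assumptions would call rng.sample on the full feature index list instead of on each IAFS, so the uninformative feature could end up in a subspace.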