Authorship attribution of web forum posts
Main Authors: | , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Summary: | Extracting useful information from user-generated text on the web is an important area of ongoing research in natural language processing, machine learning, and data mining. Online tools such as email, newsgroups, blogs, and web forums provide an effective communication platform for millions of users around the globe, with the added advantage of anonymity. Millions of people post information on web forums daily, and the possibility that anonymous users exchange sensitive information on these forums cannot be ruled out. This document proposes a two-stage approach that combines unsupervised and supervised learning to perform authorship attribution on web forum posts. The first stage uses clustering techniques to group the data sets into stylistically similar clusters. The second stage uses the resulting clusters as features to train different machine learning classifiers. The two-stage approach aims to reduce the complexity of the classification task and to boost prediction accuracy. |
---|---|
ISSN: | 2159-1237 2159-1245 |
DOI: | 10.1109/ecrime.2010.5706693 |
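
The two-stage idea in the summary can be illustrated with a minimal sketch. The record does not say which features, clustering algorithm, or classifiers the authors used, so everything below is an assumption chosen for illustration: the feature set in `stylometric_features`, plain k-means for stage one, a nearest-centroid classifier standing in for the "different machine learning classifiers" of stage two, and the toy `POSTS` and `AUTHORS` data. It uses only the Python standard library.

```python
import random

# Hypothetical function-word list; the record does not specify any features.
FUNCTION_WORDS = ("the", "and", "of", "to", "i", "you")

def stylometric_features(post):
    """Map a post to a small vector of style features (illustrative only)."""
    words = post.lower().split()
    n = max(len(words), 1)
    avg_word_len = sum(len(w) for w in words) / n
    punct_rate = sum(ch in ".,!?;:" for ch in post) / max(len(post), 1)
    fw_rates = [words.count(fw) / n for fw in FUNCTION_WORDS]
    return [avg_word_len, punct_rate] + fw_rates

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))

def kmeans(vectors, k, iters=20, seed=0):
    """Stage 1: plain k-means, returning k centroids (requires k <= len(vectors))."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(centroids, v)].append(v)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def augment(v, centroids):
    """Stage 2 input: original features plus a one-hot cluster indicator."""
    onehot = [0.0] * len(centroids)
    onehot[nearest(centroids, v)] = 1.0
    return v + onehot

def train_centroid_classifier(X, y):
    """Stand-in classifier: one mean vector per author."""
    by_author = {}
    for v, author in zip(X, y):
        by_author.setdefault(author, []).append(v)
    return {a: [sum(c) / len(vs) for c in zip(*vs)] for a, vs in by_author.items()}

def predict(model, v):
    """Return the author whose mean vector is closest to v."""
    return min(model, key=lambda a: sq_dist(model[a], v))

# Toy data, invented for this sketch.
POSTS = [
    "I think the server is down, and you should restart it.",
    "I think the patch works, and you can merge it to main.",
    "lol no way!!! totally broken again!!!",
    "omg!!! cant believe this!!! fix pls!!!",
]
AUTHORS = ["alice", "alice", "bob", "bob"]

features = [stylometric_features(p) for p in POSTS]
centroids = kmeans(features, k=2)                    # stage 1: cluster by style
X = [augment(f, centroids) for f in features]        # cluster id becomes a feature
model = train_centroid_classifier(X, AUTHORS)        # stage 2: supervised learner
guess = predict(model, augment(stylometric_features("you should patch it!!!"), centroids))
```

The one-hot cluster indicator is one plausible reading of "using the resulting clusters as features"; other encodings, such as distances to each centroid, would fit the description equally well.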