
Authorship attribution of web forum posts

Extracting useful information from user-generated text on the web is an important area of ongoing research in natural language processing, machine learning, and data mining. Online tools such as email, newsgroups, blogs, and web forums provide an effective communication platform for millions of users around...

Bibliographic Details
Main Authors: Pillay, S R, Solorio, T
Format: Conference Proceeding
Language: English
cited_by
cites
container_end_page 7
container_issue
container_start_page 1
container_title
container_volume
creator Pillay, S R
Solorio, T
description Extracting useful information from user-generated text on the web is an important area of ongoing research in natural language processing, machine learning, and data mining. Online tools such as email, newsgroups, blogs, and web forums provide an effective communication platform for millions of users around the globe, with the added advantage of anonymity. Millions of people post information on different web forums daily. The possibility that anonymous users exchange sensitive information on these web forums cannot be ruled out. This document proposes a two-stage approach that combines unsupervised and supervised learning to perform authorship attribution on web forum posts. The first stage uses clustering techniques to group the data sets into stylistically similar clusters. The second stage uses the resulting clusters from stage one as features to train different machine learning classifiers. This two-stage approach aims to reduce the complexity of the classification task and to boost prediction accuracy.
doi_str_mv 10.1109/ecrime.2010.5706693
format conference_proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5706693</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5706693</ieee_id><sourcerecordid>5706693</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-879c5d1dcd4ac33f4e0f2a8498dbc3d083d0b1b28412987438e1ec594dcbb92c3</originalsourceid><addsrcrecordid>eNpVj8tqwzAURNUXNKT-gmy87cKprnRl6S5D6AsC3WQfrIeJSlMZS6b072toKHRgGJgDA8PYCvgagNNDcGM8hbXgc6E0b1uSF6wibQAFotYttJdsIUBRAwLV1T_G6fqPSX3Lqpzf-SyFRki1YPebqRzTmI9xqLtSxminEtNnnfr6K9i6T-N0qoeUS75jN333kUN1ziXbPz3uty_N7u35dbvZNZF4aYwmpzx457FzUvYYeC86g2S8ddJzM9uCFQZBkNEoTYDgFKF31pJwcslWv7MxhHAY5uvd-H04_5Y_T5tIBA</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Authorship attribution of web forum posts</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Pillay, S R ; Solorio, T</creator><creatorcontrib>Pillay, S R ; Solorio, T</creatorcontrib><description>Extracting useful information from user generated text on the web is an important ongoing research in natural language processing, machine learning, and data mining. Online tools like emails, news groups, blogs, and web forums provide an effective communication platform for millions of users around the globe and also provide an added advantage of anonymity. Millions of people post information on different web forums daily. The possibility of exchanging sensitive information between anonymous users on these web forums cannot be ruled out. This document proposes a two stage approach for combining unsupervised and supervised learning approaches for performing authorship attribution on web forum posts. During the first stage, the approach focuses on using clustering techniques to make an effort to group the data sets into stylistically similar clusters. 
The second stage involves using the resulting clusters from stage one as features to train different machine learning classifiers. This two stage approach is an effort towards reducing the complexity of the classification task and boosting the prediction accuracy.</description><identifier>ISSN: 2159-1237</identifier><identifier>ISBN: 9781424477609</identifier><identifier>ISBN: 1424477603</identifier><identifier>EISSN: 2159-1245</identifier><identifier>EISBN: 9781424477616</identifier><identifier>EISBN: 9781424477623</identifier><identifier>EISBN: 142447762X</identifier><identifier>EISBN: 1424477611</identifier><identifier>DOI: 10.1109/ecrime.2010.5706693</identifier><language>eng</language><publisher>IEEE</publisher><subject>Accuracy ; Authorship attribution ; Classification algorithms ; Classification tree analysis ; clustering ; Feature extraction ; Machine learning ; Machine learning algorithms ; machine learning classifiers ; stylometry ; text categorization ; Training</subject><ispartof>2010 eCrime Researchers Summit, 2010, p.1-7</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5706693$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27902,54530,54895,54907</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5706693$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Pillay, S R</creatorcontrib><creatorcontrib>Solorio, T</creatorcontrib><title>Authorship attribution of web forum posts</title><title>2010 eCrime Researchers Summit</title><addtitle>ecrime</addtitle><description>Extracting useful information from user generated text on the web is an important ongoing research in natural language processing, machine learning, and data mining. 
Online tools like emails, news groups, blogs, and web forums provide an effective communication platform for millions of users around the globe and also provide an added advantage of anonymity. Millions of people post information on different web forums daily. The possibility of exchanging sensitive information between anonymous users on these web forums cannot be ruled out. This document proposes a two stage approach for combining unsupervised and supervised learning approaches for performing authorship attribution on web forum posts. During the first stage, the approach focuses on using clustering techniques to make an effort to group the data sets into stylistically similar clusters. The second stage involves using the resulting clusters from stage one as features to train different machine learning classifiers. This two stage approach is an effort towards reducing the complexity of the classification task and boosting the prediction accuracy.</description><subject>Accuracy</subject><subject>Authorship attribution</subject><subject>Classification algorithms</subject><subject>Classification tree analysis</subject><subject>clustering</subject><subject>Feature extraction</subject><subject>Machine learning</subject><subject>Machine learning algorithms</subject><subject>machine learning classifiers</subject><subject>stylometry</subject><subject>text 
categorization</subject><subject>Training</subject><issn>2159-1237</issn><issn>2159-1245</issn><isbn>9781424477609</isbn><isbn>1424477603</isbn><isbn>9781424477616</isbn><isbn>9781424477623</isbn><isbn>142447762X</isbn><isbn>1424477611</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2010</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNpVj8tqwzAURNUXNKT-gmy87cKprnRl6S5D6AsC3WQfrIeJSlMZS6b072toKHRgGJgDA8PYCvgagNNDcGM8hbXgc6E0b1uSF6wibQAFotYttJdsIUBRAwLV1T_G6fqPSX3Lqpzf-SyFRki1YPebqRzTmI9xqLtSxminEtNnnfr6K9i6T-N0qoeUS75jN333kUN1ziXbPz3uty_N7u35dbvZNZF4aYwmpzx457FzUvYYeC86g2S8ddJzM9uCFQZBkNEoTYDgFKF31pJwcslWv7MxhHAY5uvd-H04_5Y_T5tIBA</recordid><startdate>201010</startdate><enddate>201010</enddate><creator>Pillay, S R</creator><creator>Solorio, T</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201010</creationdate><title>Authorship attribution of web forum posts</title><author>Pillay, S R ; Solorio, T</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-879c5d1dcd4ac33f4e0f2a8498dbc3d083d0b1b28412987438e1ec594dcbb92c3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Accuracy</topic><topic>Authorship attribution</topic><topic>Classification algorithms</topic><topic>Classification tree analysis</topic><topic>clustering</topic><topic>Feature extraction</topic><topic>Machine learning</topic><topic>Machine learning algorithms</topic><topic>machine learning classifiers</topic><topic>stylometry</topic><topic>text categorization</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Pillay, S R</creatorcontrib><creatorcontrib>Solorio, T</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference 
Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Explore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Pillay, S R</au><au>Solorio, T</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Authorship attribution of web forum posts</atitle><btitle>2010 eCrime Researchers Summit</btitle><stitle>ecrime</stitle><date>2010-10</date><risdate>2010</risdate><spage>1</spage><epage>7</epage><pages>1-7</pages><issn>2159-1237</issn><eissn>2159-1245</eissn><isbn>9781424477609</isbn><isbn>1424477603</isbn><eisbn>9781424477616</eisbn><eisbn>9781424477623</eisbn><eisbn>142447762X</eisbn><eisbn>1424477611</eisbn><abstract>Extracting useful information from user generated text on the web is an important ongoing research in natural language processing, machine learning, and data mining. Online tools like emails, news groups, blogs, and web forums provide an effective communication platform for millions of users around the globe and also provide an added advantage of anonymity. Millions of people post information on different web forums daily. The possibility of exchanging sensitive information between anonymous users on these web forums cannot be ruled out. This document proposes a two stage approach for combining unsupervised and supervised learning approaches for performing authorship attribution on web forum posts. During the first stage, the approach focuses on using clustering techniques to make an effort to group the data sets into stylistically similar clusters. The second stage involves using the resulting clusters from stage one as features to train different machine learning classifiers. 
This two stage approach is an effort towards reducing the complexity of the classification task and boosting the prediction accuracy.</abstract><pub>IEEE</pub><doi>10.1109/ecrime.2010.5706693</doi><tpages>7</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 2159-1237
ispartof 2010 eCrime Researchers Summit, 2010, p.1-7
issn 2159-1237
2159-1245
language eng
recordid cdi_ieee_primary_5706693
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Accuracy
Authorship attribution
Classification algorithms
Classification tree analysis
clustering
Feature extraction
Machine learning
Machine learning algorithms
machine learning classifiers
stylometry
text categorization
Training
title Authorship attribution of web forum posts
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T12%3A08%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Authorship%20attribution%20of%20web%20forum%20posts&rft.btitle=2010%20eCrime%20Researchers%20Summit&rft.au=Pillay,%20S%20R&rft.date=2010-10&rft.spage=1&rft.epage=7&rft.pages=1-7&rft.issn=2159-1237&rft.eissn=2159-1245&rft.isbn=9781424477609&rft.isbn_list=1424477603&rft_id=info:doi/10.1109/ecrime.2010.5706693&rft.eisbn=9781424477616&rft.eisbn_list=9781424477623&rft.eisbn_list=142447762X&rft.eisbn_list=1424477611&rft_dat=%3Cieee_6IE%3E5706693%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i90t-879c5d1dcd4ac33f4e0f2a8498dbc3d083d0b1b28412987438e1ec594dcbb92c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5706693&rfr_iscdi=true
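
The two-stage approach summarised in the description field (cluster first, then use the clusters as classifier features) can be illustrated with a minimal sketch. The record does not say which stylometric features, clustering algorithm, or classifiers the authors used, so everything below is an assumption made for illustration: the three toy features in `style_features`, plain k-means for stage one, a nearest-centroid classifier for stage two, and the sample posts and author labels, which are invented.

```python
import math

def style_features(text):
    """Toy stylometric vector (assumed, not from the paper):
    average word length, punctuation rate, uppercase rate."""
    words = text.split()
    n_chars = max(len(text), 1)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    punct_rate = sum(c in ".,;:!?" for c in text) / n_chars
    upper_rate = sum(c.isupper() for c in text) / n_chars
    return [avg_word_len, punct_rate, upper_rate]

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def nearest(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: dist(v, centroids[i]))

def kmeans(vectors, k, iterations=20):
    """Stage 1 (unsupervised): plain k-means, seeded with the first k
    vectors so the sketch is deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def augment(v, centroids):
    """Append a one-hot cluster-membership vector to the base features."""
    c = nearest(v, centroids)
    return v + [1.0 if i == c else 0.0 for i in range(len(centroids))]

class NearestCentroidClassifier:
    """Stage 2 (supervised): a stand-in learner, one centroid per author."""
    def fit(self, X, y):
        by_author = {}
        for v, label in zip(X, y):
            by_author.setdefault(label, []).append(v)
        self.centroids = {a: mean(vs) for a, vs in by_author.items()}
        return self

    def predict(self, v):
        return min(self.centroids, key=lambda a: dist(v, self.centroids[a]))

def train(posts, authors, k=2):
    """Cluster the posts, then train on features extended with cluster ids."""
    base = [style_features(p) for p in posts]
    centroids = kmeans(base, k)
    clf = NearestCentroidClassifier().fit(
        [augment(v, centroids) for v in base], authors)
    return centroids, clf

def attribute(post, centroids, clf):
    """Predict the author of an unseen post."""
    return clf.predict(augment(style_features(post), centroids))

# Invented example data: two hypothetical authors with distinct styles.
posts = ["HELLO!!! THIS IS GREAT!!!", "WOW!!! SO COOL!!!",
         "i think this is fine, really.", "it seems okay to me, perhaps."]
authors = ["a", "a", "b", "b"]
centroids, clf = train(posts, authors)
print(attribute("AMAZING!!! LOVE IT!!!", centroids, clf))
```

The cluster membership is appended as a one-hot vector so that the second-stage learner sees it as an ordinary feature alongside the stylometric ones; a real system would likely scale the base features and use a stronger classifier.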