Authorship attribution of web forum posts
Extracting useful information from user-generated text on the web is an important area of ongoing research in natural language processing, machine learning, and data mining. Online tools such as email, newsgroups, blogs, and web forums provide an effective communication platform for millions of users around...
Main Authors: | Pillay, S R ; Solorio, T |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 7 |
container_issue | |
container_start_page | 1 |
container_title | |
container_volume | |
creator | Pillay, S R ; Solorio, T |
description | Extracting useful information from user-generated text on the web is an important area of ongoing research in natural language processing, machine learning, and data mining. Online tools such as email, newsgroups, blogs, and web forums provide an effective communication platform for millions of users around the globe, with the added advantage of anonymity. Millions of people post information on different web forums daily. The possibility that anonymous users exchange sensitive information on these web forums cannot be ruled out. This document proposes a two-stage approach that combines unsupervised and supervised learning to perform authorship attribution on web forum posts. In the first stage, clustering techniques are used to group the data sets into stylistically similar clusters. In the second stage, the clusters resulting from stage one are used as features to train different machine learning classifiers. The two-stage approach aims to reduce the complexity of the classification task and to boost prediction accuracy. |
doi_str_mv | 10.1109/ecrime.2010.5706693 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5706693</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5706693</ieee_id><sourcerecordid>5706693</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-879c5d1dcd4ac33f4e0f2a8498dbc3d083d0b1b28412987438e1ec594dcbb92c3</originalsourceid><addsrcrecordid>eNpVj8tqwzAURNUXNKT-gmy87cKprnRl6S5D6AsC3WQfrIeJSlMZS6b072toKHRgGJgDA8PYCvgagNNDcGM8hbXgc6E0b1uSF6wibQAFotYttJdsIUBRAwLV1T_G6fqPSX3Lqpzf-SyFRki1YPebqRzTmI9xqLtSxminEtNnnfr6K9i6T-N0qoeUS75jN333kUN1ziXbPz3uty_N7u35dbvZNZF4aYwmpzx457FzUvYYeC86g2S8ddJzM9uCFQZBkNEoTYDgFKF31pJwcslWv7MxhHAY5uvd-H04_5Y_T5tIBA</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Authorship attribution of web forum posts</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Pillay, S R ; Solorio, T</creator><creatorcontrib>Pillay, S R ; Solorio, T</creatorcontrib><description>Extracting useful information from user generated text on the web is an important ongoing research in natural language processing, machine learning, and data mining. Online tools like emails, news groups, blogs, and web forums provide an effective communication platform for millions of users around the globe and also provide an added advantage of anonymity. Millions of people post information on different web forums daily. The possibility of exchanging sensitive information between anonymous users on these web forums cannot be ruled out. This document proposes a two stage approach for combining unsupervised and supervised learning approaches for performing authorship attribution on web forum posts. During the first stage, the approach focuses on using clustering techniques to make an effort to group the data sets into stylistically similar clusters. 
The second stage involves using the resulting clusters from stage one as features to train different machine learning classifiers. This two stage approach is an effort towards reducing the complexity of the classification task and boosting the prediction accuracy.</description><identifier>ISSN: 2159-1237</identifier><identifier>ISBN: 9781424477609</identifier><identifier>ISBN: 1424477603</identifier><identifier>EISSN: 2159-1245</identifier><identifier>EISBN: 9781424477616</identifier><identifier>EISBN: 9781424477623</identifier><identifier>EISBN: 142447762X</identifier><identifier>EISBN: 1424477611</identifier><identifier>DOI: 10.1109/ecrime.2010.5706693</identifier><language>eng</language><publisher>IEEE</publisher><subject>Accuracy ; Authorship attribution ; Classification algorithms ; Classification tree analysis ; clustering ; Feature extraction ; Machine learning ; Machine learning algorithms ; machine learning classifiers ; stylometry ; text categorization ; Training</subject><ispartof>2010 eCrime Researchers Summit, 2010, p.1-7</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5706693$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27902,54530,54895,54907</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5706693$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Pillay, S R</creatorcontrib><creatorcontrib>Solorio, T</creatorcontrib><title>Authorship attribution of web forum posts</title><title>2010 eCrime Researchers Summit</title><addtitle>ecrime</addtitle><description>Extracting useful information from user generated text on the web is an important ongoing research in natural language processing, machine learning, and data mining. 
Online tools like emails, news groups, blogs, and web forums provide an effective communication platform for millions of users around the globe and also provide an added advantage of anonymity. Millions of people post information on different web forums daily. The possibility of exchanging sensitive information between anonymous users on these web forums cannot be ruled out. This document proposes a two stage approach for combining unsupervised and supervised learning approaches for performing authorship attribution on web forum posts. During the first stage, the approach focuses on using clustering techniques to make an effort to group the data sets into stylistically similar clusters. The second stage involves using the resulting clusters from stage one as features to train different machine learning classifiers. This two stage approach is an effort towards reducing the complexity of the classification task and boosting the prediction accuracy.</description><subject>Accuracy</subject><subject>Authorship attribution</subject><subject>Classification algorithms</subject><subject>Classification tree analysis</subject><subject>clustering</subject><subject>Feature extraction</subject><subject>Machine learning</subject><subject>Machine learning algorithms</subject><subject>machine learning classifiers</subject><subject>stylometry</subject><subject>text 
categorization</subject><subject>Training</subject><issn>2159-1237</issn><issn>2159-1245</issn><isbn>9781424477609</isbn><isbn>1424477603</isbn><isbn>9781424477616</isbn><isbn>9781424477623</isbn><isbn>142447762X</isbn><isbn>1424477611</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2010</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNpVj8tqwzAURNUXNKT-gmy87cKprnRl6S5D6AsC3WQfrIeJSlMZS6b072toKHRgGJgDA8PYCvgagNNDcGM8hbXgc6E0b1uSF6wibQAFotYttJdsIUBRAwLV1T_G6fqPSX3Lqpzf-SyFRki1YPebqRzTmI9xqLtSxminEtNnnfr6K9i6T-N0qoeUS75jN333kUN1ziXbPz3uty_N7u35dbvZNZF4aYwmpzx457FzUvYYeC86g2S8ddJzM9uCFQZBkNEoTYDgFKF31pJwcslWv7MxhHAY5uvd-H04_5Y_T5tIBA</recordid><startdate>201010</startdate><enddate>201010</enddate><creator>Pillay, S R</creator><creator>Solorio, T</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201010</creationdate><title>Authorship attribution of web forum posts</title><author>Pillay, S R ; Solorio, T</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-879c5d1dcd4ac33f4e0f2a8498dbc3d083d0b1b28412987438e1ec594dcbb92c3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Accuracy</topic><topic>Authorship attribution</topic><topic>Classification algorithms</topic><topic>Classification tree analysis</topic><topic>clustering</topic><topic>Feature extraction</topic><topic>Machine learning</topic><topic>Machine learning algorithms</topic><topic>machine learning classifiers</topic><topic>stylometry</topic><topic>text categorization</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Pillay, S R</creatorcontrib><creatorcontrib>Solorio, T</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference 
Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Explore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Pillay, S R</au><au>Solorio, T</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Authorship attribution of web forum posts</atitle><btitle>2010 eCrime Researchers Summit</btitle><stitle>ecrime</stitle><date>2010-10</date><risdate>2010</risdate><spage>1</spage><epage>7</epage><pages>1-7</pages><issn>2159-1237</issn><eissn>2159-1245</eissn><isbn>9781424477609</isbn><isbn>1424477603</isbn><eisbn>9781424477616</eisbn><eisbn>9781424477623</eisbn><eisbn>142447762X</eisbn><eisbn>1424477611</eisbn><abstract>Extracting useful information from user generated text on the web is an important ongoing research in natural language processing, machine learning, and data mining. Online tools like emails, news groups, blogs, and web forums provide an effective communication platform for millions of users around the globe and also provide an added advantage of anonymity. Millions of people post information on different web forums daily. The possibility of exchanging sensitive information between anonymous users on these web forums cannot be ruled out. This document proposes a two stage approach for combining unsupervised and supervised learning approaches for performing authorship attribution on web forum posts. During the first stage, the approach focuses on using clustering techniques to make an effort to group the data sets into stylistically similar clusters. The second stage involves using the resulting clusters from stage one as features to train different machine learning classifiers. 
This two stage approach is an effort towards reducing the complexity of the classification task and boosting the prediction accuracy.</abstract><pub>IEEE</pub><doi>10.1109/ecrime.2010.5706693</doi><tpages>7</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2159-1237 |
ispartof | 2010 eCrime Researchers Summit, 2010, p.1-7 |
issn | 2159-1237 ; 2159-1245 |
language | eng |
recordid | cdi_ieee_primary_5706693 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Accuracy ; Authorship attribution ; Classification algorithms ; Classification tree analysis ; clustering ; Feature extraction ; Machine learning ; Machine learning algorithms ; machine learning classifiers ; stylometry ; text categorization ; Training |
title | Authorship attribution of web forum posts |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T12%3A08%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Authorship%20attribution%20of%20web%20forum%20posts&rft.btitle=2010%20eCrime%20Researchers%20Summit&rft.au=Pillay,%20S%20R&rft.date=2010-10&rft.spage=1&rft.epage=7&rft.pages=1-7&rft.issn=2159-1237&rft.eissn=2159-1245&rft.isbn=9781424477609&rft.isbn_list=1424477603&rft_id=info:doi/10.1109/ecrime.2010.5706693&rft.eisbn=9781424477616&rft.eisbn_list=9781424477623&rft.eisbn_list=142447762X&rft.eisbn_list=1424477611&rft_dat=%3Cieee_6IE%3E5706693%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i90t-879c5d1dcd4ac33f4e0f2a8498dbc3d083d0b1b28412987438e1ec594dcbb92c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5706693&rfr_iscdi=true |
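
The two-stage approach summarized in the description field (cluster posts by style, then use the cluster assignments as features for a supervised classifier) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three toy stylometric features, the small k-means routine with farthest-point initialization, the 1-nearest-neighbour classifier, and the sample posts and author names are all hypothetical.

```python
import math

def style_features(text):
    """Toy stylometric vector (assumed features, not the paper's)."""
    words = text.split()
    n_chars = max(len(text), 1)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    punct_rate = sum(c in ".,;:!?" for c in text) / n_chars
    upper_rate = sum(c.isupper() for c in text) / n_chars
    # Divide word length by 10 so all three features share a similar scale.
    return [avg_word_len / 10.0, punct_rate, upper_rate]

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))

def kmeans(points, k, iters=20):
    """Stage 1: plain k-means with deterministic farthest-point init."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group (keep it if empty).
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def augment(vec, centroids):
    """Append a one-hot cluster indicator to the style vector."""
    one_hot = [0.0] * len(centroids)
    one_hot[nearest(vec, centroids)] = 1.0
    return vec + one_hot

def train(posts, k=2):
    """Stage 2: store cluster-augmented vectors with their author labels."""
    vecs = [style_features(text) for _, text in posts]
    centroids = kmeans(vecs, k)
    examples = [(augment(v, centroids), author)
                for v, (author, _) in zip(vecs, posts)]
    return centroids, examples

def predict(model, text):
    """1-nearest-neighbour prediction in the augmented feature space."""
    centroids, examples = model
    query = augment(style_features(text), centroids)
    return min(examples, key=lambda ex: dist(query, ex[0]))[1]

# Hypothetical forum posts, labelled by (made-up) author.
posts = [
    ("alice", "I think, honestly, that this is fine; really, it is."),
    ("alice", "Well, yes; perhaps, though, we should wait, no?"),
    ("bob", "THIS IS WRONG AND YOU KNOW IT"),
    ("bob", "STOP POSTING THIS NONSENSE NOW"),
]
model = train(posts, k=2)
print(predict(model, "HELLO THERE EVERYONE"))
```

The cluster indicator is appended to the style vector rather than replacing it, which is only one of several ways to "use clusters as features"; the abstract does not say which variant the authors chose.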