Lexical analysis of automated accounts on Twitter
Published in: | arXiv.org 2018-12 |
---|---|
Main Authors: | Isa Inuwa-Dutse ; Bello, Shehu Bello ; Korkontzelos, Ioannis |
Format: | Article |
Language: | English |
Subjects: | Machine learning ; Performance enhancement ; Software agents |
Online Access: | Get full text |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Isa Inuwa-Dutse ; Bello, Shehu Bello ; Korkontzelos, Ioannis |
description | In recent years, social bots have adopted increasingly sophisticated strategies that make them harder to detect. Although many approaches and features have been proposed, social bots still evade detection and interact much like humans, making it difficult to distinguish real human accounts from bot accounts. Detection systems have drawn on features from four broad categories: account profile, tweet content, network, and temporal pattern. The use of tweet content features has been limited to basic elements such as URLs, hashtags, named entities and sentiment. Given a set of tweets with no obvious pattern, can we distinguish content produced by social bots from that produced by humans? We aim to answer this question by analysing the lexical richness of tweets produced by the respective accounts, using large collections drawn from different datasets. Our results show a clear margin between the two classes in lexical diversity, lexical sophistication and distribution of emoticons. We found that the proposed lexical features significantly improve classification performance for both account types. These features are useful for training a standard machine learning classifier for effective detection of social bot accounts. A new dataset is made freely available for further exploration. |
format | article |
date | 2018-12-19 |
publisher | Ithaca: Cornell University Library, arXiv.org |
rights | 2018. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
oa | free_for_read |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2018-12 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2159026124 |
source | Publicly Available Content Database |
subjects | Machine learning ; Performance enhancement ; Software agents |
title | Lexical analysis of automated accounts on Twitter |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T10%3A37%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Lexical%20analysis%20of%20automated%20accounts%20on%20Twitter&rft.jtitle=arXiv.org&rft.au=Isa%20Inuwa-Dutse&rft.date=2018-12-19&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2159026124%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_21590261243%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2159026124&rft_id=info:pmid/&rfr_iscdi=true |
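
The abstract names lexical diversity and the distribution of emoticons as distinguishing features but this record does not say how they are computed. The sketch below is only an illustration under stated assumptions: it uses the type-token ratio (a common lexical diversity measure, not necessarily the one used in the paper), and the `EMOTICONS` set, the `WORD_RE` pattern and the `lexical_features` function are hypothetical names introduced here.

```python
import re

# Hypothetical, deliberately small emoticon inventory (an assumption,
# not the paper's definition of an emoticon).
EMOTICONS = {":)", ":(", ":D", ";)", ":P", ":-)", ":-("}

# Simplified word pattern: runs of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def lexical_features(tweets):
    """Compute simple per-account lexical features from a list of tweet texts.

    Returns the token count, the number of distinct word types, the
    type-token ratio (types / tokens) and the mean emoticons per tweet.
    """
    words = []
    emoticon_count = 0
    for text in tweets:
        for chunk in text.split():
            if chunk in EMOTICONS:
                emoticon_count += 1
            else:
                # Lowercase so "Buy" and "buy" count as one type.
                words.extend(WORD_RE.findall(chunk.lower()))
    n_tokens = len(words)
    n_types = len(set(words))
    return {
        "tokens": n_tokens,
        "types": n_types,
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        "emoticons_per_tweet": emoticon_count / len(tweets) if tweets else 0.0,
    }


if __name__ == "__main__":
    sample = ["Buy now buy now :)", "great deal buy now"]
    print(lexical_features(sample))
```

The type-token ratio falls as the amount of text grows, so comparisons between accounts are only meaningful over similar numbers of tokens. The resulting values could be passed as columns to a standard classifier alongside profile, network and temporal features.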