Using complex networks for text classification: Discriminating informative and imaginative documents

Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques has allowed improvements in several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised...

Bibliographic Details
Published in: Europhysics letters 2016-01, Vol.113 (2), p.28007-28007
Main Authors: de Arruda, Henrique F., Costa, Luciano da F., Amancio, Diego R.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c486t-4d87ebdc13f32c75bbe4394bf4d1ae7835f2a189a244d18f39e6d644aeff03f43
cites cdi_FETCH-LOGICAL-c486t-4d87ebdc13f32c75bbe4394bf4d1ae7835f2a189a244d18f39e6d644aeff03f43
container_end_page 28007
container_issue 2
container_start_page 28007
container_title Europhysics letters
container_volume 113
creator de Arruda, Henrique F.
Costa, Luciano da F.
Amancio, Diego R.
description Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques has allowed improvements in several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised the semantic content of texts, as is the case of bag-of-words language models. These approaches have certainly yielded reasonable performance. However, some potential features, such as the structural organization of texts, have been used in only a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aimed at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than that of similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterising texts.
doi_str_mv 10.1209/0295-5075/113/28007
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1835585873</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3110112391</sourcerecordid><originalsourceid>FETCH-LOGICAL-c486t-4d87ebdc13f32c75bbe4394bf4d1ae7835f2a189a244d18f39e6d644aeff03f43</originalsourceid><addsrcrecordid>eNp9kElLxTAUhYMo-Bx-gZuCC93Ul6lt6k6cwWGh4jLkpYlE26QmfQ7_3vusKIi4Su7Ndw4nB6EtgvcIxfUU07rIC1wVU0LYlAqMqyU0IVSUORcFX0aTb2IVraX0iDEhgpQT1Nwl5x8yHbq-NW-ZN8NriE8psyFmg3kbMt2qlJx1Wg0u-P3syCUdXec8zCB0HsgO7i8mU77JXKcePt9gboKed8YPaQOtWNUms_l1rqO7k-Pbw7P84vr0_PDgItdclEPOG1GZWaMJs4zqqpjNDGc1n1neEGUqwQpLFRG1ohw2wrLalE3JuTLWYmY5W0e7o28fw_PcpEF2kNa0rfImzJMkYFGIQlQM0O1f6GOYRw_pJCME2qGsJkCxkdIxpBSNlT38XcV3SbBcNC8XvcpFrxKal5_NgyofVS5Bhd8SFZ9kWTFABb6XV1f08qRkN1IAv_PFh_4nhunb0XN0lX1jgZz-Qf6X5QMrUaHf</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3110112391</pqid></control><display><type>article</type><title>Using complex networks for text classification: Discriminating informative and imaginative documents</title><source>Institute of Physics</source><creator>de Arruda, Henrique F. ; Costa, Luciano da F. ; Amancio, Diego R.</creator><creatorcontrib>de Arruda, Henrique F. ; Costa, Luciano da F. ; Amancio, Diego R.</creatorcontrib><description>Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques have allowed an improvement of several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised the semantical content of texts, as is the case of bag-of-word language models. These approaches have certainly yielded reasonable performance. However, some potential features such as the structural organization of texts have been used only in a few studies. 
In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aiming at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterising texts.</description><identifier>ISSN: 0295-5075</identifier><identifier>EISSN: 1286-4854</identifier><identifier>DOI: 10.1209/0295-5075/113/28007</identifier><identifier>CODEN: EULEEJ</identifier><language>eng</language><publisher>Les Ulis: EDP Sciences, IOP Publishing and Società Italiana di Fisica</publisher><subject>89.75.Fb ; 89.75.Hc ; Accessibility ; Classification ; Documents ; Linguistics ; Machine translation ; Mathematical models ; Networks ; Statistical methods ; Structural analysis ; Tasks ; Texts</subject><ispartof>Europhysics letters, 2016-01, Vol.113 (2), p.28007-28007</ispartof><rights>Copyright © EPLA, 2016</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c486t-4d87ebdc13f32c75bbe4394bf4d1ae7835f2a189a244d18f39e6d644aeff03f43</citedby><cites>FETCH-LOGICAL-c486t-4d87ebdc13f32c75bbe4394bf4d1ae7835f2a189a244d18f39e6d644aeff03f43</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27922,27923</link.rule.ids></links><search><creatorcontrib>de Arruda, Henrique F.</creatorcontrib><creatorcontrib>Costa, Luciano 
da F.</creatorcontrib><creatorcontrib>Amancio, Diego R.</creatorcontrib><title>Using complex networks for text classification: Discriminating informative and imaginative documents</title><title>Europhysics letters</title><addtitle>EPL</addtitle><addtitle>EPL</addtitle><description>Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques have allowed an improvement of several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised the semantical content of texts, as is the case of bag-of-word language models. These approaches have certainly yielded reasonable performance. However, some potential features such as the structural organization of texts have been used only in a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aiming at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. 
Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterising texts.</description><subject>89.75.Fb</subject><subject>89.75.Hc</subject><subject>Accessibility</subject><subject>Classification</subject><subject>Documents</subject><subject>Linguistics</subject><subject>Machine translation</subject><subject>Mathematical models</subject><subject>Networks</subject><subject>Statistical methods</subject><subject>Structural analysis</subject><subject>Tasks</subject><subject>Texts</subject><issn>0295-5075</issn><issn>1286-4854</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNp9kElLxTAUhYMo-Bx-gZuCC93Ul6lt6k6cwWGh4jLkpYlE26QmfQ7_3vusKIi4Su7Ndw4nB6EtgvcIxfUU07rIC1wVU0LYlAqMqyU0IVSUORcFX0aTb2IVraX0iDEhgpQT1Nwl5x8yHbq-NW-ZN8NriE8psyFmg3kbMt2qlJx1Wg0u-P3syCUdXec8zCB0HsgO7i8mU77JXKcePt9gboKed8YPaQOtWNUms_l1rqO7k-Pbw7P84vr0_PDgItdclEPOG1GZWaMJs4zqqpjNDGc1n1neEGUqwQpLFRG1ohw2wrLalE3JuTLWYmY5W0e7o28fw_PcpEF2kNa0rfImzJMkYFGIQlQM0O1f6GOYRw_pJCME2qGsJkCxkdIxpBSNlT38XcV3SbBcNC8XvcpFrxKal5_NgyofVS5Bhd8SFZ9kWTFABb6XV1f08qRkN1IAv_PFh_4nhunb0XN0lX1jgZz-Qf6X5QMrUaHf</recordid><startdate>201601</startdate><enddate>201601</enddate><creator>de Arruda, Henrique F.</creator><creator>Costa, Luciano da F.</creator><creator>Amancio, Diego R.</creator><general>EDP Sciences, IOP Publishing and Società Italiana di Fisica</general><general>IOP Publishing</general><scope>BSCLL</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7U5</scope><scope>8FD</scope><scope>H8D</scope><scope>L7M</scope></search><sort><creationdate>201601</creationdate><title>Using complex networks for text classification: Discriminating informative and imaginative documents</title><author>de Arruda, Henrique F. ; Costa, Luciano da F. 
; Amancio, Diego R.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c486t-4d87ebdc13f32c75bbe4394bf4d1ae7835f2a189a244d18f39e6d644aeff03f43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>89.75.Fb</topic><topic>89.75.Hc</topic><topic>Accessibility</topic><topic>Classification</topic><topic>Documents</topic><topic>Linguistics</topic><topic>Machine translation</topic><topic>Mathematical models</topic><topic>Networks</topic><topic>Statistical methods</topic><topic>Structural analysis</topic><topic>Tasks</topic><topic>Texts</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>de Arruda, Henrique F.</creatorcontrib><creatorcontrib>Costa, Luciano da F.</creatorcontrib><creatorcontrib>Amancio, Diego R.</creatorcontrib><collection>Istex</collection><collection>CrossRef</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>Technology Research Database</collection><collection>Aerospace Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>Europhysics letters</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>de Arruda, Henrique F.</au><au>Costa, Luciano da F.</au><au>Amancio, Diego R.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Using complex networks for text classification: Discriminating informative and imaginative documents</atitle><jtitle>Europhysics letters</jtitle><stitle>EPL</stitle><addtitle>EPL</addtitle><date>2016-01</date><risdate>2016</risdate><volume>113</volume><issue>2</issue><spage>28007</spage><epage>28007</epage><pages>28007-28007</pages><issn>0295-5075</issn><eissn>1286-4854</eissn><coden>EULEEJ</coden><abstract>Statistical methods have been widely employed in recent years to grasp many language properties. 
The application of such techniques has allowed improvements in several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised the semantic content of texts, as is the case of bag-of-words language models. These approaches have certainly yielded reasonable performance. However, some potential features, such as the structural organization of texts, have been used in only a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aimed at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than that of similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterising texts.</abstract><cop>Les Ulis</cop><pub>EDP Sciences, IOP Publishing and Società Italiana di Fisica</pub><doi>10.1209/0295-5075/113/28007</doi><tpages>6</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0295-5075
ispartof Europhysics letters, 2016-01, Vol.113 (2), p.28007-28007
issn 0295-5075
1286-4854
language eng
recordid cdi_proquest_miscellaneous_1835585873
source Institute of Physics
subjects 89.75.Fb
89.75.Hc
Accessibility
Classification
Documents
Linguistics
Machine translation
Mathematical models
Networks
Statistical methods
Structural analysis
Tasks
Texts
title Using complex networks for text classification: Discriminating informative and imaginative documents
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T09%3A13%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Using%20complex%20networks%20for%20text%20classification:%20Discriminating%20informative%20and%20imaginative%20documents&rft.jtitle=Europhysics%20letters&rft.au=de%20Arruda,%20Henrique%20F.&rft.date=2016-01&rft.volume=113&rft.issue=2&rft.spage=28007&rft.epage=28007&rft.pages=28007-28007&rft.issn=0295-5075&rft.eissn=1286-4854&rft.coden=EULEEJ&rft_id=info:doi/10.1209/0295-5075/113/28007&rft_dat=%3Cproquest_cross%3E3110112391%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c486t-4d87ebdc13f32c75bbe4394bf4d1ae7835f2a189a244d18f39e6d644aeff03f43%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3110112391&rft_id=info:pmid/&rfr_iscdi=true