
Hierarchical Structures Induce Long-Range Dynamical Correlations in Written Texts

Thoughts and ideas are multidimensional and often concurrent, yet they can be expressed surprisingly well sequentially by the translation into language. This reduction of dimensions occurs naturally but requires memory and necessitates the existence of correlations, e.g., in written text. However, correlations in word appearance decay quickly, while previous observations of long-range correlations using random walk approaches yield little insight on memory or on semantic context. Instead, we study combinations of words that a reader is exposed to within a "window of attention," spanning about 100 words. We define a vector space of such word combinations by looking at words that co-occur within the window of attention, and analyze its structure. Singular value decomposition of the co-occurrence matrix identifies a basis whose vectors correspond to specific topics, or "concepts" that are relevant to the text. As the reader follows a text, the "vector of attention" traces out a trajectory of directions in this "concept space." We find that memory of the direction is retained over long times, forming power-law correlations. The appearance of power laws hints at the existence of an underlying hierarchical network. Indeed, imposing a hierarchy similar to that defined by volumes, chapters, paragraphs, etc. succeeds in creating correlations in a surrogate random text that are identical to those of the original text. We conclude that hierarchical structures in text serve to create long-range correlations, and use the reader's memory in reenacting some of the multidimensionality of the thoughts being expressed.
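
As an illustration of the kind of analysis the abstract describes, the sketch below counts word co-occurrences inside a sliding ~100-word "window of attention", extracts "concept" directions with a singular value decomposition, and measures how strongly the window's direction in concept space correlates across the text. This is a minimal reconstruction, not the authors' code: the tokenizer, the 500-word vocabulary cutoff, the 20-dimensional concept basis, the step size, and the input file `book.txt` are all illustrative assumptions.

```python
# Minimal sketch (not the authors' pipeline): co-occurrence counts inside a
# sliding "window of attention", an SVD "concept" basis, and the correlation
# of the window's direction in concept space as a function of distance.
import re
from collections import Counter

import numpy as np


def tokenize(text):
    # Rough tokenizer (assumption): lowercase alphabetic words only.
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence_matrix(tokens, vocab, window=100):
    # Count how often pairs of vocabulary words fall inside the same window.
    index = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for start in range(0, len(tokens) - window + 1, window):
        ids = sorted({index[t] for t in tokens[start:start + window] if t in index})
        for i in ids:
            for j in ids:
                if i != j:
                    C[i, j] += 1
    return C


def attention_directions(tokens, vocab, basis, window=100, step=10):
    # Project each window's bag-of-words vector onto the concept basis and
    # keep only its direction (unit vector).
    index = {w: i for i, w in enumerate(vocab)}
    directions = []
    for start in range(0, len(tokens) - window + 1, step):
        v = np.zeros(len(vocab))
        for t in tokens[start:start + window]:
            if t in index:
                v[index[t]] += 1
        p = basis.T @ v
        if np.linalg.norm(p) > 0:
            directions.append(p / np.linalg.norm(p))
    return np.array(directions)


def direction_correlation(directions, max_lag=200):
    # Mean cosine similarity between directions separated by a given lag.
    return [float(np.mean(np.sum(directions[:-lag] * directions[lag:], axis=1)))
            for lag in range(1, min(max_lag, len(directions)))]


if __name__ == "__main__":
    text = open("book.txt", encoding="utf-8").read()    # hypothetical input file
    tokens = tokenize(text)
    vocab = [w for w, _ in Counter(tokens).most_common(500)]   # illustrative cutoff
    U, _, _ = np.linalg.svd(cooccurrence_matrix(tokens, vocab))
    basis = U[:, :20]                                    # top "concept" directions (assumption)
    corr = direction_correlation(attention_directions(tokens, vocab, basis))
    print(corr[:10])
```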

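The abstract's surrogate-text observation can be illustrated in the same spirit: the sketch below imposes a nested volume/chapter/paragraph hierarchy on otherwise random word choices by letting a "topic" bias drift at each level of the tree. It shows one simple way a hierarchy can induce long-range correlations and is not the construction used in the paper; the vocabulary size, branching factors, and drift model are assumptions.

```python
# Sketch of how a nested hierarchy can induce long-range correlations in
# otherwise random text. NOT the paper's construction: it walks a
# volume/chapter/paragraph tree, lets a "topic" bias drift at each level,
# and samples words from a topic-biased distribution at the leaves.
import numpy as np

rng = np.random.default_rng(0)


def surrogate_text(vocab_size=500, branching=(4, 8, 20, 100), drift=1.0):
    # branching = (volumes, chapters per volume, paragraphs per chapter,
    #              words per paragraph) -- illustrative values
    def emit(depth, bias):
        if depth == len(branching) - 1:
            # Leaf: a paragraph of words drawn with topic-biased probabilities.
            p = np.exp(bias)
            p /= p.sum()
            return list(rng.choice(vocab_size, size=branching[depth], p=p))
        words = []
        for _ in range(branching[depth]):
            # Each subunit inherits its parent's topic plus a smaller perturbation.
            words += emit(depth + 1, bias + drift * rng.normal(size=vocab_size) / (depth + 1))
        return words

    return emit(0, np.zeros(vocab_size))


words = surrogate_text()
print(len(words), "surrogate word tokens")
```

Feeding such a surrogate through the co-occurrence/SVD pipeline sketched above would be the natural way to compare its directional correlations with those measured on a real text.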

Bibliographic Details
Published in: Proceedings of the National Academy of Sciences - PNAS, 2006-05, Vol.103 (21), p.7956-7961
Main Authors: Alvarez-Lacalle, E., Dorow, B., Eckmann, J.-P., Moses, E.
Format: Article
Language: English
DOI: 10.1073/pnas.0510673103
ISSN: 0027-8424
EISSN: 1091-6490
PMID: 16698933
Publisher: National Academy of Sciences
Source: Open Access: PubMed Central; JSTOR Archival Journals and Primary Sources Collection
Subjects:
Autocorrelation
Connectivity
Correlations
Language
Mathematical vectors
Mathematics
Matrices
Memory
Models, Statistical
Nouns
Physical Sciences
Power laws
Reading comprehension
Semantics
Systems Analysis
Trajectories
Vector spaces
Vocabulary
Words