Evaluating linguistic distance measures
In Ref. [13], Petroni and Serva discuss the use of Levenshtein distances (LD) between words referring to the same concepts as a tool for establishing overall distances among languages which can then subsequently be used to derive phylogenies. The authors modify the raw LD by dividing the LD by the...
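The three measures named in the abstract of this record (LD, LDN, and LDND) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the representation of a word list as a dictionary from concept to word, and the choice to average LDN over shared concepts before dividing by the different-concept baseline are all assumptions made for illustration.

```python
from itertools import permutations


def levenshtein(a: str, b: str) -> int:
    """Raw Levenshtein distance (LD): the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """LDN: LD divided by the length of the longer of the two words."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def ldnd(words_a: dict, words_b: dict) -> float:
    """LDND between two word lists (hypothetical format: concept -> word).

    Assumption: the mean LDN over same-concept pairs is divided by the mean
    LDN over pairs of words that refer to different concepts.
    """
    shared = sorted(set(words_a) & set(words_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared concepts")
    same = [ldn(words_a[c], words_b[c]) for c in shared]
    diff = [ldn(words_a[c1], words_b[c2])
            for c1, c2 in permutations(shared, 2)]
    baseline = sum(diff) / len(diff)
    if baseline == 0:
        raise ValueError("different-concept baseline is zero")
    return (sum(same) / len(same)) / baseline
```

Calling `ldnd(list_a, list_b)` on two such dictionaries returns a single distance for the language pair. The baseline in the denominator is meant to capture similarity that arises by chance, for example from overlapping sound inventories, which is the role the abstract assigns to the second normalization.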
Published in: | Physica A 2010-09, Vol.389 (17), p.3632-3639 |
---|---|
Main Authors: | Wichmann, Søren; Holman, Eric W.; Bakker, Dik; Brown, Cecil H. |
Format: | Article |
Language: | English |
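Subjects: | ASJP; Automated; Classification; Empirical analysis; Historical linguistics; Levenshtein distance; Linguistics; Phylogenetics; Raw; Statistical mechanics |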
cited_by | cdi_FETCH-LOGICAL-c335t-1b1abec6405558d53de3902b561f6e89969c57c2030df9afa979a11936a882123 |
---|---|
cites | cdi_FETCH-LOGICAL-c335t-1b1abec6405558d53de3902b561f6e89969c57c2030df9afa979a11936a882123 |
container_end_page | 3639 |
container_issue | 17 |
container_start_page | 3632 |
container_title | Physica A |
container_volume | 389 |
creator | Wichmann, Søren; Holman, Eric W.; Bakker, Dik; Brown, Cecil H. |
description | In Ref. [13], Petroni and Serva discuss the use of Levenshtein distances (LD) between words referring to the same concepts as a tool for establishing overall distances among languages which can then subsequently be used to derive phylogenies. The authors modify the raw LD by dividing the LD by the length of the longer of the two words compared, to produce what could be called LDN (normalized LD). Other scholars [7,8] have used a further modification, where they divide the LDN by the average LDN among words not referring to the same concept. This produces what could be called LDND. The authors of Ref. [13] question whether LDND is a more adequate measure of distance than LDN. Here we show empirically that LDND is the better measure in the situation where the languages compared have not already been shown, by other, more traditional methods of comparative linguistics, to be related. If automated language classification is to be used as a tool independent of traditional methods then the further modification is necessary. |
doi_str_mv | 10.1016/j.physa.2010.05.011 |
format | article |
publisher | Elsevier B.V |
rights | 2010 Elsevier B.V. |
fulltext | fulltext |
identifier | ISSN: 0378-4371 |
ispartof | Physica A, 2010-09, Vol.389 (17), p.3632-3639 |
issn | 0378-4371 1873-2119 |
language | eng |
recordid | cdi_proquest_miscellaneous_753682307 |
source | ScienceDirect Journals |
subjects | ASJP; Automated; Classification; Empirical analysis; Historical linguistics; Levenshtein distance; Linguistics; Phylogenetics; Raw; Statistical mechanics |
title | Evaluating linguistic distance measures |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T09%3A08%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Evaluating%20linguistic%20distance%20measures&rft.jtitle=Physica%20A&rft.au=Wichmann,%20S%C3%B8ren&rft.date=2010-09-01&rft.volume=389&rft.issue=17&rft.spage=3632&rft.epage=3639&rft.pages=3632-3639&rft.issn=0378-4371&rft.eissn=1873-2119&rft_id=info:doi/10.1016/j.physa.2010.05.011&rft_dat=%3Cproquest_cross%3E753682307%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c335t-1b1abec6405558d53de3902b561f6e89969c57c2030df9afa979a11936a882123%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=753682307&rft_id=info:pmid/&rfr_iscdi=true |