Evaluating linguistic distance measures

Bibliographic Details
Published in: Physica A 2010-09, Vol. 389 (17), p. 3632-3639
Main Authors: Wichmann, Søren; Holman, Eric W.; Bakker, Dik; Brown, Cecil H.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c335t-1b1abec6405558d53de3902b561f6e89969c57c2030df9afa979a11936a882123
cites cdi_FETCH-LOGICAL-c335t-1b1abec6405558d53de3902b561f6e89969c57c2030df9afa979a11936a882123
container_end_page 3639
container_issue 17
container_start_page 3632
container_title Physica A
container_volume 389
creator Wichmann, Søren
Holman, Eric W.
Bakker, Dik
Brown, Cecil H.
description In Ref. [13], Petroni and Serva discuss the use of Levenshtein distances (LD) between words referring to the same concepts as a tool for establishing overall distances among languages, which can subsequently be used to derive phylogenies. The authors modify the raw LD by dividing it by the length of the longer of the two words compared, producing what could be called LDN (normalized LD). Other scholars [7,8] have used a further modification, dividing the LDN by the average LDN among words not referring to the same concept. This produces what could be called LDND. The authors of Ref. [13] question whether LDND is a more adequate measure of distance than LDN. Here we show empirically that LDND is the better measure when the languages compared have not already been shown to be related by other, more traditional methods of comparative linguistics. If automated language classification is to be used as a tool independent of traditional methods, then the further modification is necessary.
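The three measures named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`levenshtein`, `ldn`, `ldnd`), the representation of a word list as a dictionary from concept to word, and the choice to average LDN over all shared concepts before dividing are assumptions made here for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Raw LD: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """LDN: LD divided by the length of the longer of the two words."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def ldnd(list_a: dict, list_b: dict) -> float:
    """LDND between two word lists (hypothetical layout: concept -> word).

    Assumption: the mean LDN over same-concept pairs is divided by the
    mean LDN over pairs of words that refer to different concepts.
    """
    shared = sorted(set(list_a) & set(list_b))
    same = [ldn(list_a[c], list_b[c]) for c in shared]
    diff = [ldn(list_a[c1], list_b[c2])
            for c1 in shared for c2 in shared if c1 != c2]
    if not same or not diff:
        raise ValueError("need at least two shared concepts")
    baseline = sum(diff) / len(diff)
    if baseline == 0:
        raise ValueError("different-concept baseline is zero")
    return (sum(same) / len(same)) / baseline
```

Under these assumptions, calling `ldnd(words_a, words_b)` on two concept-to-word dictionaries yields a value near 0 for near-identical lists, while a value near 1 means same-concept words are no more alike than words for different concepts.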
doi_str_mv 10.1016/j.physa.2010.05.011
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_753682307</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0378437110003997</els_id><sourcerecordid>753682307</sourcerecordid><originalsourceid>FETCH-LOGICAL-c335t-1b1abec6405558d53de3902b561f6e89969c57c2030df9afa979a11936a882123</originalsourceid><addsrcrecordid>eNp9kDFPwzAQhS0EEqXwC1i6dUq4s2s7HhhQVQpSJRaYLde5gKu0KXZSqf8elzKz3JNO753ufYzdI5QIqB425f7rmFzJIW9AloB4wUZYaVFwRHPJRiB0VcyExmt2k9IGAFALPmLTxcG1g-vD7nPS5jGE1Ac_qbO4nafJllwaIqVbdtW4NtHdn47Zx_Piff5SrN6Wr_OnVeGFkH2Ba3Rr8moGUsqqlqImYYCvpcJGUWWMMl5qz0FA3RjXOKONyx8K5aqKIxdjNj3f3cfue6DU221IntrW7agbktVSqIoL0Nkpzk4fu5QiNXYfw9bFo0WwJyp2Y3-p2BMVC9JmKjn1eE5RLnEIFG3ygXLTOkTyva278G_-Bza6as0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>753682307</pqid></control><display><type>article</type><title>Evaluating linguistic distance measures</title><source>ScienceDirect Journals</source><creator>Wichmann, Søren ; Holman, Eric W. ; Bakker, Dik ; Brown, Cecil H.</creator><creatorcontrib>Wichmann, Søren ; Holman, Eric W. ; Bakker, Dik ; Brown, Cecil H.</creatorcontrib><description>In Ref.  [13], Petroni and Serva discuss the use of Levenshtein distances (LD) between words referring to the same concepts as a tool for establishing overall distances among languages which can then subsequently be used to derive phylogenies. The authors modify the raw LD by dividing the LD by the length of the longer of the two words compared, to produce what could be called LDN (normalized LD). Other scholars  [7,8] have used a further modification, where they divide the LDN by the average LDN among words not referring to the same concept. This produces what could be called LDND. The authors of Ref.  [13] question whether LDND is a more adequate measure of distance than LDN. 
Here we show empirically that LDND is the better measure in the situation where the languages compared have not already been shown, by other, more traditional methods of comparative linguistics, to be related. If automated language classification is to be used as a tool independent of traditional methods then the further modification is necessary.</description><identifier>ISSN: 0378-4371</identifier><identifier>EISSN: 1873-2119</identifier><identifier>DOI: 10.1016/j.physa.2010.05.011</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>ASJP ; Automated ; Classification ; Empirical analysis ; Historical linguistics ; Levenshtein distance ; Linguistics ; Phylogenetics ; Raw ; Statistical mechanics</subject><ispartof>Physica A, 2010-09, Vol.389 (17), p.3632-3639</ispartof><rights>2010 Elsevier B.V.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c335t-1b1abec6405558d53de3902b561f6e89969c57c2030df9afa979a11936a882123</citedby><cites>FETCH-LOGICAL-c335t-1b1abec6405558d53de3902b561f6e89969c57c2030df9afa979a11936a882123</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Wichmann, Søren</creatorcontrib><creatorcontrib>Holman, Eric W.</creatorcontrib><creatorcontrib>Bakker, Dik</creatorcontrib><creatorcontrib>Brown, Cecil H.</creatorcontrib><title>Evaluating linguistic distance measures</title><title>Physica A</title><description>In Ref.  [13], Petroni and Serva discuss the use of Levenshtein distances (LD) between words referring to the same concepts as a tool for establishing overall distances among languages which can then subsequently be used to derive phylogenies. 
The authors modify the raw LD by dividing the LD by the length of the longer of the two words compared, to produce what could be called LDN (normalized LD). Other scholars  [7,8] have used a further modification, where they divide the LDN by the average LDN among words not referring to the same concept. This produces what could be called LDND. The authors of Ref.  [13] question whether LDND is a more adequate measure of distance than LDN. Here we show empirically that LDND is the better measure in the situation where the languages compared have not already been shown, by other, more traditional methods of comparative linguistics, to be related. If automated language classification is to be used as a tool independent of traditional methods then the further modification is necessary.</description><subject>ASJP</subject><subject>Automated</subject><subject>Classification</subject><subject>Empirical analysis</subject><subject>Historical linguistics</subject><subject>Levenshtein distance</subject><subject>Linguistics</subject><subject>Phylogenetics</subject><subject>Raw</subject><subject>Statistical mechanics</subject><issn>0378-4371</issn><issn>1873-2119</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><recordid>eNp9kDFPwzAQhS0EEqXwC1i6dUq4s2s7HhhQVQpSJRaYLde5gKu0KXZSqf8elzKz3JNO753ufYzdI5QIqB425f7rmFzJIW9AloB4wUZYaVFwRHPJRiB0VcyExmt2k9IGAFALPmLTxcG1g-vD7nPS5jGE1Ac_qbO4nafJllwaIqVbdtW4NtHdn47Zx_Piff5SrN6Wr_OnVeGFkH2Ba3Rr8moGUsqqlqImYYCvpcJGUWWMMl5qz0FA3RjXOKONyx8K5aqKIxdjNj3f3cfue6DU221IntrW7agbktVSqIoL0Nkpzk4fu5QiNXYfw9bFo0WwJyp2Y3-p2BMVC9JmKjn1eE5RLnEIFG3ygXLTOkTyva278G_-Bza6as0</recordid><startdate>20100901</startdate><enddate>20100901</enddate><creator>Wichmann, Søren</creator><creator>Holman, Eric W.</creator><creator>Bakker, Dik</creator><creator>Brown, Cecil H.</creator><general>Elsevier 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7U5</scope><scope>8FD</scope><scope>H8D</scope><scope>L7M</scope></search><sort><creationdate>20100901</creationdate><title>Evaluating linguistic distance measures</title><author>Wichmann, Søren ; Holman, Eric W. ; Bakker, Dik ; Brown, Cecil H.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c335t-1b1abec6405558d53de3902b561f6e89969c57c2030df9afa979a11936a882123</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>ASJP</topic><topic>Automated</topic><topic>Classification</topic><topic>Empirical analysis</topic><topic>Historical linguistics</topic><topic>Levenshtein distance</topic><topic>Linguistics</topic><topic>Phylogenetics</topic><topic>Raw</topic><topic>Statistical mechanics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wichmann, Søren</creatorcontrib><creatorcontrib>Holman, Eric W.</creatorcontrib><creatorcontrib>Bakker, Dik</creatorcontrib><creatorcontrib>Brown, Cecil H.</creatorcontrib><collection>CrossRef</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>Technology Research Database</collection><collection>Aerospace Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>Physica A</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wichmann, Søren</au><au>Holman, Eric W.</au><au>Bakker, Dik</au><au>Brown, Cecil H.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Evaluating linguistic distance measures</atitle><jtitle>Physica A</jtitle><date>2010-09-01</date><risdate>2010</risdate><volume>389</volume><issue>17</issue><spage>3632</spage><epage>3639</epage><pages>3632-3639</pages><issn>0378-4371</issn><eissn>1873-2119</eissn><abstract>In Ref.  
[13], Petroni and Serva discuss the use of Levenshtein distances (LD) between words referring to the same concepts as a tool for establishing overall distances among languages which can then subsequently be used to derive phylogenies. The authors modify the raw LD by dividing the LD by the length of the longer of the two words compared, to produce what could be called LDN (normalized LD). Other scholars  [7,8] have used a further modification, where they divide the LDN by the average LDN among words not referring to the same concept. This produces what could be called LDND. The authors of Ref.  [13] question whether LDND is a more adequate measure of distance than LDN. Here we show empirically that LDND is the better measure in the situation where the languages compared have not already been shown, by other, more traditional methods of comparative linguistics, to be related. If automated language classification is to be used as a tool independent of traditional methods then the further modification is necessary.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.physa.2010.05.011</doi><tpages>8</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0378-4371
ispartof Physica A, 2010-09, Vol.389 (17), p.3632-3639
issn 0378-4371
1873-2119
language eng
recordid cdi_proquest_miscellaneous_753682307
source ScienceDirect Journals
subjects ASJP
Automated
Classification
Empirical analysis
Historical linguistics
Levenshtein distance
Linguistics
Phylogenetics
Raw
Statistical mechanics
title Evaluating linguistic distance measures
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T09%3A08%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Evaluating%20linguistic%20distance%20measures&rft.jtitle=Physica%20A&rft.au=Wichmann,%20S%C3%B8ren&rft.date=2010-09-01&rft.volume=389&rft.issue=17&rft.spage=3632&rft.epage=3639&rft.pages=3632-3639&rft.issn=0378-4371&rft.eissn=1873-2119&rft_id=info:doi/10.1016/j.physa.2010.05.011&rft_dat=%3Cproquest_cross%3E753682307%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c335t-1b1abec6405558d53de3902b561f6e89969c57c2030df9afa979a11936a882123%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=753682307&rft_id=info:pmid/&rfr_iscdi=true