Local Rank Distance

Bibliographic Details
Main Author: Ionescu, Radu Tudor
Format: Conference Proceeding
Language: English
Description
Summary: Researchers have developed a wide variety of methods for string data that can be applied successfully in fields such as computational biology and natural language processing. Such methods range from clustering techniques used to analyze the phylogenetic trees of different organisms to kernel methods used to identify authorship or native language from text. The results of such methods are not perfect and can be improved. Some of these methods are based on a distance or similarity measure for strings, such as Hamming distance, Levenshtein distance, Kendall-tau distance, rank distance, or string kernels. This paper introduces a new distance measure, termed Local Rank Distance (LRD), inspired by the recently introduced Local Patch Dissimilarity for images. Designed to conform to more general principles and adapted to DNA strings, LRD aims to improve on state-of-the-art methods for phylogenetic analysis. This paper shows two applications of LRD. The first application is the phylogenetic analysis of mammals. Experiments show that the phylogenetic trees produced by LRD are better than, or at least similar to, those reported in the literature. The second application is identifying the native language of English learners. By working at the character level, the proposed method is completely language-independent and theory-neutral. In conclusion, LRD can be used as a general approach to measuring string similarity, despite being designed for DNA.
DOI: 10.1109/SYNASC.2013.36