Loading…

Similarity in languages and programs

We use an information-theoretic notion, namely, (Shannon) information rate, to generalize common syntactic similarity metrics (like Hamming distance and longest common subsequences) between strings to ones between languages. We show that the similarity metrics between two regular languages are compu...

Full description

Saved in:
Bibliographic Details
Published in:Theoretical computer science 2013-08, Vol.498, p.58-75
Main Authors: Cui, Cewei, Dang, Zhe, Fischer, Thomas R., Ibarra, Oscar H.
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:We use an information-theoretic notion, namely, (Shannon) information rate, to generalize common syntactic similarity metrics (like Hamming distance and longest common subsequences) between strings to ones between languages. We show that the similarity metrics between two regular languages are computable. We further study self-similarity of a regular language under various similarity metrics. As far as semantic similarity is concerned, we study the amplitude of an automaton, which intuitively characterizes how much a typical execution of the automaton fluctuates. Finally, we investigate, through experiments, how to measure similarity between two real-world programs using Lempel–Ziv compression on the runs at the assembly level.
ISSN:0304-3975
1879-2294
DOI:10.1016/j.tcs.2013.05.040