Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor

Bibliographic Details
Published in:	Journal of the ACM 2024-04, Vol.71 (2), p.1-41, Article 10
Main Authors:	Cohen-Addad, Vincent; Das, Debarati; Kipouridis, Evangelos; Parotsidis, Nikos; Thorup, Mikkel
Format:	Article
Language:	English
Subjects:	Algorithms; Apexes; Approximation; Approximation algorithms analysis; Errors; Graph theory; Mathematical analysis; Polynomials; Taxonomy; Theory of computation

Description
Summary:	We consider the numerical taxonomy problem of fitting a positive distance function \({\mathcal {D}:{S\choose 2}\rightarrow \mathbb {R}_{> 0}}\) by a tree metric. We want a tree T with positive edge weights and including S among the vertices so that their distances in T match those in \(\mathcal {D}\). A nice application is in evolutionary biology where the tree T aims to approximate the branching process leading to the observed distances in \(\mathcal {D}\) [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is, the sum of distance errors over all pairs of points. We present a deterministic polynomial time algorithm minimizing the total error within a constant factor. We can do this both for general trees and for the special case of ultrametrics with a root having the same distance to all vertices in S. The problems are APX-hard, so a constant factor is the best we can hope for in polynomial time. The best previous approximation factor was O((log n)(log log n)) by Ailon and Charikar [2005], who wrote “determining whether an O(1) approximation can be obtained is a fascinating question.”
ISSN:	0004-5411 1557-735X
DOI:	10.1145/3639453
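
Illustration:	The total error named in the summary can be made concrete with a small sketch. The Python below only evaluates the objective for a given candidate tree, that is, the sum over all pairs {u, v} of S of |dist_T(u, v) - D(u, v)|; it is not the article's approximation algorithm. The function names (`tree_distances`, `total_error`), the dictionary representation of D and of the tree, and the toy data are all assumptions made for this example.

```python
from itertools import combinations


def tree_distances(edges, source):
    """Distances from `source` in a weighted tree given as {(u, v): weight}."""
    # Build an undirected adjacency list from the edge dictionary.
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    # Depth-first traversal; in a tree each vertex is reached by a unique path.
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def total_error(D, edges, S):
    """Sum over all pairs {u, v} of S of |dist_T(u, v) - D(u, v)|."""
    err = 0.0
    for u, v in combinations(S, 2):
        d_T = tree_distances(edges, u)[v]
        err += abs(d_T - D[frozenset((u, v))])
    return err


# Toy instance (made up for illustration): three points and one internal vertex "x".
S = ["a", "b", "c"]
D = {
    frozenset(("a", "b")): 2.0,
    frozenset(("a", "c")): 4.0,
    frozenset(("b", "c")): 5.0,
}
star = {("a", "x"): 1.0, ("b", "x"): 1.0, ("c", "x"): 3.0}

print(total_error(D, star, S))  # prints 1.0
```

In this toy instance the star tree matches the distances a-b and a-c exactly and is off by 1 on b-c, so its total error is 1.0. The article's contribution is finding a tree whose total error is within a constant factor of the minimum, which this sketch does not attempt.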