Loading…

Inference of compressed Potts graphical models

We consider the problem of inferring a graphical Potts model on a population of variables. This inverse Potts problem generally involves the inference of a large number of parameters, often larger than the number of available data, and, hence, requires the introduction of regularization. We study he...

Full description

Saved in:
Bibliographic Details
Published in:Physical review. E 2020-01, Vol.101 (1-1), p.012309-012309, Article 012309
Main Authors: Rizzato, Francesca, Coucke, Alice, de Leonardis, Eleonora, Barton, John P, Tubiana, Jérôme, Monasson, Rémi, Cocco, Simona
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:We consider the problem of inferring a graphical Potts model on a population of variables. This inverse Potts problem generally involves the inference of a large number of parameters, often larger than the number of available data, and, hence, requires the introduction of regularization. We study here a double regularization scheme, in which the number of Potts states (colors) available to each variable is reduced and interaction networks are made sparse. To achieve the color compression, only Potts states with large empirical frequency (exceeding some threshold) are explicitly modeled on each site, while the others are grouped into a single state. We benchmark the performances of this mixed regularization approach, with two inference algorithms, adaptive cluster expansion (ACE) and pseudolikelihood maximization (PLM), on synthetic data obtained by sampling disordered Potts models on Erdős-Rényi random graphs. We show in particular that color compression does not affect the quality of reconstruction of the parameters corresponding to high-frequency symbols, while drastically reducing the number of the other parameters and thus the computational time. Our procedure is also applied to multisequence alignments of protein families, with similar results.
ISSN:2470-0045
2470-0053
DOI:10.1103/PhysRevE.101.012309