On learning sparse linear models from cross samples

Bibliographic Details
Published in:	Signal processing 2025-02, Vol.227, p.109680, Article 109680
Main Authors:	Mahmoudi, Mina Sadat; Motahari, Seyed Abolfazl; Khalaj, Babak
Format:	Article
Language:	English
Subjects:	Cancer cell drug response; Cross samples; Dependent data; Gaussian mixture model; Lasso estimator; Sparse linear models

Description
Summary:	This paper studies the sample complexity of a sparse linear model when the samples are dependent. We consider a specific dependency structure that arises in experimental designs such as drug sensitivity studies, where two sets of objects (drugs and cells) are sampled independently and, after crossing (forming all possible combinations of drugs and cells), the resulting output (the efficacy of each drug) is measured. We call such samples “cross samples”. The dependency among cross samples is strong, and existing theoretical studies are either inapplicable or fail to provide realistic bounds. We analyze the performance of the Lasso estimator when the underlying distributions are mixtures of Gaussians and the data dependency arises from the crossing procedure. Our theoretical results show that the performance of the Lasso estimator with cross samples follows that with i.i.d. samples, with differences only in constant factors. In numerical results we observe a phase transition: when datasets are too small, the error for cross samples is much larger than for i.i.d. samples, but once the dataset is large enough, cross samples are nearly as useful as i.i.d. samples. Our theoretical analysis suggests that the transition threshold is governed by the sparsity level of the true parameter vector being estimated.
• A specific dependency among samples has been studied theoretically and numerically.
• This dependency occurs in the study of cancer drug response and in movie recommendation.
• The Lasso estimation error using such samples shows phase-transition behavior.
• The transition threshold seems to be governed by the sparsity level of the true parameter vector.
• If the sample size is large enough, dependent samples are as useful as i.i.d. samples.
ISSN:	0165-1684
DOI:	10.1016/j.sigpro.2024.109680
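
The crossing procedure described in the summary can be illustrated with a small simulation. The sketch below is not the authors' code or experimental setup. It assumes that each cross sample's feature vector is the concatenation of a drug vector and a cell vector, draws those vectors from a single standard Gaussian rather than a Gaussian mixture, and fits the Lasso with a plain coordinate-descent loop. The names `make_cross_samples`, `simulate`, and `lasso_cd`, along with every parameter value, are hypothetical and chosen only for illustration.

```python
import itertools
import random


def make_cross_samples(drugs, cells):
    """Pair every drug vector with every cell vector (the crossing step).

    Assumption: a cross sample's features are the two vectors concatenated.
    """
    return [d + c for d, c in itertools.product(drugs, cells)]


def simulate(n_drugs, n_cells, dim, beta, noise_sd, rng):
    """Draw drugs and cells independently, cross them, and generate outputs."""
    drugs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_drugs)]
    cells = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_cells)]
    X = make_cross_samples(drugs, cells)
    y = [sum(x * b for x, b in zip(row, beta)) + rng.gauss(0, noise_sd)
         for row in X]
    return X, y


def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal map of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date in place
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 5
    beta_true = [2.0, 0.0, 0.0, 0.0, 0.0, -1.5, 0.0, 0.0, 0.0, 0.0]  # sparse
    # 20 + 20 independent draws yield 20 * 20 = 400 dependent cross samples.
    X, y = simulate(20, 20, dim, beta_true, noise_sd=0.1, rng=rng)
    beta_hat = lasso_cd(X, y, lam=0.05)
    err = sum((a - b) ** 2 for a, b in zip(beta_hat, beta_true)) ** 0.5
    print("samples:", len(X), "L2 estimation error:", err)
```

In this sketch, rows of `X` that share a drug repeat the same leading coordinates, which is the dependency the summary refers to. Comparing the error against a run where every row is drawn independently, at several dataset sizes, would be one way to explore the reported phase transition under these simplified assumptions.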