On learning sparse linear models from cross samples
Published in: | Signal processing 2025-02, Vol.227, p.109680, Article 109680 |
---|---|
Format: | Article |
Language: | English |
Summary: | This paper studies the sample complexity of a sparse linear model when the samples are dependent. We consider a specific dependency structure that arises in some experimental designs, such as drug sensitivity studies, where two sets of objects (drugs and cells) are sampled independently and, after crossing (forming all possible combinations of drugs and cells), the resulting output (the efficacy of the drugs) is measured. We call such samples “cross samples”. The dependency among such samples is strong, and existing theoretical studies are either inapplicable or fail to provide realistic bounds. We analyze the performance of the Lasso estimator when the underlying distributions are mixtures of Gaussians and the data dependency arises from the crossing procedure. Our theoretical results show that the performance of the Lasso estimator with cross samples follows that with i.i.d. samples, with differences in constant factors. In numerical results, we observe a phase transition: when datasets are too small, the error for cross samples is much larger than for i.i.d. samples, but once the size is large enough, cross samples are nearly as useful as i.i.d. samples. Our theoretical analysis suggests that the transition threshold is governed by the sparsity level of the true parameter vector being estimated.
Highlights: • A specific dependency among samples is studied theoretically and numerically. • This dependency occurs in the study of cancer drug response and in movie recommendation. • The Lasso estimation error with such samples shows phase-transition behavior. • The transition threshold appears to be governed by the sparsity level of the true parameter vector. • If the sample size is large enough, dependent samples are as useful as i.i.d. samples. |
---|---|
ISSN: | 0165-1684 |
DOI: | 10.1016/j.sigpro.2024.109680 |
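
The crossing procedure described in the summary can be illustrated with a minimal sketch. It is not the authors' code, and several choices are assumptions made only for illustration: the feature vector of a (drug, cell) pair is taken to be the concatenation of the two objects' features, features are drawn from a single standard Gaussian rather than a mixture of Gaussians, and the Lasso is solved with a plain proximal-gradient (ISTA) loop. All function names, sizes, and the regularization value are hypothetical.

```python
import numpy as np

def make_cross_samples(n_drugs, n_cells, d_drug, d_cell, beta, noise_std, rng):
    """Sample drugs and cells independently, then cross them (all pairs)."""
    drugs = rng.standard_normal((n_drugs, d_drug))
    cells = rng.standard_normal((n_cells, d_cell))
    # Row i * n_cells + j holds the pair (drug i, cell j).
    # Assumption: a pair's features are the concatenation [drug, cell].
    X = np.hstack([np.repeat(drugs, n_cells, axis=0),
                   np.tile(cells, (n_drugs, 1))])
    y = X @ beta + noise_std * rng.standard_normal(X.shape[0])
    return X, y

def make_iid_samples(n, d, beta, noise_std, rng):
    """Baseline: n independent samples with d Gaussian features each."""
    X = rng.standard_normal((n, d))
    y = X @ beta + noise_std * rng.standard_normal(n)
    return X, y

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by proximal gradient."""
    n, d = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's
    # Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    b = np.zeros(d)
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y) / n)
        # Soft-thresholding is the proximal operator of the l1 penalty.
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return b

rng = np.random.default_rng(0)
d_drug, d_cell = 20, 20
beta_true = np.zeros(d_drug + d_cell)
beta_true[[0, 1, d_drug, d_drug + 1]] = 1.0  # sparse true parameter vector

# 20 drugs x 20 cells give 400 cross samples; compare with 400 i.i.d. samples.
X_cross, y_cross = make_cross_samples(20, 20, d_drug, d_cell, beta_true, 0.1, rng)
X_iid, y_iid = make_iid_samples(400, d_drug + d_cell, beta_true, 0.1, rng)

for name, X, y in [("cross", X_cross, y_cross), ("i.i.d.", X_iid, y_iid)]:
    beta_hat = lasso_ista(X, y, lam=0.05)
    print(name, "estimation error:", np.linalg.norm(beta_hat - beta_true))
```

Although both designs have 400 rows, the cross design contains only 20 distinct drug vectors and 20 distinct cell vectors, which is the source of the strong dependency the summary refers to. Varying the number of drugs and cells in such a simulation is one way to explore the small-sample versus large-sample behavior the summary reports.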