
Simulated Data for Linear Regression with Structured and Sparse Penalties: Introducing pylearn-simulate

Bibliographic Details
Published in: Journal of Statistical Software, 2018-10, Vol. 87 (3), pp. 1-33
Main Authors: Löfstedt, Tommy, Guillemot, Vincent, Frouin, Vincent, Duchesnay, Edouard, Hadj-Selem, Fouad
Format: Article
Language:English
container_end_page 33
container_issue 3
container_start_page 1
container_title Journal of statistical software
container_volume 87
creator Löfstedt, Tommy
Guillemot, Vincent
Frouin, Vincent
Duchesnay, Edouard
Hadj-Selem, Fouad
description How to incorporate structure and prior knowledge into machine learning methods is currently a very active field of research. It has led to numerous developments in non-smooth convex minimization. With recently developed methods, it is possible to perform an analysis in which the computed model can be linked to a given structure of the data while simultaneously performing variable selection to find a few important features. However, there is still no way to unambiguously simulate data for testing proposed algorithms, since the exact solutions to such problems are unknown. The main aim of this paper is to present a theoretical framework for generating simulated data. These simulated data are appropriate for comparing optimization algorithms in the context of linear regression problems with sparse and structured penalties. Additionally, this approach allows the user to control the signal-to-noise ratio, the correlation structure of the data, and the optimization problem for which the simulated data yield a known solution. The traditional approach is to simulate random data without taking into account the model that will be fit to the data, but with such an approach the exact solution of the underlying optimization problem cannot be known. With our contribution, the exact theoretical solution of a penalized linear regression problem is known, so algorithms can be compared without the need for, e.g., cross-validation. We also present our implementation, the Python package pylearn-simulate, available at https://github.com/neurospin/pylearn-simulate and released under the BSD 3-clause license. We describe the package and give examples at the end of the paper.
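To make the idea concrete, here is a minimal sketch for the simplest case, the lasso (ℓ1) penalty, assuming the objective 0.5·‖y − Xβ‖² + λ‖β‖₁. It is not the pylearn-simulate API and not necessarily the authors' exact construction; the helpers simulate_lasso_data and ista_lasso are hypothetical names. The sketch fixes a sparse β* and a residual e first, then shifts each column of X along e so that Xᵀe meets the lasso optimality (subgradient) conditions at β*: Xⱼᵀe = λ·sign(β*ⱼ) on the support and |Xⱼᵀe| ≤ λ elsewhere. Setting y = Xβ* + e then makes β* an exact minimizer, and the unique one when X has full column rank. Unlike the approach described in the abstract, this sketch makes no attempt to control the signal-to-noise ratio or the correlation structure.

```python
import numpy as np


def simulate_lasso_data(beta_star, n, lam, seed=0):
    """Return (X, y) such that beta_star minimizes
    0.5 * ||y - X @ beta||^2 + lam * ||beta||_1.

    Hypothetical helper for illustration; not part of pylearn-simulate.
    """
    rng = np.random.default_rng(seed)
    beta_star = np.asarray(beta_star, dtype=float)
    p = beta_star.size

    X0 = rng.standard_normal((n, p))  # unstructured starting design
    e = rng.standard_normal(n)        # will be the residual y - X @ beta_star

    # Targets for X^T e that satisfy the lasso optimality conditions:
    # lam * sign(beta_j) on the support, anything in [-lam, lam] elsewhere.
    t = rng.uniform(-lam, lam, size=p)
    support = beta_star != 0
    t[support] = lam * np.sign(beta_star[support])

    # Shift every column along e so that X[:, j] @ e equals t[j] exactly.
    X = X0 + np.outer(e, t - X0.T @ e) / (e @ e)

    y = X @ beta_star + e
    return X, y


def ista_lasso(X, y, lam, n_iter=5000):
    """Plain ISTA (proximal gradient) on the same lasso objective."""
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = beta - X.T @ (X @ beta - y) / L                     # gradient step
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0)  # soft-threshold
    return beta


if __name__ == "__main__":
    beta_star = np.zeros(20)
    beta_star[[1, 5, 12]] = [2.0, -3.0, 1.5]
    X, y = simulate_lasso_data(beta_star, n=100, lam=1.0)
    beta_hat = ista_lasso(X, y, lam=1.0)
    # The solver's error against the known solution should be near zero.
    print(np.max(np.abs(beta_hat - beta_star)))
```

Because the exact solution is known in advance, any solver's output can be scored directly against beta_star rather than through cross-validation. If scikit-learn's Lasso is used as the solver instead, note that it scales the squared loss by 1/(2n) and fits an intercept by default, so it would need alpha = lam / n and fit_intercept=False to target the same problem.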
doi_str_mv 10.18637/jss.v087.i03
format article
publisher Foundation for Open Access Statistics
identifier ISSN: 1548-7660
ispartof Journal of statistical software, 2018-10, Vol.87 (3), p.1-33
issn 1548-7660
1548-7660
language eng
source DOAJ Directory of Open Access Journals
subjects linear regression
Mathematical Statistics
Mathematics
Python
simulated data
sparse and structured penalties
title Simulated Data for Linear Regression with Structured and Sparse Penalties: Introducing pylearn-simulate