Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent

Statistical analyses of biological problems in life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in case of multicollinearity, which appears if the number of...

Bibliographic Details
Published in: BMC bioinformatics 2020-09, Vol.21 (1), p.1-407, Article 407
Main Authors: Klosa, Jan; Simon, Noah; Westermark, Pål Olof; Liebscher, Volkmar; Wittenburg, Dörte
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c574t-8e8769ad983e27d487b75d36e6ca2b5708ead9a0420ea56d6acee529d8151f543
cites cdi_FETCH-LOGICAL-c574t-8e8769ad983e27d487b75d36e6ca2b5708ead9a0420ea56d6acee529d8151f543
container_end_page 407
container_issue 1
container_start_page 1
container_title BMC bioinformatics
container_volume 21
creator Klosa, Jan
Simon, Noah
Westermark, Pål Olof
Liebscher, Volkmar
Wittenburg, Dörte
description Statistical analyses of biological problems in life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in cases of multicollinearity, which appears if the number of explanatory variables exceeds the number of observations or for some biological reason. Then, the model goodness of fit is penalized by some suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. The grid search for the penalty parameter is realized by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, seagull, the R package presented here, produces complete regularization paths. Publicly available high-dimensional methylation data are used to compare seagull to the established R package SGL. The results of both packages enabled a precise prediction of biological age from DNA methylation status. Although the results of seagull and SGL were very similar (R² > 0.99), seagull computed the solution in a fraction of the time needed by SGL. Additionally, seagull enables the incorporation of weights for each penalized feature. The following operators for linear regression models are available in seagull: lasso, group lasso, sparse-group lasso and Integrative LASSO with Penalty Factors (IPF-lasso). Thus, seagull is a convenient envelope of lasso variants.
doi_str_mv 10.1186/s12859-020-03725-w
format article
fullrecord <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_370df5c131c247cd9c7e534eb01fcdb0</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A636945445</galeid><doaj_id>oai_doaj_org_article_370df5c131c247cd9c7e534eb01fcdb0</doaj_id><sourcerecordid>A636945445</sourcerecordid><originalsourceid>FETCH-LOGICAL-c574t-8e8769ad983e27d487b75d36e6ca2b5708ead9a0420ea56d6acee529d8151f543</originalsourceid><addsrcrecordid>eNptkttu1DAQhiMEogd4Aa4icQNSU3yMEy4qVRWHlSohUbi2Zu1J8MobL3bSFsTD4zQVdBHyxdgz3_y2f01RvKDklNKmfpMoa2RbEUYqwhWT1c2j4pAKRStGiXz8YH9QHKW0IYSqhsinxQFnLedCqcPi1xVCP3n_tvSQUjgp-xim3XIoYbBl2kFMWD1MR8wdEN1PGF0Yyi7E0rsBIc6ViCnN2W2w6FN57aDcxXDrtuCzNliHw1haTCbHZ8WTDnzC5_fxuPj6_t2Xi4_V5acPq4vzy8pIJcaqwUbVLdi24ciUFY1aK2l5jbUBtpaKNJiLQAQjCLK2NRhEyVrbUEk7KfhxsVp0bYCN3sX8mPhDB3D6LhFiryGOznjUXBHbSUM5NUwoY1ujUHKBa0I7Y9cka50tWrtpvUU7fyOC3xPdrwzum-7DtVYiey7bLPDqXiCG7xOmUW9ddsN7GDBMSTMhuKSqreuMvvwH3YQpDtmqmRJEcMLYX6qH_AE3dCHfa2ZRfV7zuhVSCJmp0_9QeVncOhMG7FzO7zW83mvIzIi3Yw9TSnp19XmfZQtrYkgpYvfHD0r0PKt6mVWdZ1Xfzaq-4b8BIYnbzg</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2444043022</pqid></control><display><type>article</type><title>Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent</title><source>Open Access: PubMed Central</source><source>Publicly Available Content Database</source><creator>Klosa, Jan ; Simon, Noah ; Westermark, Pål Olof ; Liebscher, Volkmar ; Wittenburg, Dörte</creator><creatorcontrib>Klosa, Jan ; Simon, Noah ; Westermark, Pål Olof ; Liebscher, Volkmar ; Wittenburg, Dörte</creatorcontrib><description>Statistical analyses of biological problems in life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. 
They are especially useful in case of multicollinearity, which appears if the number of explanatory variables exceeds the number of observations or for some biological reason. Then, the model goodness of fit is penalized by some suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. The grid search for the penalty parameter is realized by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, seagull -the R package presented here- produces complete regularization paths. Publicly available high-dimensional methylation data are used to compare seagull to the established R package SGL. The results of both packages enabled a precise prediction of biological age from DNA methylation status. But even though the results of seagull and SGL were very similar (R.sup.2 &gt; 0.99), seagull computed the solution in a fraction of the time needed by SGL. Additionally, seagull enables the incorporation of weights for each penalized feature. The following operators for linear regression models are available in seagull: lasso, group lasso, sparse-group lasso and Integrative LASSO with Penalty Factors (IPF-lasso). 
Thus, seagull is a convenient envelope of lasso variants.</description><identifier>ISSN: 1471-2105</identifier><identifier>EISSN: 1471-2105</identifier><identifier>DOI: 10.1186/s12859-020-03725-w</identifier><identifier>PMID: 32933477</identifier><language>eng</language><publisher>London: BioMed Central Ltd</publisher><subject>Accuracy ; Age ; Algorithms ; Analysis ; Biological models (mathematics) ; DNA methylation ; Epigenetics ; Fines &amp; penalties ; Genomes ; Goodness of fit ; Gulls ; High-dimensional data ; Machine learning ; Methylation ; Operators ; Optimization ; Parameter estimation ; R package ; Regression analysis ; Regression models ; Regularization ; Software ; Statistical analysis ; Variables</subject><ispartof>BMC bioinformatics, 2020-09, Vol.21 (1), p.1-407, Article 407</ispartof><rights>COPYRIGHT 2020 BioMed Central Ltd.</rights><rights>2020. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>The Author(s) 
2020</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c574t-8e8769ad983e27d487b75d36e6ca2b5708ead9a0420ea56d6acee529d8151f543</citedby><cites>FETCH-LOGICAL-c574t-8e8769ad983e27d487b75d36e6ca2b5708ead9a0420ea56d6acee529d8151f543</cites><orcidid>0000-0002-3639-2574</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7493359/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2444043022?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25753,27924,27925,37012,37013,44590,53791,53793</link.rule.ids></links><search><creatorcontrib>Klosa, Jan</creatorcontrib><creatorcontrib>Simon, Noah</creatorcontrib><creatorcontrib>Westermark, Pål Olof</creatorcontrib><creatorcontrib>Liebscher, Volkmar</creatorcontrib><creatorcontrib>Wittenburg, Dörte</creatorcontrib><title>Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent</title><title>BMC bioinformatics</title><description>Statistical analyses of biological problems in life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in case of multicollinearity, which appears if the number of explanatory variables exceeds the number of observations or for some biological reason. Then, the model goodness of fit is penalized by some suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. 
The grid search for the penalty parameter is realized by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, seagull -the R package presented here- produces complete regularization paths. Publicly available high-dimensional methylation data are used to compare seagull to the established R package SGL. The results of both packages enabled a precise prediction of biological age from DNA methylation status. But even though the results of seagull and SGL were very similar (R.sup.2 &gt; 0.99), seagull computed the solution in a fraction of the time needed by SGL. Additionally, seagull enables the incorporation of weights for each penalized feature. The following operators for linear regression models are available in seagull: lasso, group lasso, sparse-group lasso and Integrative LASSO with Penalty Factors (IPF-lasso). Thus, seagull is a convenient envelope of lasso variants.</description><subject>Accuracy</subject><subject>Age</subject><subject>Algorithms</subject><subject>Analysis</subject><subject>Biological models (mathematics)</subject><subject>DNA methylation</subject><subject>Epigenetics</subject><subject>Fines &amp; penalties</subject><subject>Genomes</subject><subject>Goodness of fit</subject><subject>Gulls</subject><subject>High-dimensional data</subject><subject>Machine learning</subject><subject>Methylation</subject><subject>Operators</subject><subject>Optimization</subject><subject>Parameter estimation</subject><subject>R package</subject><subject>Regression analysis</subject><subject>Regression models</subject><subject>Regularization</subject><subject>Software</subject><subject>Statistical 
analysis</subject><subject>Variables</subject><issn>1471-2105</issn><issn>1471-2105</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNptkttu1DAQhiMEogd4Aa4icQNSU3yMEy4qVRWHlSohUbi2Zu1J8MobL3bSFsTD4zQVdBHyxdgz3_y2f01RvKDklNKmfpMoa2RbEUYqwhWT1c2j4pAKRStGiXz8YH9QHKW0IYSqhsinxQFnLedCqcPi1xVCP3n_tvSQUjgp-xim3XIoYbBl2kFMWD1MR8wdEN1PGF0Yyi7E0rsBIc6ViCnN2W2w6FN57aDcxXDrtuCzNliHw1haTCbHZ8WTDnzC5_fxuPj6_t2Xi4_V5acPq4vzy8pIJcaqwUbVLdi24ciUFY1aK2l5jbUBtpaKNJiLQAQjCLK2NRhEyVrbUEk7KfhxsVp0bYCN3sX8mPhDB3D6LhFiryGOznjUXBHbSUM5NUwoY1ujUHKBa0I7Y9cka50tWrtpvUU7fyOC3xPdrwzum-7DtVYiey7bLPDqXiCG7xOmUW9ddsN7GDBMSTMhuKSqreuMvvwH3YQpDtmqmRJEcMLYX6qH_AE3dCHfa2ZRfV7zuhVSCJmp0_9QeVncOhMG7FzO7zW83mvIzIi3Yw9TSnp19XmfZQtrYkgpYvfHD0r0PKt6mVWdZ1Xfzaq-4b8BIYnbzg</recordid><startdate>20200915</startdate><enddate>20200915</enddate><creator>Klosa, Jan</creator><creator>Simon, Noah</creator><creator>Westermark, Pål Olof</creator><creator>Liebscher, Volkmar</creator><creator>Wittenburg, Dörte</creator><general>BioMed Central Ltd</general><general>BioMed 
Central</general><general>BMC</general><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7SC</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L7M</scope><scope>LK8</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-3639-2574</orcidid></search><sort><creationdate>20200915</creationdate><title>Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent</title><author>Klosa, Jan ; Simon, Noah ; Westermark, Pål Olof ; Liebscher, Volkmar ; Wittenburg, Dörte</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c574t-8e8769ad983e27d487b75d36e6ca2b5708ead9a0420ea56d6acee529d8151f543</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Accuracy</topic><topic>Age</topic><topic>Algorithms</topic><topic>Analysis</topic><topic>Biological models (mathematics)</topic><topic>DNA methylation</topic><topic>Epigenetics</topic><topic>Fines &amp; penalties</topic><topic>Genomes</topic><topic>Goodness of 
fit</topic><topic>Gulls</topic><topic>High-dimensional data</topic><topic>Machine learning</topic><topic>Methylation</topic><topic>Operators</topic><topic>Optimization</topic><topic>Parameter estimation</topic><topic>R package</topic><topic>Regression analysis</topic><topic>Regression models</topic><topic>Regularization</topic><topic>Software</topic><topic>Statistical analysis</topic><topic>Variables</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Klosa, Jan</creatorcontrib><creatorcontrib>Simon, Noah</creatorcontrib><creatorcontrib>Westermark, Pål Olof</creatorcontrib><creatorcontrib>Liebscher, Volkmar</creatorcontrib><creatorcontrib>Wittenburg, Dörte</creatorcontrib><collection>CrossRef</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest_Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>AUTh Library 
subscriptions: ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ProQuest Biological Science Collection</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>PML(ProQuest Medical Library)</collection><collection>Biological Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>BMC bioinformatics</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Klosa, Jan</au><au>Simon, Noah</au><au>Westermark, Pål Olof</au><au>Liebscher, Volkmar</au><au>Wittenburg, Dörte</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent</atitle><jtitle>BMC bioinformatics</jtitle><date>2020-09-15</date><risdate>2020</risdate><volume>21</volume><issue>1</issue><spage>1</spage><epage>407</epage><pages>1-407</pages><artnum>407</artnum><issn>1471-2105</issn><eissn>1471-2105</eissn><abstract>Statistical analyses of biological problems in life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in case of multicollinearity, which appears if the number of explanatory variables exceeds the number of observations or for some biological reason. Then, the model goodness of fit is penalized by some suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. The grid search for the penalty parameter is realized by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, seagull -the R package presented here- produces complete regularization paths. Publicly available high-dimensional methylation data are used to compare seagull to the established R package SGL. The results of both packages enabled a precise prediction of biological age from DNA methylation status. But even though the results of seagull and SGL were very similar (R.sup.2 &gt; 0.99), seagull computed the solution in a fraction of the time needed by SGL. 
Additionally, seagull enables the incorporation of weights for each penalized feature. The following operators for linear regression models are available in seagull: lasso, group lasso, sparse-group lasso and Integrative LASSO with Penalty Factors (IPF-lasso). Thus, seagull is a convenient envelope of lasso variants.</abstract><cop>London</cop><pub>BioMed Central Ltd</pub><pmid>32933477</pmid><doi>10.1186/s12859-020-03725-w</doi><orcidid>https://orcid.org/0000-0002-3639-2574</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1471-2105
ispartof BMC bioinformatics, 2020-09, Vol.21 (1), p.1-407, Article 407
issn 1471-2105
1471-2105
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_370df5c131c247cd9c7e534eb01fcdb0
source Open Access: PubMed Central; Publicly Available Content Database
subjects Accuracy
Age
Algorithms
Analysis
Biological models (mathematics)
DNA methylation
Epigenetics
Fines & penalties
Genomes
Goodness of fit
Gulls
High-dimensional data
Machine learning
Methylation
Operators
Optimization
Parameter estimation
R package
Regression analysis
Regression models
Regularization
Software
Statistical analysis
Variables
title Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T03%3A10%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Seagull:%20lasso,%20group%20lasso%20and%20sparse-group%20lasso%20regularization%20for%20linear%20regression%20models%20via%20proximal%20gradient%20descent&rft.jtitle=BMC%20bioinformatics&rft.au=Klosa,%20Jan&rft.date=2020-09-15&rft.volume=21&rft.issue=1&rft.spage=1&rft.epage=407&rft.pages=1-407&rft.artnum=407&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.1186/s12859-020-03725-w&rft_dat=%3Cgale_doaj_%3EA636945445%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c574t-8e8769ad983e27d487b75d36e6ca2b5708ead9a0420ea56d6acee529d8151f543%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2444043022&rft_id=info:pmid/32933477&rft_galeid=A636945445&rfr_iscdi=true
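
Illustrative note: the description in this record names three ingredients: proximal gradient descent, a backtracking line search for the step size, and warm starts along the penalty grid. The sketch below shows how these fit together for the plain lasso only. It is written in Python rather than R, it is not the seagull code, and every function name and default value in it (soft_threshold, lasso_prox_grad, lasso_path, step, shrink, tol) is a hypothetical choice made for illustration. The loss is assumed to be 1/(2n) times the residual sum of squares.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |z| for a single coefficient."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def loss_and_gradient(X, y, beta):
    """Return 1/(2n) * ||X beta - y||^2 and its gradient with respect to beta."""
    n, p = len(X), len(beta)
    resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
    loss = 0.5 * sum(r * r for r in resid) / n
    grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
    return loss, grad


def lasso_prox_grad(X, y, lam, beta0, step=1.0, shrink=0.5,
                    max_iter=500, tol=1e-10):
    """Proximal gradient descent for the lasso, started from beta0."""
    beta = list(beta0)
    for _ in range(max_iter):
        loss, grad = loss_and_gradient(X, y, beta)
        t = step
        while True:
            # Gradient step on the smooth loss, then the lasso proximal step.
            cand = [soft_threshold(b - t * g, t * lam)
                    for b, g in zip(beta, grad)]
            diff = [c - b for c, b in zip(cand, beta)]
            new_loss, _ = loss_and_gradient(X, y, cand)
            # Backtracking: accept t once the quadratic upper bound holds.
            bound = (loss + sum(g * d for g, d in zip(grad, diff))
                     + sum(d * d for d in diff) / (2.0 * t))
            if new_loss <= bound + 1e-12:
                break
            t *= shrink
        change = max(abs(d) for d in diff)
        beta = cand
        if change < tol:
            break
    return beta


def lasso_path(X, y, lambdas):
    """Solve for each penalty, largest first, reusing the previous solution."""
    beta = [0.0] * len(X[0])
    path = []
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_prox_grad(X, y, lam, beta)  # warm start
        path.append((lam, beta))
    return path
```

Calling lasso_path(X, y, [2.0, 0.5]) returns one (penalty, coefficients) pair per grid value, ordered from the largest penalty to the smallest. Group lasso and sparse-group lasso would replace soft_threshold with the corresponding group-wise proximal operator, which this sketch does not attempt.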