
Optimizing R with SparkR on a commodity cluster for biomedical research

Highlights • R is a popular environment for clinical data analysis. Computationally demanding tasks can often be parallelized in computing clusters using the Message Passing Interface (MPI) on traditional clusters or the relatively new SparkR variant as part of the Hadoop family. • SparkR allows sup...

Bibliographic Details
Published in: Computer methods and programs in biomedicine 2016-12, Vol.137, p.321-328
Main Authors: Sedlmayr, Martin; Würfl, Tobias; Maier, Christian; Häberle, Lothar; Fasching, Peter; Prokosch, Hans-Ulrich; Christoph, Jan
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c455t-deaf23f8938910914101b133799d6ad478030b29232160fbeadff85f390a3e243
cites cdi_FETCH-LOGICAL-c455t-deaf23f8938910914101b133799d6ad478030b29232160fbeadff85f390a3e243
container_end_page 328
container_issue
container_start_page 321
container_title Computer methods and programs in biomedicine
container_volume 137
creator Sedlmayr, Martin
Würfl, Tobias
Maier, Christian
Häberle, Lothar
Fasching, Peter
Prokosch, Hans-Ulrich
Christoph, Jan
description Highlights • R is a popular environment for clinical data analysis. Computationally demanding tasks can often be parallelized in computing clusters using the Message Passing Interface (MPI) on traditional clusters or the relatively new SparkR variant as part of the Hadoop family. • SparkR supports big data analysis using R even on non-dedicated resources with minimal changes to the original code. It offers elastic resources and tight integration with Hadoop distributed services for huge files. • Computation in SparkR scales better than with the Message Passing Interface (MPI) due to optimized data communication.
doi_str_mv 10.1016/j.cmpb.2016.10.006
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1861593097</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>1_s2_0_S0169260716303807</els_id><sourcerecordid>1861593097</sourcerecordid><originalsourceid>FETCH-LOGICAL-c455t-deaf23f8938910914101b133799d6ad478030b29232160fbeadff85f390a3e243</originalsourceid><addsrcrecordid>eNp9kc1u1DAUhS0EokPhBVggL9lkuLbHTiwhJFRBQapUqT9ry3GuqadJHOwENH16HE1h0QUrW9fnHPl8l5C3DLYMmPqw37phare83MtgC6CekQ1ral7VUsnnZFMedMUV1CfkVc57AOBSqpfkhDeMQS3khpxfTnMYwkMYf9Ar-jvMd_R6sun-isaRWuriMMQuzAfq-iXPmKiPibYhDtgFZ3uaMKNN7u41eeFtn_HN43lKbr9-uTn7Vl1cnn8_-3xRuZ2Uc9Wh9Vz4RotGM9BsV4q0TIha607Zblc3IKDlmgvOFPgWbed9I73QYAXynTgl74-5U4o_F8yzGUJ22Pd2xLhkwxrFpBag6yLlR6lLMeeE3kwpDDYdDAOzAjR7swI0K8B1VgAW07vH_KUtHf9Z_hIrgo9HAZaWvwImk13A0RUeCd1suhj-n__pid31YVxR3uMB8z4uaSz8DDOZGzDX6wrXDTIlQDTlB38AgBGU8Q</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1861593097</pqid></control><display><type>article</type><title>Optimizing R with SparkR on a commodity cluster for biomedical research</title><source>ScienceDirect Journals</source><creator>Sedlmayr, Martin ; Würfl, Tobias ; Maier, Christian ; Häberle, Lothar ; Fasching, Peter ; Prokosch, Hans-Ulrich ; Christoph, Jan</creator><creatorcontrib>Sedlmayr, Martin ; Würfl, Tobias ; Maier, Christian ; Häberle, Lothar ; Fasching, Peter ; Prokosch, Hans-Ulrich ; Christoph, Jan</creatorcontrib><description>Highlights • R is a popular environment for clinical data analysis. Computationally demanding tasks can often be parallelized in computing clusters using the Message Passing Interface (MPI) on traditional clusters or the relatively new SparkR variant as part of the Hadoop family. • SparkR supports big data analysis using R even on non-dedicated resources with minimal changes to the original code.
It offers elastic resources and tight integration with Hadoop distributed services for huge files. •Computation in SparkR scales better than with the Message Passing Interface (MPI) due to optimized data communication.</description><identifier>ISSN: 0169-2607</identifier><identifier>EISSN: 1872-7565</identifier><identifier>DOI: 10.1016/j.cmpb.2016.10.006</identifier><identifier>PMID: 28110735</identifier><language>eng</language><publisher>Ireland: Elsevier Ireland Ltd</publisher><subject>Big data ; Biomedical Research ; Cluster Analysis ; Cluster computing ; Computing Methodologies ; Genome-Wide Association Study ; Humans ; Internal Medicine ; Other ; SparkR</subject><ispartof>Computer methods and programs in biomedicine, 2016-12, Vol.137, p.321-328</ispartof><rights>The Authors</rights><rights>2016 The Authors</rights><rights>Copyright © 2016 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c455t-deaf23f8938910914101b133799d6ad478030b29232160fbeadff85f390a3e243</citedby><cites>FETCH-LOGICAL-c455t-deaf23f8938910914101b133799d6ad478030b29232160fbeadff85f390a3e243</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/28110735$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Sedlmayr, Martin</creatorcontrib><creatorcontrib>Würfl, Tobias</creatorcontrib><creatorcontrib>Maier, Christian</creatorcontrib><creatorcontrib>Häberle, Lothar</creatorcontrib><creatorcontrib>Fasching, Peter</creatorcontrib><creatorcontrib>Prokosch, Hans-Ulrich</creatorcontrib><creatorcontrib>Christoph, Jan</creatorcontrib><title>Optimizing R with SparkR on a commodity cluster 
for biomedical research</title><title>Computer methods and programs in biomedicine</title><addtitle>Comput Methods Programs Biomed</addtitle><description>Highlights • R is a popular environment for clinical data analysis. Computationally demanding tasks can often be parallelized in computing clusters using the Message Passing Interface (MPI) on traditional clusters or the relatively new SparkR variant as part of the Hadoop family. • SparkR supports big data analysis using R even on non-dedicated resources with minimal changes to the original code. It offers elastic resources and tight integration with Hadoop distributed services for huge files. • Computation in SparkR scales better than with the Message Passing Interface (MPI) due to optimized data communication.</description><subject>Big data</subject><subject>Biomedical Research</subject><subject>Cluster Analysis</subject><subject>Cluster computing</subject><subject>Computing Methodologies</subject><subject>Genome-Wide Association Study</subject><subject>Humans</subject><subject>Internal Medicine</subject><subject>Other</subject><subject>SparkR</subject><issn>0169-2607</issn><issn>1872-7565</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNp9kc1u1DAUhS0EokPhBVggL9lkuLbHTiwhJFRBQapUqT9ry3GuqadJHOwENH16HE1h0QUrW9fnHPl8l5C3DLYMmPqw37phare83MtgC6CekQ1ral7VUsnnZFMedMUV1CfkVc57AOBSqpfkhDeMQS3khpxfTnMYwkMYf9Ar-jvMd_R6sun-isaRWuriMMQuzAfq-iXPmKiPibYhDtgFZ3uaMKNN7u41eeFtn_HN43lKbr9-uTn7Vl1cnn8_-3xRuZ2Uc9Wh9Vz4RotGM9BsV4q0TIha607Zblc3IKDlmgvOFPgWbed9I73QYAXynTgl74-5U4o_F8yzGUJ22Pd2xLhkwxrFpBag6yLlR6lLMeeE3kwpDDYdDAOzAjR7swI0K8B1VgAW07vH_KUtHf9Z_hIrgo9HAZaWvwImk13A0RUeCd1suhj-n__pid31YVxR3uMB8z4uaSz8DDOZGzDX6wrXDTIlQDTlB38AgBGU8Q</recordid><startdate>20161201</startdate><enddate>20161201</enddate><creator>Sedlmayr, Martin</creator><creator>Würfl, Tobias</creator><creator>Maier, Christian</creator><creator>Häberle, 
Lothar</creator><creator>Fasching, Peter</creator><creator>Prokosch, Hans-Ulrich</creator><creator>Christoph, Jan</creator><general>Elsevier Ireland Ltd</general><scope>6I.</scope><scope>AAFTH</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20161201</creationdate><title>Optimizing R with SparkR on a commodity cluster for biomedical research</title><author>Sedlmayr, Martin ; Würfl, Tobias ; Maier, Christian ; Häberle, Lothar ; Fasching, Peter ; Prokosch, Hans-Ulrich ; Christoph, Jan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c455t-deaf23f8938910914101b133799d6ad478030b29232160fbeadff85f390a3e243</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Big data</topic><topic>Biomedical Research</topic><topic>Cluster Analysis</topic><topic>Cluster computing</topic><topic>Computing Methodologies</topic><topic>Genome-Wide Association Study</topic><topic>Humans</topic><topic>Internal Medicine</topic><topic>Other</topic><topic>SparkR</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sedlmayr, Martin</creatorcontrib><creatorcontrib>Würfl, Tobias</creatorcontrib><creatorcontrib>Maier, Christian</creatorcontrib><creatorcontrib>Häberle, Lothar</creatorcontrib><creatorcontrib>Fasching, Peter</creatorcontrib><creatorcontrib>Prokosch, Hans-Ulrich</creatorcontrib><creatorcontrib>Christoph, Jan</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - 
Academic</collection><jtitle>Computer methods and programs in biomedicine</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sedlmayr, Martin</au><au>Würfl, Tobias</au><au>Maier, Christian</au><au>Häberle, Lothar</au><au>Fasching, Peter</au><au>Prokosch, Hans-Ulrich</au><au>Christoph, Jan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Optimizing R with SparkR on a commodity cluster for biomedical research</atitle><jtitle>Computer methods and programs in biomedicine</jtitle><addtitle>Comput Methods Programs Biomed</addtitle><date>2016-12-01</date><risdate>2016</risdate><volume>137</volume><spage>321</spage><epage>328</epage><pages>321-328</pages><issn>0169-2607</issn><eissn>1872-7565</eissn><abstract>Highlights • R is a popular environment for clinical data analysis. Computationally demanding tasks can often be parallelized in computing clusters using the Message Passing Interface (MPI) on traditional clusters or the relatively new SparkR variant as part of the Hadoop family. • SparkR supports big data analysis using R even on non-dedicated resources with minimal changes to the original code. It offers elastic resources and tight integration with Hadoop distributed services for huge files. • Computation in SparkR scales better than with the Message Passing Interface (MPI) due to optimized data communication.</abstract><cop>Ireland</cop><pub>Elsevier Ireland Ltd</pub><pmid>28110735</pmid><doi>10.1016/j.cmpb.2016.10.006</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0169-2607
ispartof Computer methods and programs in biomedicine, 2016-12, Vol.137, p.321-328
issn 0169-2607
1872-7565
language eng
recordid cdi_proquest_miscellaneous_1861593097
source ScienceDirect Journals
subjects Big data
Biomedical Research
Cluster Analysis
Cluster computing
Computing Methodologies
Genome-Wide Association Study
Humans
Internal Medicine
Other
SparkR
title Optimizing R with SparkR on a commodity cluster for biomedical research
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T18%3A55%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Optimizing%20R%20with%20SparkR%20on%20a%20commodity%20cluster%20for%20biomedical%20research&rft.jtitle=Computer%20methods%20and%20programs%20in%20biomedicine&rft.au=Sedlmayr,%20Martin&rft.date=2016-12-01&rft.volume=137&rft.spage=321&rft.epage=328&rft.pages=321-328&rft.issn=0169-2607&rft.eissn=1872-7565&rft_id=info:doi/10.1016/j.cmpb.2016.10.006&rft_dat=%3Cproquest_cross%3E1861593097%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c455t-deaf23f8938910914101b133799d6ad478030b29232160fbeadff85f390a3e243%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1861593097&rft_id=info:pmid/28110735&rfr_iscdi=true