Optimizing R with SparkR on a commodity cluster for biomedical research
Published in: | Computer methods and programs in biomedicine 2016-12, Vol.137, p.321-328 |
---|---|
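Main Authors: | Sedlmayr, Martin; Würfl, Tobias; Maier, Christian; Häberle, Lothar; Fasching, Peter; Prokosch, Hans-Ulrich; Christoph, Jan |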
Format: | Article |
Language: | English |
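Subjects: | Big data; Biomedical Research; Cluster Analysis; Cluster computing; Computing Methodologies; Genome-Wide Association Study; Humans; Internal Medicine; Other; SparkR |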
container_end_page | 328 |
---|---|
container_start_page | 321 |
container_title | Computer methods and programs in biomedicine |
container_volume | 137 |
creator | Sedlmayr, Martin; Würfl, Tobias; Maier, Christian; Häberle, Lothar; Fasching, Peter; Prokosch, Hans-Ulrich; Christoph, Jan |
description | Highlights • R is a popular environment for clinical data analysis. • Computationally demanding tasks can often be parallelized in computing clusters, using the Message Passing Interface (MPI) on traditional clusters or the relatively new SparkR variant as part of the Hadoop family. • SparkR supports big data analysis in R, even on non-dedicated resources, with minimal changes to the original code. It offers elastic resources and tight integration with Hadoop distributed services for very large files. • Computation in SparkR scales better than with MPI due to optimized data communication. |
doi_str_mv | 10.1016/j.cmpb.2016.10.006 |
format | article |
publisher | Elsevier Ireland Ltd, Ireland |
date | 2016-12-01 |
pmid | 28110735 |
rights | Copyright © 2016 The Authors. Published by Elsevier Ireland Ltd. |
identifier | ISSN: 0169-2607 |
ispartof | Computer methods and programs in biomedicine, 2016-12, Vol.137, p.321-328 |
issn | 0169-2607; EISSN 1872-7565 |
language | eng |
recordid | cdi_proquest_miscellaneous_1861593097 |
source | ScienceDirect Journals |
subjects | Big data; Biomedical Research; Cluster Analysis; Cluster computing; Computing Methodologies; Genome-Wide Association Study; Humans; Internal Medicine; Other; SparkR |
title | Optimizing R with SparkR on a commodity cluster for biomedical research |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T18%3A55%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Optimizing%20R%20with%20SparkR%20on%20a%20commodity%20cluster%20for%20biomedical%20research&rft.jtitle=Computer%20methods%20and%20programs%20in%20biomedicine&rft.au=Sedlmayr,%20Martin&rft.date=2016-12-01&rft.volume=137&rft.spage=321&rft.epage=328&rft.pages=321-328&rft.issn=0169-2607&rft.eissn=1872-7565&rft_id=info:doi/10.1016/j.cmpb.2016.10.006&rft_dat=%3Cproquest_cross%3E1861593097%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c455t-deaf23f8938910914101b133799d6ad478030b29232160fbeadff85f390a3e243%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1861593097&rft_id=info:pmid/28110735&rfr_iscdi=true |