
Optimizing R with SparkR on a commodity cluster for biomedical research

Bibliographic Details
Published in: Computer Methods and Programs in Biomedicine, 2016-12, Vol. 137, p. 321-328
Main Authors: Sedlmayr, Martin; Würfl, Tobias; Maier, Christian; Häberle, Lothar; Fasching, Peter; Prokosch, Hans-Ulrich; Christoph, Jan
Format: Article
Language: English
Description
Summary: Highlights
• R is a popular environment for clinical data analysis.
• Computationally demanding tasks can often be parallelized on computing clusters, using either the Message Passing Interface (MPI) on traditional clusters or the relatively new SparkR variant from the Hadoop family.
• SparkR supports big data analysis in R even on non-dedicated resources, with minimal changes to the original code. It offers elastic resources and tight integration with Hadoop distributed services for huge files.
• Computation in SparkR scales better than with MPI due to optimized data communication.
ISSN: 0169-2607, 1872-7565
DOI: 10.1016/j.cmpb.2016.10.006