Optimizing R with SparkR on a commodity cluster for biomedical research
Published in: | Computer methods and programs in biomedicine 2016-12, Vol.137, p.321-328 |
---|---|
Format: | Article |
Language: | English |
Summary: | Highlights • R is a popular environment for clinical data analysis. Computationally demanding tasks can often be parallelized on computing clusters using the Message Passing Interface (MPI) on traditional clusters or the relatively new SparkR variant as part of the Hadoop family. • SparkR supports big data analysis using R even on non-dedicated resources with minimal change to the original code. It offers elastic resources and tight integration with Hadoop distributed services for huge files. • Computation in SparkR scales better than with MPI due to optimized data communication. |
---|---|
ISSN: | 0169-2607 1872-7565 |
DOI: | 10.1016/j.cmpb.2016.10.006 |