Experience with benchmarking dependability and performance of MapReduce systems

MapReduce provides a convenient means for distributed data processing and automatic parallel execution on clusters of machines. It has various applications and is used by several services featuring fault tolerance and scalability. Many studies investigated the dependability and performance of MapRed...

Bibliographic Details
Published in:	Performance evaluation 2016-07, Vol.101, p.1-19
Main Authors:	Sangroya, Amit; Bouchenak, Sara; Serrano, Damián
Format:	Article
Language:	English
Subjects:	Automation; Benchmarking; Clouds; Clusters; Computer Science; Dependability; Distributed, Parallel, and Cluster Computing; Emerging Technologies; Fault tolerance; Hadoop; MapReduce; Mathematical models; Modulus of rupture in bending; Statistics; Systems and Control

cited_by	cdi_FETCH-LOGICAL-c367t-3e132c458e1bc0d86c250f8eae4be9b7702593ba84639e3e115fcd4284e84b13
cites	cdi_FETCH-LOGICAL-c367t-3e132c458e1bc0d86c250f8eae4be9b7702593ba84639e3e115fcd4284e84b13
container_end_page	19
container_issue
container_start_page	1
container_title	Performance evaluation
container_volume	101
creator	Sangroya, Amit; Bouchenak, Sara; Serrano, Damián
description	MapReduce provides a convenient means for distributed data processing and automatic parallel execution on clusters of machines. It has various applications and is used by several services featuring fault tolerance and scalability. Many studies investigated the dependability and performance of MapReduce, ranging from job scheduling to data placement and replication, adaptive and on-demand fault tolerance to new fault tolerance models. However, the ad-hoc and overly simplified setting used to evaluate most MapReduce fault tolerance and performance improvement solutions poses significant challenges to the analysis and comparison of the effectiveness of these solutions. The paper precisely addresses this issue and presents MRBS, a comprehensive benchmark suite for evaluating the dependability and performance of MapReduce systems. MRBS includes five benchmarks covering several application domains and a wide range of execution scenarios such as data-intensive vs. compute-intensive applications, or batch applications vs. online interactive applications. MRBS allows to inject various workloads, dataloads and faultloads, and produces extensive reliability, availability and performance statistics. We implemented the MRBS benchmark suite for Hadoop MapReduce, and we illustrate its use with various case studies running on Amazon EC2 and on a private cloud.
doi_str_mv	10.1016/j.peva.2016.04.001
format	article
fullrecord	<record><control><sourceid>proquest_hal_p</sourceid><recordid>TN_cdi_hal_primary_oai_HAL_hal_01372628v1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0166531616300207</els_id><sourcerecordid>1825475158</sourcerecordid><originalsourceid>FETCH-LOGICAL-c367t-3e132c458e1bc0d86c250f8eae4be9b7702593ba84639e3e115fcd4284e84b13</originalsourceid><addsrcrecordid>eNp9kD1PwzAQhi0EEqXwB5gywpDgz8SVWKqqUKSiSqgDm-U4F-qSJsFOC_33OAQxMvl8ft7T-UHomuCEYJLebZMWDjqhoU4wTzAmJ2hEZEbjjIvXUzQKD2ksGEnP0YX3W4yxyBgeodX8qwVnoTYQfdpuE-Wh3Oy0e7f1W1RAC3Whc1vZ7hjpuogCXDZup3u-KaNn3b5AsQ8Xf_Qd7PwlOit15eHq9xyj9cN8PVvEy9Xj02y6jA1Lsy5mQBg1XEggucGFTA0VuJSggecwybMMUzFhuZY8ZRMINBGlKTiVHCTPCRuj22HsRleqdTYsfFSNtmoxXaq-hwnLaErloWdvBrZ1zccefKd21huoKl1Ds_eKSCp4JoiQAaUDalzjvYPybzbBqhettqoXrXrRCnMVRIfQ_RCC8N-DBae8-RFaWAemU0Vj_4t_AwI6hrU</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1825475158</pqid></control><display><type>article</type><title>Experience with benchmarking dependability and performance of MapReduce systems</title><source>ScienceDirect Freedom Collection</source><creator>Sangroya, Amit ; Bouchenak, Sara ; Serrano, Damián</creator><creatorcontrib>Sangroya, Amit ; Bouchenak, Sara ; Serrano, Damián</creatorcontrib><description>MapReduce provides a convenient means for distributed data processing and automatic parallel execution on clusters of machines. It has various applications and is used by several services featuring fault tolerance and scalability. Many studies investigated the dependability and performance of MapReduce, ranging from job scheduling to data placement and replication, adaptive and on-demand fault tolerance to new fault tolerance models. 
However, the ad-hoc and overly simplified setting used to evaluate most MapReduce fault tolerance and performance improvement solutions poses significant challenges to the analysis and comparison of the effectiveness of these solutions. The paper precisely addresses this issue and presents MRBS, a comprehensive benchmark suite for evaluating the dependability and performance of MapReduce systems. MRBS includes five benchmarks covering several application domains and a wide range of execution scenarios such as data-intensive vs. compute-intensive applications, or batch applications vs. online interactive applications. MRBS allows to inject various workloads, dataloads and faultloads, and produces extensive reliability, availability and performance statistics. We implemented the MRBS benchmark suite for Hadoop MapReduce, and we illustrate its use with various case studies running on Amazon EC2 and on a private cloud.</description><identifier>ISSN: 0166-5316</identifier><identifier>EISSN: 1872-745X</identifier><identifier>DOI: 10.1016/j.peva.2016.04.001</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>Automation ; Benchmarking ; Clouds ; Clusters ; Computer Science ; Dependability ; Distributed, Parallel, and Cluster Computing ; Emerging Technologies ; Fault tolerance ; Hadoop ; MapReduce ; Mathematical models ; Modulus of rupture in bending ; Statistics ; Systems and Control</subject><ispartof>Performance evaluation, 2016-07, Vol.101, p.1-19</ispartof><rights>2016 Elsevier B.V.</rights><rights>Distributed under a Creative Commons Attribution 4.0 International 
License</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c367t-3e132c458e1bc0d86c250f8eae4be9b7702593ba84639e3e115fcd4284e84b13</citedby><cites>FETCH-LOGICAL-c367t-3e132c458e1bc0d86c250f8eae4be9b7702593ba84639e3e115fcd4284e84b13</cites><orcidid>0000-0001-9825-353X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,885,27924,27925</link.rule.ids><backlink>$$Uhttps://hal.science/hal-01372628$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Sangroya, Amit</creatorcontrib><creatorcontrib>Bouchenak, Sara</creatorcontrib><creatorcontrib>Serrano, Damián</creatorcontrib><title>Experience with benchmarking dependability and performance of MapReduce systems</title><title>Performance evaluation</title><description>MapReduce provides a convenient means for distributed data processing and automatic parallel execution on clusters of machines. It has various applications and is used by several services featuring fault tolerance and scalability. Many studies investigated the dependability and performance of MapReduce, ranging from job scheduling to data placement and replication, adaptive and on-demand fault tolerance to new fault tolerance models. However, the ad-hoc and overly simplified setting used to evaluate most MapReduce fault tolerance and performance improvement solutions poses significant challenges to the analysis and comparison of the effectiveness of these solutions. The paper precisely addresses this issue and presents MRBS, a comprehensive benchmark suite for evaluating the dependability and performance of MapReduce systems. MRBS includes five benchmarks covering several application domains and a wide range of execution scenarios such as data-intensive vs. compute-intensive applications, or batch applications vs. 
online interactive applications. MRBS allows to inject various workloads, dataloads and faultloads, and produces extensive reliability, availability and performance statistics. We implemented the MRBS benchmark suite for Hadoop MapReduce, and we illustrate its use with various case studies running on Amazon EC2 and on a private cloud.</description><subject>Automation</subject><subject>Benchmarking</subject><subject>Clouds</subject><subject>Clusters</subject><subject>Computer Science</subject><subject>Dependability</subject><subject>Distributed, Parallel, and Cluster Computing</subject><subject>Emerging Technologies</subject><subject>Fault tolerance</subject><subject>Hadoop</subject><subject>MapReduce</subject><subject>Mathematical models</subject><subject>Modulus of rupture in bending</subject><subject>Statistics</subject><subject>Systems and Control</subject><issn>0166-5316</issn><issn>1872-745X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNp9kD1PwzAQhi0EEqXwB5gywpDgz8SVWKqqUKSiSqgDm-U4F-qSJsFOC_33OAQxMvl8ft7T-UHomuCEYJLebZMWDjqhoU4wTzAmJ2hEZEbjjIvXUzQKD2ksGEnP0YX3W4yxyBgeodX8qwVnoTYQfdpuE-Wh3Oy0e7f1W1RAC3Whc1vZ7hjpuogCXDZup3u-KaNn3b5AsQ8Xf_Qd7PwlOit15eHq9xyj9cN8PVvEy9Xj02y6jA1Lsy5mQBg1XEggucGFTA0VuJSggecwybMMUzFhuZY8ZRMINBGlKTiVHCTPCRuj22HsRleqdTYsfFSNtmoxXaq-hwnLaErloWdvBrZ1zccefKd21huoKl1Ds_eKSCp4JoiQAaUDalzjvYPybzbBqhettqoXrXrRCnMVRIfQ_RCC8N-DBae8-RFaWAemU0Vj_4t_AwI6hrU</recordid><startdate>201607</startdate><enddate>201607</enddate><creator>Sangroya, Amit</creator><creator>Bouchenak, Sara</creator><creator>Serrano, Damián</creator><general>Elsevier B.V</general><general>Elsevier</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7TA</scope><scope>8FD</scope><scope>JG9</scope><scope>1XC</scope><orcidid>https://orcid.org/0000-0001-9825-353X</orcidid></search><sort><creationdate>201607</creationdate><title>Experience with benchmarking dependability and performance 
of MapReduce systems</title><author>Sangroya, Amit ; Bouchenak, Sara ; Serrano, Damián</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c367t-3e132c458e1bc0d86c250f8eae4be9b7702593ba84639e3e115fcd4284e84b13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Automation</topic><topic>Benchmarking</topic><topic>Clouds</topic><topic>Clusters</topic><topic>Computer Science</topic><topic>Dependability</topic><topic>Distributed, Parallel, and Cluster Computing</topic><topic>Emerging Technologies</topic><topic>Fault tolerance</topic><topic>Hadoop</topic><topic>MapReduce</topic><topic>Mathematical models</topic><topic>Modulus of rupture in bending</topic><topic>Statistics</topic><topic>Systems and Control</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sangroya, Amit</creatorcontrib><creatorcontrib>Bouchenak, Sara</creatorcontrib><creatorcontrib>Serrano, Damián</creatorcontrib><collection>CrossRef</collection><collection>Materials Business File</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>Hyper Article en Ligne (HAL)</collection><jtitle>Performance evaluation</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sangroya, Amit</au><au>Bouchenak, Sara</au><au>Serrano, Damián</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Experience with benchmarking dependability and performance of MapReduce systems</atitle><jtitle>Performance evaluation</jtitle><date>2016-07</date><risdate>2016</risdate><volume>101</volume><spage>1</spage><epage>19</epage><pages>1-19</pages><issn>0166-5316</issn><eissn>1872-745X</eissn><abstract>MapReduce provides a convenient means for distributed data processing and automatic parallel execution on clusters of 
machines. It has various applications and is used by several services featuring fault tolerance and scalability. Many studies investigated the dependability and performance of MapReduce, ranging from job scheduling to data placement and replication, adaptive and on-demand fault tolerance to new fault tolerance models. However, the ad-hoc and overly simplified setting used to evaluate most MapReduce fault tolerance and performance improvement solutions poses significant challenges to the analysis and comparison of the effectiveness of these solutions. The paper precisely addresses this issue and presents MRBS, a comprehensive benchmark suite for evaluating the dependability and performance of MapReduce systems. MRBS includes five benchmarks covering several application domains and a wide range of execution scenarios such as data-intensive vs. compute-intensive applications, or batch applications vs. online interactive applications. MRBS allows to inject various workloads, dataloads and faultloads, and produces extensive reliability, availability and performance statistics. We implemented the MRBS benchmark suite for Hadoop MapReduce, and we illustrate its use with various case studies running on Amazon EC2 and on a private cloud.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.peva.2016.04.001</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0001-9825-353X</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0166-5316
ispartof	Performance evaluation, 2016-07, Vol.101, p.1-19
issn	0166-5316; 1872-745X
language	eng
recordid	cdi_hal_primary_oai_HAL_hal_01372628v1
source	ScienceDirect Freedom Collection
subjects	Automation; Benchmarking; Clouds; Clusters; Computer Science; Dependability; Distributed, Parallel, and Cluster Computing; Emerging Technologies; Fault tolerance; Hadoop; MapReduce; Mathematical models; Modulus of rupture in bending; Statistics; Systems and Control
title	Experience with benchmarking dependability and performance of MapReduce systems
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T03%3A05%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Experience%20with%20benchmarking%20dependability%20and%20performance%20of%20MapReduce%20systems&rft.jtitle=Performance%20evaluation&rft.au=Sangroya,%20Amit&rft.date=2016-07&rft.volume=101&rft.spage=1&rft.epage=19&rft.pages=1-19&rft.issn=0166-5316&rft.eissn=1872-745X&rft_id=info:doi/10.1016/j.peva.2016.04.001&rft_dat=%3Cproquest_hal_p%3E1825475158%3C/proquest_hal_p%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c367t-3e132c458e1bc0d86c250f8eae4be9b7702593ba84639e3e115fcd4284e84b13%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1825475158&rft_id=info:pmid/&rfr_iscdi=true