HDM-MC in-Action: A Framework for Big Data Analytics across Multiple Clusters

Big data are increasingly collected and stored in highly distributed infrastructures due to the development of emerging technologies such as sensor networks, cloud computing, IoT and mobile computing. In practice, the majority of existing big data pr...

Bibliographic Details
Main Authors:	Wu, Dongyao ; Sakr, Sherif ; Zhu, Liming ; Lee, Sung ; Wu, Huijun
Format:	Conference Proceeding
Language:	English
Subjects:	Big Data ; Data centers ; Distributed databases ; Distributed Systems ; Organizations ; Planning ; Scheduling ; Workflows

container_end_page	1550
container_issue
container_start_page	1547
container_title
container_volume
creator	Wu, Dongyao ; Sakr, Sherif ; Zhu, Liming ; Lee, Sung ; Wu, Huijun
description	Big data are increasingly collected and stored in highly distributed infrastructures due to the development of emerging technologies such as sensor networks, cloud computing, IoT and mobile computing. In practice, the majority of existing big data processing frameworks (e.g., Hadoop, Spark, Flink) are designed for a single-cluster setup and assume centralized management and homogeneous connectivity, which makes them sub-optimal, and sometimes infeasible, for scenarios that require running data analytics jobs on highly distributed data sets (across racks, data centers or multiple organizations). We demonstrate HDM-MC, a big data processing framework designed to enable large-scale data analytics across multiple clusters with minimal extra overhead from the additional scheduling requirements. We describe the architecture and realization of the system using a step-by-step example scenario.
doi_str_mv	10.1109/ICDCS.2018.00165
format	conference_proceeding
fullrecord	<record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_8416428</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8416428</ieee_id><sourcerecordid>8416428</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-736463b4f980d20013a1df02a52a97c2e2f187b478be34fbf21b5ca97c9c5bd43</originalsourceid><addsrcrecordid>eNotj09LwzAchqMgOOfugpd8gdb88r_earu5wYoH9TzSNpFo144kQ_btnejpPTzw8LwI3QHJAUjxsKnq6jWnBHROCEhxgW5AMC2lVqAv0YwKJTLNAa7RIsZPQgiVmhMlZqhZ103WVNiPWdklP42PuMSrYPb2ewpf2E0BP_kPXJtkcDma4ZR8F7HpwhQjbo5D8ofB4mo4xmRDvEVXzgzRLv53jt5Xy7dqnW1fnjdVuc08KJEyxSSXrOWu0KSn52RmoHeEGkFNoTpqqQOtWq50axl3raPQiu4XFZ1oe87m6P7P6621u0PwexNOu_NDyalmP6fsTE4</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>HDM-MC in-Action: A Framework for Big Data Analytics across Multiple Clusters</title><source>IEEE Xplore All Conference Series</source><creator>Wu, Dongyao ; Sakr, Sherif ; Zhu, Liming ; Lee, Sung ; Wu, Huijun</creator><creatorcontrib>Wu, Dongyao ; Sakr, Sherif ; Zhu, Liming ; Lee, Sung ; Wu, Huijun</creatorcontrib><description>Big data are increasingly collected and stored in a highly distributed infrastructures due to the development of several emerging technologies including sensor network, cloud computing, IoT and mobile computing among many other emerging technologies. In practice, the majority of existing big data processing frameworks (e.g., Hadoop, Spark, Flink) are designed based on the single-cluster setup with the assumptions of centralized management and homogeneous connectivity which makes them sub-optimal and sometimes infeasible to be applied for scenarios that require implementing data analytics jobs on highly distributed data sets (across racks, data centers or multi organizations). 
We demonstrate HDM-MC, a big data processing framework that is designed to enable the capability of performing large scale data analytics across multi-clusters with minimum extra overhead due to additional scheduling requirements. We describe the architecture and realization of the system using a step-by-step example scenario.</description><identifier>EISSN: 2575-8411</identifier><identifier>EISBN: 1538668718</identifier><identifier>EISBN: 9781538668719</identifier><identifier>DOI: 10.1109/ICDCS.2018.00165</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>Big Data ; Data centers ; Distributed databases ; Distributed Systems ; Organizations ; Planning ; Scheduling ; Workflows</subject><ispartof>2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), 2018, p.1547-1550</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8416428$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,23930,23931,25140,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8416428$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Wu, Dongyao</creatorcontrib><creatorcontrib>Sakr, Sherif</creatorcontrib><creatorcontrib>Zhu, Liming</creatorcontrib><creatorcontrib>Lee, Sung</creatorcontrib><creatorcontrib>Wu, Huijun</creatorcontrib><title>HDM-MC in-Action: A Framework for Big Data Analytics across Multiple Clusters</title><title>2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS)</title><addtitle>ICDSC</addtitle><description>Big data are increasingly collected and stored in a highly distributed infrastructures due to the development of several emerging 
technologies including sensor network, cloud computing, IoT and mobile computing among many other emerging technologies. In practice, the majority of existing big data processing frameworks (e.g., Hadoop, Spark, Flink) are designed based on the single-cluster setup with the assumptions of centralized management and homogeneous connectivity which makes them sub-optimal and sometimes infeasible to be applied for scenarios that require implementing data analytics jobs on highly distributed data sets (across racks, data centers or multi organizations). We demonstrate HDM-MC, a big data processing framework that is designed to enable the capability of performing large scale data analytics across multi-clusters with minimum extra overhead due to additional scheduling requirements. We describe the architecture and realization of the system using a step-by-step example scenario.</description><subject>Big Data</subject><subject>Data centers</subject><subject>Distributed databases</subject><subject>Distributed Systems</subject><subject>Organizations</subject><subject>Planning</subject><subject>Scheduling</subject><subject>Workflows</subject><issn>2575-8411</issn><isbn>1538668718</isbn><isbn>9781538668719</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2018</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotj09LwzAchqMgOOfugpd8gdb88r_earu5wYoH9TzSNpFo144kQ_btnejpPTzw8LwI3QHJAUjxsKnq6jWnBHROCEhxgW5AMC2lVqAv0YwKJTLNAa7RIsZPQgiVmhMlZqhZ103WVNiPWdklP42PuMSrYPb2ewpf2E0BP_kPXJtkcDma4ZR8F7HpwhQjbo5D8ofB4mo4xmRDvEVXzgzRLv53jt5Xy7dqnW1fnjdVuc08KJEyxSSXrOWu0KSn52RmoHeEGkFNoTpqqQOtWq50axl3raPQiu4XFZ1oe87m6P7P6621u0PwexNOu_NDyalmP6fsTE4</recordid><startdate>201807</startdate><enddate>201807</enddate><creator>Wu, Dongyao</creator><creator>Sakr, Sherif</creator><creator>Zhu, Liming</creator><creator>Lee, Sung</creator><creator>Wu, 
Huijun</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201807</creationdate><title>HDM-MC in-Action: A Framework for Big Data Analytics across Multiple Clusters</title><author>Wu, Dongyao ; Sakr, Sherif ; Zhu, Liming ; Lee, Sung ; Wu, Huijun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-736463b4f980d20013a1df02a52a97c2e2f187b478be34fbf21b5ca97c9c5bd43</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Big Data</topic><topic>Data centers</topic><topic>Distributed databases</topic><topic>Distributed Systems</topic><topic>Organizations</topic><topic>Planning</topic><topic>Scheduling</topic><topic>Workflows</topic><toplevel>online_resources</toplevel><creatorcontrib>Wu, Dongyao</creatorcontrib><creatorcontrib>Sakr, Sherif</creatorcontrib><creatorcontrib>Zhu, Liming</creatorcontrib><creatorcontrib>Lee, Sung</creatorcontrib><creatorcontrib>Wu, Huijun</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wu, Dongyao</au><au>Sakr, Sherif</au><au>Zhu, Liming</au><au>Lee, Sung</au><au>Wu, Huijun</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>HDM-MC in-Action: A Framework for Big Data Analytics across Multiple Clusters</atitle><btitle>2018 IEEE 38th International Conference on Distributed Computing Systems 
(ICDCS)</btitle><stitle>ICDSC</stitle><date>2018-07</date><risdate>2018</risdate><spage>1547</spage><epage>1550</epage><pages>1547-1550</pages><eissn>2575-8411</eissn><eisbn>1538668718</eisbn><eisbn>9781538668719</eisbn><coden>IEEPAD</coden><abstract>Big data are increasingly collected and stored in a highly distributed infrastructures due to the development of several emerging technologies including sensor network, cloud computing, IoT and mobile computing among many other emerging technologies. In practice, the majority of existing big data processing frameworks (e.g., Hadoop, Spark, Flink) are designed based on the single-cluster setup with the assumptions of centralized management and homogeneous connectivity which makes them sub-optimal and sometimes infeasible to be applied for scenarios that require implementing data analytics jobs on highly distributed data sets (across racks, data centers or multi organizations). We demonstrate HDM-MC, a big data processing framework that is designed to enable the capability of performing large scale data analytics across multi-clusters with minimum extra overhead due to additional scheduling requirements. We describe the architecture and realization of the system using a step-by-step example scenario.</abstract><pub>IEEE</pub><doi>10.1109/ICDCS.2018.00165</doi><tpages>4</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	EISSN: 2575-8411
ispartof	2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), 2018, p.1547-1550
issn	2575-8411
language	eng
recordid	cdi_ieee_primary_8416428
source	IEEE Xplore All Conference Series
subjects	Big Data ; Data centers ; Distributed databases ; Distributed Systems ; Organizations ; Planning ; Scheduling ; Workflows
title	HDM-MC in-Action: A Framework for Big Data Analytics across Multiple Clusters
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T15%3A49%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=HDM-MC%20in-Action:%20A%20Framework%20for%20Big%20Data%20Analytics%20across%20Multiple%20Clusters&rft.btitle=2018%20IEEE%2038th%20International%20Conference%20on%20Distributed%20Computing%20Systems%20(ICDCS)&rft.au=Wu,%20Dongyao&rft.date=2018-07&rft.spage=1547&rft.epage=1550&rft.pages=1547-1550&rft.eissn=2575-8411&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ICDCS.2018.00165&rft.eisbn=1538668718&rft.eisbn_list=9781538668719&rft_dat=%3Cieee_CHZPO%3E8416428%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-736463b4f980d20013a1df02a52a97c2e2f187b478be34fbf21b5ca97c9c5bd43%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=8416428&rfr_iscdi=true