
Cost Minimization for Big Data Processing in Geo-Distributed Data Centers

The explosive growth of demands on big data processing imposes a heavy burden on computation, storage, and communication in data centers, which hence incurs considerable operational expenditure to data center providers. Therefore, cost minimization has become an emergent issue for the upcoming big d...

Bibliographic Details
Published in: IEEE transactions on emerging topics in computing, 2014-09, Vol.2 (3), p.314-323
Main Authors: Gu, Lin; Zeng, Deze; Li, Peng; Guo, Song
Format: Article
Language: English
container_end_page 323
container_issue 3
container_start_page 314
container_title IEEE transactions on emerging topics in computing
container_volume 2
creator Gu, Lin
Zeng, Deze
Li, Peng
Guo, Song
description The explosive growth of demands on big data processing imposes a heavy burden on computation, storage, and communication in data centers, which in turn incurs considerable operational expenditure for data center providers. Cost minimization has therefore become an emergent issue for the upcoming big data era. Unlike conventional cloud services, big data services are characterized by a tight coupling between data and computation: a computation task can be carried out only when the corresponding data are available. As a result, three factors, i.e., task assignment, data placement, and data movement, deeply influence the operational expenditure of data centers. In this paper, we study the cost minimization problem via a joint optimization of these three factors for big data services in geo-distributed data centers. To describe the task completion time with consideration of both data transmission and computation, we propose a 2-D Markov chain and derive the average task completion time in closed form. Furthermore, we model the problem as a mixed-integer nonlinear programming (MINLP) problem and propose an efficient solution to linearize it. The high efficiency of our proposal is validated by extensive simulation-based studies.
doi_str_mv 10.1109/TETC.2014.2310456
format article
identifier ISSN: 2168-6750
ispartof IEEE transactions on emerging topics in computing, 2014-09, Vol.2 (3), p.314-323
issn 2168-6750
2168-6750
language eng
recordid cdi_crossref_primary_10_1109_TETC_2014_2310456
source IEEE Xplore Open Access Journals; IEEE Xplore (Online service)
subjects Big data
Data handling
Data storage systems
Distributed databases
Information management
Minimization
Routing protocols
title Cost Minimization for Big Data Processing in Geo-Distributed Data Centers
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T20%3A01%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cost%20Minimization%20for%20Big%20Data%20Processing%20in%20Geo-Distributed%20Data%20Centers&rft.jtitle=IEEE%20transactions%20on%20emerging%20topics%20in%20computing&rft.au=Gu,%20Lin&rft.date=2014-09-01&rft.volume=2&rft.issue=3&rft.spage=314&rft.epage=323&rft.pages=314-323&rft.issn=2168-6750&rft.eissn=2168-6750&rft.coden=ITETBT&rft_id=info:doi/10.1109/TETC.2014.2310456&rft_dat=%3Ccrossref_ieee_%3E10_1109_TETC_2014_2310456%3C/crossref_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c374t-4cfd9b0f8f3fcf4691308fddec6ec67c874886ed62e0d676fccbcd614a3f0b103%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6762920&rfr_iscdi=true