Implications of data placement strategy to Big Data technologies based on shared-nothing architecture for geosciences

Bibliographic Details
Main Authors: Kwo-Sen Kuo; Oloso, Amidu; Khoa Doan; Clune, Thomas L.; Hongfeng Yu
Format: Conference Proceeding
Language: English
cited_by
cites
container_end_page 7607
container_issue
container_start_page 7605
container_title
container_volume
creator Kwo-Sen Kuo
Oloso, Amidu
Khoa Doan
Clune, Thomas L.
Hongfeng Yu
description It is found that data placement on the networked nodes of a cluster based on the shared-nothing architecture (SNA) should align in the physical (i.e. spatiotemporal) space for most geoscience Big Data analysis systems in order to minimize data movements and thus achieve optimal performance and efficiency. This is due to the fact that data analysis in geosciences predominantly requires spatiotemporal coincidence. If individual datasets are considered separately in their placement on the cluster nodes, these systems often have to move data between nodes when an analysis involves two or more datasets. In this paper, we first report our discoveries from a data placement alignment experiment with two Big Data technologies, SciDB and Spark+HDFS, and then elucidate some of the far-reaching implications of this discovery.
doi_str_mv 10.1109/IGARSS.2016.7730983
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_7730983</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7730983</ieee_id><sourcerecordid>7730983</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-d9f6d3be39c4738232cacad668405540ad7a91af64076d622dbdc8c752c342e13</originalsourceid><addsrcrecordid>eNotkM9KAzEYxKMg2FafoJe8wNb82U12j7VqXSgIVs_la_LtbmSblCQ99O2t2NMMzI9hGELmnC04Z81Tu15-brcLwbhaaC1ZU8sbMuUVa5iUUuhbMhG8koVmTN6TaUo_F1MLxibk1B6OozOQXfCJho5ayECPIxg8oM805QgZ-zPNgT67nr78xRnN4MMYeoeJ7iGhpcHTNEBEW_iQB-d7CtEM7kLmU0TahUh7DMk49AbTA7nrYEz4eNUZ-X57_Vq9F5uPdbtabgrHdZUL23TKyj3KxpT6MlgKAwasUnXJqqpkYDU0HDpVMq2sEsLuramNroSRpUAuZ2T-3-sQcXeM7gDxvLteJH8BKjpdkg</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Implications of data placement strategy to Big Data technologies based on shared-nothing architecture for geosciences</title><source>IEEE Xplore All Conference Series</source><creator>Kwo-Sen Kuo ; Oloso, Amidu ; Khoa Doan ; Clune, Thomas L. ; Hongfeng Yu</creator><creatorcontrib>Kwo-Sen Kuo ; Oloso, Amidu ; Khoa Doan ; Clune, Thomas L. ; Hongfeng Yu</creatorcontrib><description>It is found that data placement on the networked nodes of a cluster based on the shared-nothing architecture (SNA) should align in the physical (i.e. spatiotemporal) space for most geoscience Big Data analysis systems in order to minimize data movements and thus achieve optimal performance and efficiency. This is due to the fact that data analysis in geosciences predominantly requires spatiotemporal coincidence. If individual datasets are considered separately in their placement on the cluster nodes, these systems often have to move data between nodes when an analysis involves two or more datasets. 
In this paper, we first report our discoveries from a data placement alignment experiment with two Big Data technologies, SciDB and Spark+HDFS, and then elucidate some of the far-reaching implications of this discovery.</description><identifier>EISSN: 2153-7003</identifier><identifier>EISBN: 1509033327</identifier><identifier>EISBN: 9781509033324</identifier><identifier>DOI: 10.1109/IGARSS.2016.7730983</identifier><language>eng</language><publisher>IEEE</publisher><subject>Arrays ; Big data ; data placement ; Geology ; geoscience ; Shape ; shared-nothing architecture ; Spatiotemporal phenomena ; Temperature distribution</subject><ispartof>2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2016, p.7605-7607</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7730983$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,23930,23931,25140,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/7730983$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Kwo-Sen Kuo</creatorcontrib><creatorcontrib>Oloso, Amidu</creatorcontrib><creatorcontrib>Khoa Doan</creatorcontrib><creatorcontrib>Clune, Thomas L.</creatorcontrib><creatorcontrib>Hongfeng Yu</creatorcontrib><title>Implications of data placement strategy to Big Data technologies based on shared-nothing architecture for geosciences</title><title>2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)</title><addtitle>IGARSS</addtitle><description>It is found that data placement on the networked nodes of a cluster based on the shared-nothing architecture (SNA) should align in the physical (i.e. 
spatiotemporal) space for most geoscience Big Data analysis systems in order to minimize data movements and thus achieve optimal performance and efficiency. This is due to the fact that data analysis in geosciences predominantly requires spatiotemporal coincidence. If individual datasets are considered separately in their placement on the cluster nodes, these systems often have to move data between nodes when an analysis involves two or more datasets. In this paper, we first report our discoveries from a data placement alignment experiment with two Big Data technologies, SciDB and Spark+HDFS, and then elucidate some of the far-reaching implications of this discovery.</description><subject>Arrays</subject><subject>Big data</subject><subject>data placement</subject><subject>Geology</subject><subject>geoscience</subject><subject>Shape</subject><subject>shared-nothing architecture</subject><subject>Spatiotemporal phenomena</subject><subject>Temperature distribution</subject><issn>2153-7003</issn><isbn>1509033327</isbn><isbn>9781509033324</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2016</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotkM9KAzEYxKMg2FafoJe8wNb82U12j7VqXSgIVs_la_LtbmSblCQ99O2t2NMMzI9hGELmnC04Z81Tu15-brcLwbhaaC1ZU8sbMuUVa5iUUuhbMhG8koVmTN6TaUo_F1MLxibk1B6OozOQXfCJho5ayECPIxg8oM805QgZ-zPNgT67nr78xRnN4MMYeoeJ7iGhpcHTNEBEW_iQB-d7CtEM7kLmU0TahUh7DMk49AbTA7nrYEz4eNUZ-X57_Vq9F5uPdbtabgrHdZUL23TKyj3KxpT6MlgKAwasUnXJqqpkYDU0HDpVMq2sEsLuramNroSRpUAuZ2T-3-sQcXeM7gDxvLteJH8BKjpdkg</recordid><startdate>201607</startdate><enddate>201607</enddate><creator>Kwo-Sen Kuo</creator><creator>Oloso, Amidu</creator><creator>Khoa Doan</creator><creator>Clune, Thomas L.</creator><creator>Hongfeng Yu</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201607</creationdate><title>Implications 
of data placement strategy to Big Data technologies based on shared-nothing architecture for geosciences</title><author>Kwo-Sen Kuo ; Oloso, Amidu ; Khoa Doan ; Clune, Thomas L. ; Hongfeng Yu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-d9f6d3be39c4738232cacad668405540ad7a91af64076d622dbdc8c752c342e13</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Arrays</topic><topic>Big data</topic><topic>data placement</topic><topic>Geology</topic><topic>geoscience</topic><topic>Shape</topic><topic>shared-nothing architecture</topic><topic>Spatiotemporal phenomena</topic><topic>Temperature distribution</topic><toplevel>online_resources</toplevel><creatorcontrib>Kwo-Sen Kuo</creatorcontrib><creatorcontrib>Oloso, Amidu</creatorcontrib><creatorcontrib>Khoa Doan</creatorcontrib><creatorcontrib>Clune, Thomas L.</creatorcontrib><creatorcontrib>Hongfeng Yu</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Kwo-Sen Kuo</au><au>Oloso, Amidu</au><au>Khoa Doan</au><au>Clune, Thomas L.</au><au>Hongfeng Yu</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Implications of data placement strategy to Big Data technologies based on shared-nothing architecture for geosciences</atitle><btitle>2016 IEEE International Geoscience and Remote Sensing Symposium 
(IGARSS)</btitle><stitle>IGARSS</stitle><date>2016-07</date><risdate>2016</risdate><spage>7605</spage><epage>7607</epage><pages>7605-7607</pages><eissn>2153-7003</eissn><eisbn>1509033327</eisbn><eisbn>9781509033324</eisbn><abstract>It is found that data placement on the networked nodes of a cluster based on the shared-nothing architecture (SNA) should align in the physical (i.e. spatiotemporal) space for most geoscience Big Data analysis systems in order to minimize data movements and thus achieve optimal performance and efficiency. This is due to the fact that data analysis in geosciences predominantly requires spatiotemporal coincidence. If individual datasets are considered separately in their placement on the cluster nodes, these systems often have to move data between nodes when an analysis involves two or more datasets. In this paper, we first report our discoveries from a data placement alignment experiment with two Big Data technologies, SciDB and Spark+HDFS, and then elucidate some of the far-reaching implications of this discovery.</abstract><pub>IEEE</pub><doi>10.1109/IGARSS.2016.7730983</doi><tpages>3</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier EISSN: 2153-7003
ispartof 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2016, p.7605-7607
issn 2153-7003
language eng
recordid cdi_ieee_primary_7730983
source IEEE Xplore All Conference Series
subjects Arrays
Big data
data placement
Geology
geoscience
Shape
shared-nothing architecture
Spatiotemporal phenomena
Temperature distribution
title Implications of data placement strategy to Big Data technologies based on shared-nothing architecture for geosciences
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T11%3A21%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Implications%20of%20data%20placement%20strategy%20to%20Big%20Data%20technologies%20based%20on%20shared-nothing%20architecture%20for%20geosciences&rft.btitle=2016%20IEEE%20International%20Geoscience%20and%20Remote%20Sensing%20Symposium%20(IGARSS)&rft.au=Kwo-Sen%20Kuo&rft.date=2016-07&rft.spage=7605&rft.epage=7607&rft.pages=7605-7607&rft.eissn=2153-7003&rft_id=info:doi/10.1109/IGARSS.2016.7730983&rft.eisbn=1509033327&rft.eisbn_list=9781509033324&rft_dat=%3Cieee_CHZPO%3E7730983%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-d9f6d3be39c4738232cacad668405540ad7a91af64076d622dbdc8c752c342e13%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=7730983&rfr_iscdi=true