
H2Hadoop: Improving Hadoop Performance Using the Metadata of Related Jobs

Bibliographic Details
Published in: IEEE Transactions on Cloud Computing, 2018-10, Vol. 6 (4), pp. 1031-1040
Main Authors: Alshammari, Hamoud; Lee, Jeongkyu; Bajwa, Hassan
Format: Article
Language: English
container_end_page 1040
container_issue 4
container_start_page 1031
container_title IEEE transactions on cloud computing
container_volume 6
creator Alshammari, Hamoud
Lee, Jeongkyu
Bajwa, Hassan
description Cloud Computing leverages the Hadoop framework to process BigData in parallel. Hadoop has certain limitations that, if addressed, would allow jobs to execute more efficiently. These limitations stem mostly from data locality in the cluster, job and task scheduling, and resource allocation in Hadoop. Efficient resource allocation remains a challenge in Cloud Computing MapReduce platforms. We propose H2Hadoop, an enhanced Hadoop architecture that reduces the computation cost associated with BigData analysis. The proposed architecture also addresses the issue of resource allocation in native Hadoop. H2Hadoop provides a better solution for "text data", such as finding DNA sequences and the motifs of DNA sequences. H2Hadoop also provides an efficient Data Mining approach for Cloud Computing environments. The H2Hadoop architecture leverages the NameNode's ability to assign jobs to the TaskTrackers (DataNodes) within the cluster. By adding control features to the NameNode, H2Hadoop can intelligently direct and assign tasks to the DataNodes that contain the required data without sending the job to the whole cluster. Compared with native Hadoop, H2Hadoop reduces CPU time, the number of read operations, and other Hadoop factors.
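The description says H2Hadoop adds control features to the NameNode so that a job related to an earlier one is sent only to the DataNodes holding the required data, using the metadata of related jobs. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation: `RelatedJobMetadata` and `plan_tasks` are hypothetical names, a new search pattern is assumed to be "related" to an earlier one when it contains it, a match is assumed never to span two records, and a plain dictionary stands in for the NameNode's block-to-DataNode map (replication is ignored).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class RelatedJobMetadata:
    """Hypothetical table: search pattern of an earlier job -> IDs of blocks where it matched."""

    matches: Dict[str, Set[str]] = field(default_factory=dict)

    def record(self, pattern: str, matched_blocks: Set[str]) -> None:
        # Stored after a job finishes, so later related jobs can reuse the result.
        self.matches[pattern] = set(matched_blocks)

    def candidate_blocks(self, pattern: str) -> Optional[Set[str]]:
        # Assumption: a new pattern is "related" to an earlier one if it contains it.
        # A record holding the longer pattern also holds every shorter pattern inside it,
        # so only blocks where all of those earlier patterns matched can contain results.
        related = [blocks for old, blocks in self.matches.items() if old in pattern]
        if not related:
            return None  # No related job: the caller falls back to scanning every block.
        return set.intersection(*related)


def plan_tasks(
    pattern: str,
    metadata: RelatedJobMetadata,
    block_locations: Dict[str, str],
) -> Dict[str, List[str]]:
    """Group the blocks to process for `pattern` by the DataNode that stores them.

    `block_locations` is a simplified, hypothetical stand-in for the NameNode's
    block map (block ID -> single DataNode, replication ignored).
    """
    candidates = metadata.candidate_blocks(pattern)
    blocks = block_locations.keys() if candidates is None else candidates
    plan: Dict[str, List[str]] = {}
    for block_id in sorted(blocks):
        plan.setdefault(block_locations[block_id], []).append(block_id)
    return plan


if __name__ == "__main__":
    meta = RelatedJobMetadata()
    meta.record("GATC", {"blk_1", "blk_3"})  # earlier job found "GATC" only in blk_1 and blk_3
    locations = {"blk_1": "dn1", "blk_2": "dn1", "blk_3": "dn2", "blk_4": "dn3"}
    print(plan_tasks("GGATCC", meta, locations))  # related job: {'dn1': ['blk_1'], 'dn2': ['blk_3']}
    print(plan_tasks("TTTT", meta, locations))    # unrelated job: all four blocks
```

Because every record containing the longer pattern also contains each shorter pattern inside it, restricting the new job to the intersected block set cannot miss matches under the stated assumptions; a job with no related predecessor falls back to native Hadoop behavior and scans every block.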
doi_str_mv 10.1109/TCC.2016.2535261
format article
publisher IEEE Computer Society (Piscataway)
coden ITCCF6
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
toplevel peer_reviewed
tpages 10
linktohtml https://ieeexplore.ieee.org/document/7420665
identifier ISSN: 2168-7161
ispartof IEEE transactions on cloud computing, 2018-10, Vol.6 (4), p.1031-1040
issn 2168-7161
2168-7161
2372-0018
language eng
recordid cdi_proquest_journals_2151464050
source IEEE Electronic Library (IEL) Journals
subjects BigData
Cloud computing
Clusters
Computational efficiency
Computer architecture
Cost analysis
Data mining
Deoxyribonucleic acid
DNA
File systems
H2Hadoop
Hadoop
Hadoop Performance
MapReduce
Metadata
Resource allocation
Resource management
Resource scheduling
Task scheduling
Text Data
Text mining
title H2Hadoop: Improving Hadoop Performance Using the Metadata of Related Jobs