
H2Hadoop: Improving Hadoop Performance Using the Metadata of Related Jobs

Bibliographic Details
Published in:	IEEE transactions on cloud computing 2018-10, Vol.6 (4), p.1031-1040
Main Authors:	Alshammari, Hamoud; Lee, Jeongkyu; Bajwa, Hassan
Format:	Article
Language:	English
Subjects:	BigData; Cloud computing; Clusters; Computational efficiency; Computer architecture; Cost analysis; Data mining; Deoxyribonucleic acid; DNA; File systems; H2Hadoop; Hadoop; Hadoop Performance; MapReduce; Metadata; Resource allocation; Resource management; Resource scheduling; Task scheduling; Text Data; Text mining

cited_by	cdi_FETCH-LOGICAL-c291t-94da5c2ded94618f2fbfc93a604ce9bb05cbc63301fa7f0f780d4890eff740803
cites	cdi_FETCH-LOGICAL-c291t-94da5c2ded94618f2fbfc93a604ce9bb05cbc63301fa7f0f780d4890eff740803
container_end_page	1040
container_issue	4
container_start_page	1031
container_title	IEEE transactions on cloud computing
container_volume	6
creator	Alshammari, Hamoud; Lee, Jeongkyu; Bajwa, Hassan
description	Cloud Computing leverages the Hadoop framework for processing BigData in parallel. Hadoop has certain limitations that, once addressed, allow jobs to execute more efficiently. These limitations arise mostly from data locality in the cluster, job and task scheduling, and resource allocation in Hadoop. Efficient resource allocation remains a challenge in Cloud Computing MapReduce platforms. We propose H2Hadoop, an enhanced Hadoop architecture that reduces the computation cost associated with BigData analysis. The proposed architecture also addresses the issue of resource allocation in native Hadoop. H2Hadoop provides a better solution for "text data", such as finding DNA sequences and the motifs of DNA sequences. H2Hadoop also provides an efficient Data Mining approach for Cloud Computing environments. The H2Hadoop architecture leverages the NameNode's ability to assign jobs to the TaskTrackers (DataNodes) within the cluster. By adding control features to the NameNode, H2Hadoop can intelligently direct and assign tasks to the DataNodes that contain the required data, without sending the job to the whole cluster. Compared with native Hadoop, H2Hadoop reduces CPU time, the number of read operations, and other Hadoop factors.
doi_str_mv	10.1109/TCC.2016.2535261
format	article
fullrecord	<record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_proquest_journals_2151464050</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7420665</ieee_id><sourcerecordid>2151464050</sourcerecordid><originalsourceid>FETCH-LOGICAL-c291t-94da5c2ded94618f2fbfc93a604ce9bb05cbc63301fa7f0f780d4890eff740803</originalsourceid><addsrcrecordid>eNpNkNFLwzAQxoMoOObeBV8CPnfepUna-CZF3WSiyPYc0jTRjW2ZSRX8723pEO_ljrvvuzt-hFwiTBFB3SyrasoA5ZSJXDCJJ2TEUJZZgRJP_9XnZJLSBrooBSpUIzKfsZlpQjjc0vnuEMP3ev9Ohw59ddGHuDN76-gq9YP2w9Fn15rGtIYGT9_c1rSuoU-hThfkzJttcpNjHpPVw_2ymmWLl8d5dbfILFPYZoo3RljWuEZxiaVnvvZW5UYCt07VNQhbW5nngN4UHnxRQsNLBc77gkMJ-ZhcD3u7bz-_XGr1JnzFfXdSMxTIJQfRq2BQ2RhSis7rQ1zvTPzRCLpnpjtmumemj8w6y9VgWTvn_uQFZyClyH8BCDdmNw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2151464050</pqid></control><display><type>article</type><title>H2Hadoop: Improving Hadoop Performance Using the Metadata of Related Jobs</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Alshammari, Hamoud ; Lee, Jeongkyu ; Bajwa, Hassan</creator><creatorcontrib>Alshammari, Hamoud ; Lee, Jeongkyu ; Bajwa, Hassan</creatorcontrib><description>Cloud Computing leverages Hadoop framework for processing BigData in parallel. Hadoop has certain limitations that could be exploited to execute the job efficiently. These limitations are mostly because of data locality in the cluster, jobs and tasks scheduling, and resource allocations in Hadoop. Efficient resource allocation remains a challenge in Cloud Computing MapReduce platforms. We propose H2Hadoop, which is an enhanced Hadoop architecture that reduces the computation cost associated with BigData analysis. The proposed architecture also addresses the issue of resource allocation in native Hadoop. H2Hadoop provides a better solution for "text data", such as finding DNA sequence and the motif of a DNA sequence. 
Also, H2Hadoop provides an efficient Data Mining approach for Cloud Computing environments. H2Hadoop architecture leverages on NameNode's ability to assign jobs to the TaskTrakers (DataNodes) within the cluster. By adding control features to the NameNode, H2Hadoop can intelligently direct and assign tasks to the DataNodes that contain the required data without sending the job to the whole cluster. Comparing with native Hadoop, H2Hadoop reduces CPU time, number of read operations, and another Hadoop factors.</description><identifier>ISSN: 2168-7161</identifier><identifier>EISSN: 2168-7161</identifier><identifier>EISSN: 2372-0018</identifier><identifier>DOI: 10.1109/TCC.2016.2535261</identifier><identifier>CODEN: ITCCF6</identifier><language>eng</language><publisher>Piscataway: IEEE Computer Society</publisher><subject>BigData ; Cloud computing ; Clusters ; Computational efficiency ; Computer architecture ; Cost analysis ; Data mining ; Deoxyribonucleic acid ; DNA ; File systems ; H2Hadoop ; Hadoop ; Hadoop Performance ; MapReduce ; Metadata ; Resource allocation ; Resource management ; Resource scheduling ; Task scheduling ; Text Data ; Text mining</subject><ispartof>IEEE transactions on cloud computing, 2018-10, Vol.6 (4), p.1031-1040</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2018</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c291t-94da5c2ded94618f2fbfc93a604ce9bb05cbc63301fa7f0f780d4890eff740803</citedby><cites>FETCH-LOGICAL-c291t-94da5c2ded94618f2fbfc93a604ce9bb05cbc63301fa7f0f780d4890eff740803</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7420665$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27922,27923,54794</link.rule.ids></links><search><creatorcontrib>Alshammari, Hamoud</creatorcontrib><creatorcontrib>Lee, Jeongkyu</creatorcontrib><creatorcontrib>Bajwa, Hassan</creatorcontrib><title>H2Hadoop: Improving Hadoop Performance Using the Metadata of Related Jobs</title><title>IEEE transactions on cloud computing</title><addtitle>TCC</addtitle><description>Cloud Computing leverages Hadoop framework for processing BigData in parallel. Hadoop has certain limitations that could be exploited to execute the job efficiently. These limitations are mostly because of data locality in the cluster, jobs and tasks scheduling, and resource allocations in Hadoop. Efficient resource allocation remains a challenge in Cloud Computing MapReduce platforms. We propose H2Hadoop, which is an enhanced Hadoop architecture that reduces the computation cost associated with BigData analysis. The proposed architecture also addresses the issue of resource allocation in native Hadoop. H2Hadoop provides a better solution for "text data", such as finding DNA sequence and the motif of a DNA sequence. Also, H2Hadoop provides an efficient Data Mining approach for Cloud Computing environments. H2Hadoop architecture leverages on NameNode's ability to assign jobs to the TaskTrakers (DataNodes) within the cluster. 
By adding control features to the NameNode, H2Hadoop can intelligently direct and assign tasks to the DataNodes that contain the required data without sending the job to the whole cluster. Comparing with native Hadoop, H2Hadoop reduces CPU time, number of read operations, and another Hadoop factors.</description><subject>BigData</subject><subject>Cloud computing</subject><subject>Clusters</subject><subject>Computational efficiency</subject><subject>Computer architecture</subject><subject>Cost analysis</subject><subject>Data mining</subject><subject>Deoxyribonucleic acid</subject><subject>DNA</subject><subject>File systems</subject><subject>H2Hadoop</subject><subject>Hadoop</subject><subject>Hadoop Performance</subject><subject>MapReduce</subject><subject>Metadata</subject><subject>Resource allocation</subject><subject>Resource management</subject><subject>Resource scheduling</subject><subject>Task scheduling</subject><subject>Text Data</subject><subject>Text mining</subject><issn>2168-7161</issn><issn>2168-7161</issn><issn>2372-0018</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><recordid>eNpNkNFLwzAQxoMoOObeBV8CPnfepUna-CZF3WSiyPYc0jTRjW2ZSRX8723pEO_ljrvvuzt-hFwiTBFB3SyrasoA5ZSJXDCJJ2TEUJZZgRJP_9XnZJLSBrooBSpUIzKfsZlpQjjc0vnuEMP3ev9Ohw59ddGHuDN76-gq9YP2w9Fn15rGtIYGT9_c1rSuoU-hThfkzJttcpNjHpPVw_2ymmWLl8d5dbfILFPYZoo3RljWuEZxiaVnvvZW5UYCt07VNQhbW5nngN4UHnxRQsNLBc77gkMJ-ZhcD3u7bz-_XGr1JnzFfXdSMxTIJQfRq2BQ2RhSis7rQ1zvTPzRCLpnpjtmumemj8w6y9VgWTvn_uQFZyClyH8BCDdmNw</recordid><startdate>20181001</startdate><enddate>20181001</enddate><creator>Alshammari, Hamoud</creator><creator>Lee, Jeongkyu</creator><creator>Bajwa, Hassan</creator><general>IEEE Computer Society</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20181001</creationdate><title>H2Hadoop: Improving Hadoop Performance Using the Metadata of Related Jobs</title><author>Alshammari, Hamoud ; Lee, Jeongkyu ; Bajwa, Hassan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c291t-94da5c2ded94618f2fbfc93a604ce9bb05cbc63301fa7f0f780d4890eff740803</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>BigData</topic><topic>Cloud computing</topic><topic>Clusters</topic><topic>Computational efficiency</topic><topic>Computer architecture</topic><topic>Cost analysis</topic><topic>Data mining</topic><topic>Deoxyribonucleic acid</topic><topic>DNA</topic><topic>File systems</topic><topic>H2Hadoop</topic><topic>Hadoop</topic><topic>Hadoop Performance</topic><topic>MapReduce</topic><topic>Metadata</topic><topic>Resource allocation</topic><topic>Resource management</topic><topic>Resource scheduling</topic><topic>Task scheduling</topic><topic>Text Data</topic><topic>Text mining</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Alshammari, Hamoud</creatorcontrib><creatorcontrib>Lee, Jeongkyu</creatorcontrib><creatorcontrib>Bajwa, Hassan</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with 
Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on cloud computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Alshammari, Hamoud</au><au>Lee, Jeongkyu</au><au>Bajwa, Hassan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>H2Hadoop: Improving Hadoop Performance Using the Metadata of Related Jobs</atitle><jtitle>IEEE transactions on cloud computing</jtitle><stitle>TCC</stitle><date>2018-10-01</date><risdate>2018</risdate><volume>6</volume><issue>4</issue><spage>1031</spage><epage>1040</epage><pages>1031-1040</pages><issn>2168-7161</issn><eissn>2168-7161</eissn><eissn>2372-0018</eissn><coden>ITCCF6</coden><abstract>Cloud Computing leverages Hadoop framework for processing BigData in parallel. Hadoop has certain limitations that could be exploited to execute the job efficiently. These limitations are mostly because of data locality in the cluster, jobs and tasks scheduling, and resource allocations in Hadoop. Efficient resource allocation remains a challenge in Cloud Computing MapReduce platforms. We propose H2Hadoop, which is an enhanced Hadoop architecture that reduces the computation cost associated with BigData analysis. The proposed architecture also addresses the issue of resource allocation in native Hadoop. H2Hadoop provides a better solution for "text data", such as finding DNA sequence and the motif of a DNA sequence. Also, H2Hadoop provides an efficient Data Mining approach for Cloud Computing environments. H2Hadoop architecture leverages on NameNode's ability to assign jobs to the TaskTrakers (DataNodes) within the cluster. 
By adding control features to the NameNode, H2Hadoop can intelligently direct and assign tasks to the DataNodes that contain the required data without sending the job to the whole cluster. Comparing with native Hadoop, H2Hadoop reduces CPU time, number of read operations, and another Hadoop factors.</abstract><cop>Piscataway</cop><pub>IEEE Computer Society</pub><doi>10.1109/TCC.2016.2535261</doi><tpages>10</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 2168-7161
ispartof	IEEE transactions on cloud computing, 2018-10, Vol.6 (4), p.1031-1040
issn	2168-7161 2168-7161 2372-0018
language	eng
recordid	cdi_proquest_journals_2151464050
source	IEEE Electronic Library (IEL) Journals
subjects	BigData; Cloud computing; Clusters; Computational efficiency; Computer architecture; Cost analysis; Data mining; Deoxyribonucleic acid; DNA; File systems; H2Hadoop; Hadoop; Hadoop Performance; MapReduce; Metadata; Resource allocation; Resource management; Resource scheduling; Task scheduling; Text Data; Text mining
title	H2Hadoop: Improving Hadoop Performance Using the Metadata of Related Jobs
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T13%3A42%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=H2Hadoop:%20Improving%20Hadoop%20Performance%20Using%20the%20Metadata%20of%20Related%20Jobs&rft.jtitle=IEEE%20transactions%20on%20cloud%20computing&rft.au=Alshammari,%20Hamoud&rft.date=2018-10-01&rft.volume=6&rft.issue=4&rft.spage=1031&rft.epage=1040&rft.pages=1031-1040&rft.issn=2168-7161&rft.eissn=2168-7161&rft.coden=ITCCF6&rft_id=info:doi/10.1109/TCC.2016.2535261&rft_dat=%3Cproquest_ieee_%3E2151464050%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c291t-94da5c2ded94618f2fbfc93a604ce9bb05cbc63301fa7f0f780d4890eff740803%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2151464050&rft_id=info:pmid/&rft_ieee_id=7420665&rfr_iscdi=true
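
The abstract's central idea, that the NameNode keeps metadata from earlier related jobs so a new job is sent only to the DataNodes holding relevant blocks, can be illustrated with a small sketch. It assumes (this is not stated in the record) that two jobs are "related" when the earlier job's search pattern is a substring of the new one. Under that assumption, any block that can match the new pattern must already be among the earlier job's matching blocks. JobMetadataTable, run_job and the in-memory block dictionary are hypothetical stand-ins for the NameNode-side table and HDFS blocks, and the sketch ignores matches that straddle a block boundary.

```python
from dataclasses import dataclass, field


@dataclass
class JobMetadataTable:
    """Hypothetical NameNode-side table mapping a search pattern to the IDs
    of the blocks in which an earlier job found that pattern."""

    hits: dict[str, set[str]] = field(default_factory=dict)

    def record(self, pattern: str, matching_blocks: set[str]) -> None:
        # Store which blocks produced at least one match for this job.
        self.hits[pattern] = set(matching_blocks)

    def candidate_blocks(self, pattern: str, all_blocks: set[str]) -> set[str]:
        # A block that contains `pattern` also contains every substring of it,
        # so each earlier job whose pattern is a substring of the new one
        # narrows the set of blocks worth scanning.
        candidates = set(all_blocks)
        for old_pattern, blocks in self.hits.items():
            if old_pattern in pattern:
                candidates &= blocks
        return candidates


def run_job(
    pattern: str, blocks: dict[str, str], table: JobMetadataTable
) -> tuple[set[str], int]:
    """Scan only the candidate blocks, then save this job's metadata.

    Returns the matching block IDs and how many blocks were scanned.
    """
    candidates = table.candidate_blocks(pattern, set(blocks))
    matches = {bid for bid in candidates if pattern in blocks[bid]}
    table.record(pattern, matches)
    return matches, len(candidates)


if __name__ == "__main__":
    # Toy "HDFS blocks": block ID -> DNA text (boundary-spanning matches ignored).
    blocks = {
        "blk_1": "ACGTTGCAACGT",
        "blk_2": "TTTTGGGGCCCC",
        "blk_3": "GGACGTTGAAAA",
    }
    table = JobMetadataTable()

    first, scanned = run_job("ACGTT", blocks, table)
    print(sorted(first), scanned)   # ['blk_1', 'blk_3'] 3

    second, scanned = run_job("ACGTTGC", blocks, table)
    print(sorted(second), scanned)  # ['blk_1'] 2
```

In this toy run the second, related job scans two of the three blocks instead of all three; in a real cluster the saving would depend on how selective the earlier pattern was and on how block boundaries are handled.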