BIG DATA PROCESSING WITH APACHE SPARK

Bibliographic Details
Published in: Tra Vinh University Journal of Science, 2023-07
Main Authors: Tran, Quy Quang, Nguyen, Binh Duc, Nguyen, Linh Thi Thuy, Nguyen, Oanh Thi Thu
Format: Article
Language: English
Description
Summary: With the exponential growth of information, it is no surprise that the current period of history is called the Information Age. The rapid growth of data has created challenges for storage and processing technology. This article discusses Apache Spark, an ecosystem that provides many integrated Big Data processing technologies, including machine learning libraries and data storage platforms. Apache Spark is an open-source distributed data processing system that loads data into memory and performs analytical operations on data of any size, with efficient support for popular programming languages such as Java, Scala, R, and Python. The article aims to compare the computing power of Spark with that of Hadoop, highlighting Spark's advantages, and to show how to connect Spark with today's popular data processing tools such as the R language.
ISSN: 2815-6072; 2815-6099
DOI: 10.35382/tvujs.13.6.2023.2099