Data storage and modeling system for GPS, gyro and camera data using Apache Flume and Hadoop MapReduce

Bibliographic Details
Main Authors: Wedashwara, Wirarama, Wijayanto, Heri, Zubaidi, Ariyan, Arimbawa, I. Wayan Agus
Format: Conference Proceeding
Language:English
Description
Summary:Collecting and storing large amounts of data from IoT devices requires an agent-based system such as Apache Flume. With Flume's agent-based collection, incoming data can be grouped by data type, stored on the Hadoop Distributed File System (HDFS), and processed with MapReduce. This study presents a data storage and modeling system that uses Flume's HTTP source to receive numerical data from GPS and gyro sensors and image data from an ESP32 camera. It continues earlier work on a smart electric vehicle (SEV) navigation system, extending the data processing to data modeling. The system uses two Flume agents with different storage configurations, one for numerical data and one for image data. Hadoop MapReduce processes the data via streaming, with mappers and reducers written in Python. Numerical data is modeled as a tree built from frequent items between features. Image data is modeled with Haar cascade vehicle detection. Data collection was tested with varying transmission configurations, both in Flume and in the HTTP POST JSON requests sent from the microcontrollers. The results show that the system could receive 10000 numerical data items and 25 images per minute, with a packet delivery ratio (PDR) of 99.827% and a delay of 0.192 s for numerical data, and a PDR of 98.481% and a delay of 0.327 s for image data. Hadoop MapReduce was run only on a single-node cluster and was tested by varying the amount of data per file in HDFS. Processing the data as a single intact file gave the lowest processing time: 4,103 minutes for 100k numerical data items and 2,102 minutes for 50 images.
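The summary mentions Python mappers and reducers that count frequent items between features. The following is only a minimal sketch of that general idea, not the authors' code. The feature names in FEATURES, the comma-separated record layout, and the helper run_local are all assumptions made for illustration; the record does not specify them, and the tree-building step is not reproduced here.

```python
from itertools import combinations, groupby

# Hypothetical feature names; the record does not list the actual fields.
FEATURES = ("lat", "lon", "gyro_x")


def mapper(lines):
    """Emit 'itemA|itemB<TAB>1' for every pair of feature=value items in a record."""
    for line in lines:
        values = line.strip().split(",")
        if len(values) != len(FEATURES):
            continue  # skip malformed records
        items = [f"{name}={value}" for name, value in zip(FEATURES, values)]
        for a, b in combinations(items, 2):
            yield f"{a}|{b}\t1"


def reducer(lines):
    """Sum the counts per key; input must be sorted by key, as streaming provides."""
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_local(records):
    """Simulate the map, sort, and reduce stages in memory (assumed helper)."""
    return list(reducer(sorted(mapper(records))))


if __name__ == "__main__":
    for row in run_local(["1,2,3", "1,2,4"]):
        print(row)
```

In a real Hadoop streaming job the two functions would read from standard input and write to standard output in separate scripts. Continuous GPS and gyro readings would also need to be discretized before pair counts become meaningful.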
ISSN:0094-243X
1551-7616
DOI:10.1063/5.0200515