
On experimenting large dataset for visualization using distributed learning and tree plotting techniques

Bibliographic Details
Published in:Scientific African 2020-07, Vol.8, p.e00466, Article e00466
Main Authors: Johnson, Olanrewaju V., Jinadu, Olayinka T., Aladesote, Olomi I.
Format: Article
Language:English
Description
Summary:Visualization, as one major field making up data science, has played significant roles in data exploration. With visualization at the center of every data analysis and application, exploratory analysis has proved the basis on which data analysts comparatively implement what-if scenarios before and after processing. Interesting patterns generated from visualized models are very helpful in fast decision-making, model tuning and optimization. However, conventional methods such as histograms, pie charts, box plots and bar graphs are on most occasions not adequate to effectively convey the interesting patterns to be mined in a large dataset. This paper therefore presents a tree-plot approach which makes use of an in-memory node mechanism from the h2o package to place the large dataset in memory. A Gradient Boosted Model (GBM) from the same package was implemented as the underlying learning algorithm to build the tree model, while the modelled trees were plotted using plotting techniques in data.tree. Execution process time, AUC, MSE and RMSE results obtained provide a basis for evaluating how well the data was trained and for visualizing the modelled tree. It further substantiates how a learner algorithm can work with a plotting method at low computational cost.
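The summary describes fitting a Gradient Boosted Model and evaluating it with MSE and RMSE. The paper itself uses the h2o package (with data.tree for plotting); as a minimal illustrative sketch only, not the authors' implementation, the core boosting loop and the MSE/RMSE evaluation can be shown in pure Python with decision stumps on a toy 1-D regression. All names, data values, and parameters below are hypothetical.

```python
import math

def fit_stump(xs, residuals):
    # Fit a one-split regression stump to the residuals: try each midpoint
    # between consecutive (sorted) x values and keep the split with the
    # lowest squared error. xs is assumed sorted ascending.
    best = None
    for i in range(len(xs) - 1):
        t = (xs[i] + xs[i + 1]) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gbm(xs, ys, rounds=50, lr=0.1):
    # Gradient boosting for squared-error loss: start from the mean,
    # then repeatedly fit a stump to the current residuals and add a
    # shrunken copy of it to the ensemble.
    base = sum(ys) / len(ys)
    pred = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    def model(x):
        return base + lr * sum(s(x) for s in stumps)
    return model, pred

# Toy training data (hypothetical): two plateaus around 1.05 and 3.05.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.0, 1.1, 0.9, 3.1, 3.0, 2.9, 3.2]
model, pred = gbm(xs, ys)

# Training-set error metrics of the kind the paper reports.
mse = sum((y - p) ** 2 for y, p in zip(ys, pred)) / len(ys)
rmse = math.sqrt(mse)
```

In h2o, the equivalent steps are handled internally by the distributed GBM (the frame is held across in-memory nodes and metrics such as AUC, MSE and RMSE are reported by the model object); the sketch above only mirrors the boosting-plus-evaluation logic on a single machine.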
ISSN:2468-2276
DOI:10.1016/j.sciaf.2020.e00466