Loading…

Big Data ETL Process and Its Impact on Text Mining Analysis for Employees’ Reviews

Big data analysis is challenging in the current context for enterprises that would like to apply these capabilities in the human resource sector. This paper will show how an organization can take advantage of the current or former employees’ reviews that are provided on a constant basis on different...

Full description

Saved in:
Bibliographic Details
Published in:Applied sciences 2022-08, Vol.12 (15), p.7509
Main Authors: Tanasescu, Laura Gabriela, Vines, Andreea, Bologa, Ana Ramona, Vaida, Claudia Antal
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Big data analysis is challenging in the current context for enterprises that would like to apply these capabilities in the human resource sector. This paper will show how an organization can take advantage of the current or former employees’ reviews that are provided on a constant basis on different sites, so that the management can adjust or change business decisions based on employees’ wishes, dissatisfaction or needs. Considering the previously mentioned challenge on big data analysis, this research will first provide the best practice for the collection and transformation of the data proposed for analysis. The second part of this paper presents the extraction of two datasets containing employee reviews using data scraping techniques, the analysis of data by using text mining techniques to retrieve business insights and the comparison of the results for these algorithms. Experimental results with Naïve Bayes, Logistic Regression, K-Nearest Neighbor and Support Vector Machine for employee sentiment prediction showed much better performances for Logistic Regression. Three out of the four analyzed algorithms performed better for the second, triple-size dataset. The final aim of the paper is to provide an end-to-end solution with high performance and reduced costs.
ISSN:2076-3417
2076-3417
DOI:10.3390/app12157509