
Categorical Attribute traNsformation Environment (CANE): A python module for categorical to numeric data preprocessing


Bibliographic Details
Published in: Software Impacts 2022-08, Vol.13, p.100359, Article 100359
Main Authors: Matos, Luís Miguel; Azevedo, João; Matta, Arthur; Pilastri, André; Cortez, Paulo; Mendes, Rui
Format: Article
Language: English
Description
Summary: Categorical Attribute traNsformation Environment (CANE) is a simple but powerful Python package for preprocessing categorical data. The package is valuable because many Machine Learning (ML) algorithms can only be trained on numerical data (e.g., Deep Learning, Support Vector Machines), while many real-world ML applications involve categorical data attributes. Currently, CANE offers three categorical-to-numeric transformation methods: Percentage Categorical Pruned (PCP), Inverse Document Frequency (IDF), and a simpler One-Hot-Encoding method. The CANE module is also well documented, with several code examples that can help non-expert users adopt it.
• Good impact in big data environments.
• Simple but powerful Python package for categorical data preprocessing.
• Several categorical transformations with various options and multicore settings.
• Uses two popular Python data formats, the Pandas DataFrame and the Spark DataFrame.
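The three transformations named in the summary can be illustrated with a minimal pandas-only sketch. It does not use CANE and does not reproduce its API: the helper names (`pcp_sketch`, `idf_sketch`, `one_hot_sketch`), the `perc` and `other` parameters, and the toy `city` column are all hypothetical. The definitions are also assumptions, since this record does not state them: PCP is taken to merge the least frequent levels, whose cumulative share stays within a threshold, into a single level, and IDF is taken to map each level to ln(n / f), where n is the number of rows and f is the level's count.

```python
import numpy as np
import pandas as pd


def pcp_sketch(col, perc=0.05, other="Others"):
    """Assumed PCP: merge the rarest levels (cumulative share <= perc) into one level."""
    shares = col.value_counts(normalize=True).sort_values()  # rarest level first
    cumulative = shares.cumsum()
    rare = cumulative[cumulative <= perc].index
    # Keep values that are not rare; replace the rest with the merged level.
    return col.where(~col.isin(rare), other)


def idf_sketch(col):
    """Assumed IDF: map each level to ln(n / f); frequent levels land closer to 0."""
    counts = col.map(col.value_counts())  # per-row count of that row's level
    return np.log(len(col) / counts)


def one_hot_sketch(col):
    """One-hot encoding: one indicator column per level."""
    return pd.get_dummies(col, prefix=col.name)


# Hypothetical toy column: 6 x Porto, 3 x Lisboa, 1 x Braga.
city = pd.Series(["Porto"] * 6 + ["Lisboa"] * 3 + ["Braga"], name="city")

pruned = pcp_sketch(city, perc=0.2)  # only Braga (10% share) falls within 20%
weights = idf_sketch(city)           # one numeric value per row
dummies = one_hot_sketch(city)       # one indicator column per city
```

In this sketch PCP keeps the column categorical but with fewer levels, so it would typically be followed by a numeric encoding such as one-hot, whereas IDF yields a single numeric column regardless of how many levels the attribute has.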
ISSN: 2665-9638
DOI: 10.1016/j.simpa.2022.100359