Categorical Attribute traNsformation Environment (CANE): A python module for categorical to numeric data preprocessing
Published in: | Software impacts 2022-08, Vol.13, p.100359, Article 100359 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Categorical Attribute traNsformation Environment (CANE) is a simple but powerful Python package for preprocessing categorical data. The package is valuable because many Machine Learning (ML) algorithms can only be trained on numerical data (e.g., Deep Learning, Support Vector Machines), while several real-world ML applications involve categorical data attributes. Currently, CANE offers three categorical-to-numeric transformation methods: Percentage Categorical Pruned (PCP), Inverse Document Frequency (IDF), and a simpler One-Hot-Encoding method. Additionally, the CANE module is well documented, with several code examples that can help non-expert users adopt it.
• Good impact in big data environments.
• Simple but powerful categorical data preprocessing Python package.
• Several categorical transformations with various options and multicore settings.
• Uses two popular Python data formats, the Pandas DataFrame and the Spark DataFrame. |
---|---|
ISSN: | 2665-9638 2665-9638 |
DOI: | 10.1016/j.simpa.2022.100359 |
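
The three transformations named in the summary can be illustrated with a minimal sketch in plain Python. This is not the CANE API: the function names, the `perc_inner` parameter, the `"Others"` label, and the exact definitions used here (PCP merges the least frequent levels into a single level; IDF replaces each level with ln(N / f), where N is the number of rows and f the level's frequency) are assumptions made for illustration, and the package's own documentation should be consulted for its actual interface and behavior.

```python
import math
from collections import Counter


def pcp(values, perc_inner=0.05, other="Others"):
    """Hypothetical PCP sketch: keep the most frequent levels until they cover
    at least (1 - perc_inner) of the rows, and merge the rest into `other`."""
    counts = Counter(values)
    threshold = (1.0 - perc_inner) * len(values)
    kept, covered = set(), 0
    for level, n in counts.most_common():
        if covered >= threshold:
            break  # the remaining, rarer levels are merged
        kept.add(level)
        covered += n
    return [v if v in kept else other for v in values]


def idf(values):
    """Hypothetical IDF sketch: map each level to ln(N / f_level),
    so rarer levels receive larger numeric values."""
    counts = Counter(values)
    n = len(values)
    return [math.log(n / counts[v]) for v in values]


def one_hot(values):
    """One binary column per distinct level, with levels in sorted order."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels] for v in values]


# Toy categorical attribute: 6 x "red", 3 x "blue", 1 x "green".
colors = ["red"] * 6 + ["blue"] * 3 + ["green"]

pruned = pcp(colors, perc_inner=0.2)   # "green" is merged into "Others"
numeric = idf(colors)                  # one float per row
binary = one_hot(colors)               # columns: blue, green, red
```

In this sketch PCP and IDF each keep a single column per attribute, whereas one-hot encoding produces one column per level, which is one reason pruning or frequency-based encodings may be preferred for high-cardinality attributes.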