EzML: Customizable Machine Learning Pipeline for Tabular Data Classification models

Pipeline for ML on tabular data for classification purposes (regression soon to be added).

How it works

An orchestrator present in the 'main.py' reads the 'config.json' file which is a configuration file that contains all the information necessary to preprocess, perform Cross-Validation steps, feature selection, as well as models to do training and optimization parameters while saving all the steps in a 'loguru.log' file. These scripts and JSON file can be found on the 'src' folder.

The scripts on the 'src' folder are:

'main.py': The main file that runs the orchestrator. This orchestrator in turn calls the remaining steps of the pipeline according to the settings on the 'config.json' configuration file.
'preprocessing.py': This is the first script that is run by the orchestrator in order to preprocess the data (scaling, solving missing values, categorical encoding, etc). This script contains the 'PreProcessor' class with the methods necessary to do the preprocessing steps.
'train_model.py': A python script containing the 'ModelTraining' class that reads the 'config.json' configuration file, trains the models, optimizes, saves the best model and other plots for model evaluation as well as an excel file with the metrics employed in the cross validation set.
'utils.py': A utilities script that contains helper functions independent of the main classes from 'preprocessing.py' and 'train_model.py'.

Things to consider on the 'config.json' configuration file:

The setting 'path_backbone' is the path for the repository. If you clone the repo to your local system, make sure to paste the path to the local repo clone in this setting. The 'data' setting is where the data is located inside the 'path_backbone' directory. This can be kept as is as long as the structure of the repository isn't changed (particularly naming on the 'data' folder).

Next features:

New features will be included, such as:

Configuration for regression problems;
Settings for both binary, multi-class and multi-label classification problems;
Improved logging and error catching;
Better 'readme' file containing detailed information on how to operate the 'config.json' file.

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
.vscode		.vscode
data		data
reports		reports
src		src
.env		.env
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py
test_environment.py		test_environment.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EzML: Customizable Machine Learning Pipeline for Tabular Data Classification models

How it works

Things to consider on the 'config.json' configuration file:

Next features:

About

Releases 2

Packages

Languages

License

Rafael-Silva-Oliveira/EzML

Folders and files

Latest commit

History

Repository files navigation

EzML: Customizable Machine Learning Pipeline for Tabular Data Classification models

How it works

Things to consider on the 'config.json' configuration file:

Next features:

About

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages