Project for Artificial Intelligence exam: Pruning of the Decision Tree rules

LorenzoGianassi/Decision_Tree_Rules_Pruning

Decision Tree Rules Pruning

About the Project

The goal of this project is to build a decision tree using the ID3 algorithm with entropy as the impurity measure. Once the tree has been built, it is converted into a set of rules, one for each path from the root to a leaf, so that a pruning strategy can be applied to those rules. Pruning is driven by the error measured on the validation set. Finally, the accuracy before and after pruning is compared.
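As an illustration of the split criterion mentioned above, here is a minimal sketch of entropy and information gain as ID3 uses them. It is not taken from the project code: the function names (`entropy`, `information_gain`, `best_attribute`) and the toy rows are assumptions made for this example, and the rows are represented as plain dictionaries rather than a Pandas structure.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the examples on `attribute`.

    `rows` is a list of dicts (attribute name -> value) and `labels` is the
    parallel list of class labels. Both are hypothetical representations.
    """
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(
        len(subset) / len(labels) * entropy(subset)
        for subset in partitions.values()
    )
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3 picks the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Toy examples (made up for illustration, not real census records).
rows = [
    {"education": "Bachelors", "sex": "Male"},
    {"education": "Bachelors", "sex": "Female"},
    {"education": "HS-grad", "sex": "Male"},
    {"education": "HS-grad", "sex": "Female"},
]
labels = [">50K", ">50K", "<=50K", "<=50K"]

print(best_attribute(rows, labels, ["education", "sex"]))
```

In this toy set, `education` separates the two classes perfectly while `sex` does not separate them at all, so ID3 would split on `education` first.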

Dataset

In this project, we will use a standard imbalanced machine learning dataset referred to as the “Adult Income” or simply the “adult” dataset.

The dataset is credited to Ronny Kohavi and Barry Becker and was drawn from 1994 United States Census Bureau data. The task is to use personal details, such as education level, to predict whether an individual earns more or less than $50,000 per year.

The dataset contains 16 columns.

Target field: Income

  • The income is divided into two classes: <=50K and >50K

Number of attributes: 14

  • These are the demographic and other features that describe a person

We can explore the possibility of predicting income level from an individual’s personal information.

Usage

  • Download a .csv file and save it in the Project folder. The DatasSet.py file is written specifically for the dataset used here, so the code in that class must be changed if you want to use another dataset. The first line of the .csv file must contain the names of the dataset attributes. I used Pandas to parse the dataset, so you will need to install it too.
  • The Main.py file runs several tests to analyze the problem in more depth, in particular how the accuracy varies with the depth of the tree and with the number of examples used to build and test the tree.

Execution times may vary depending on the size of the dataset and the depth of the tree (the pruning operation is the one that takes the most time).
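To give an idea of why pruning dominates the running time, here is a minimal sketch of rule post-pruning against a validation set. It is an assumption about how such a step can work, not the project's actual code: the rule representation (a list of `(attribute, value)` preconditions plus a predicted class), the function names, and the stopping criterion (drop a precondition only when validation accuracy strictly improves) are all chosen for illustration. Every candidate removal requires a full pass over the validation examples, for every rule.

```python
def rule_accuracy(conditions, predicted, rows, labels):
    """Accuracy of one rule on the validation examples it covers.

    A rule covers an example when every (attribute, value) precondition
    matches. Returns 0.0 when the rule covers no example.
    """
    covered = [
        label
        for row, label in zip(rows, labels)
        if all(row.get(attr) == val for attr, val in conditions)
    ]
    if not covered:
        return 0.0
    return sum(label == predicted for label in covered) / len(covered)


def prune_rule(conditions, predicted, rows, labels):
    """Greedily drop preconditions while validation accuracy strictly improves."""
    conditions = list(conditions)
    best = rule_accuracy(conditions, predicted, rows, labels)
    improved = True
    while improved and conditions:
        improved = False
        for i in range(len(conditions)):
            # Candidate rule with the i-th precondition removed.
            candidate = conditions[:i] + conditions[i + 1:]
            acc = rule_accuracy(candidate, predicted, rows, labels)
            if acc > best:
                best, conditions, improved = acc, candidate, True
                break
    return conditions, best


# Hypothetical validation examples (made up for illustration).
val_rows = [
    {"education": "Bachelors", "sex": "Male"},
    {"education": "Bachelors", "sex": "Female"},
    {"education": "Bachelors", "sex": "Male"},
    {"education": "HS-grad", "sex": "Male"},
    {"education": "HS-grad", "sex": "Female"},
]
val_labels = [">50K", ">50K", "<=50K", "<=50K", "<=50K"]

# Rule read off one root-to-leaf path: education=Bachelors AND sex=Male -> >50K
rule = [("education", "Bachelors"), ("sex", "Male")]
pruned, accuracy = prune_rule(rule, ">50K", val_rows, val_labels)
print(pruned, accuracy)
```

Here the `sex` precondition is dropped because the shorter rule is more accurate on the validation examples, while removing `education` as well would make the rule worse, so it is kept.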

For the realization of this project I have consulted the following sources:

Authors

  • Lorenzo Gianassi

Acknowledgments

Artificial Intelligence Project © Course held by Professor Paolo Frasconi - Computer Engineering Degree @University of Florence
