KNN Classification on Abalone dataset and Ablation study on normalization of the data

shreeyajoshi2013/KNN-CLassification-using-Abalone-Dataset

KNN Classification with Abalone dataset

Classification task: predict 'Rings' from the other features. The number of rings indicates the age of the abalone. The dataset contains physical measurements of abalone, and this project classifies the samples with a KNN classifier.

The dataset is taken from link.

Built with

  • Google Colab

Libraries used

  • Pandas
  • Numpy
  • Matplotlib
  • Seaborn
  • Scikit-learn

Highlights

  • Improving on KNN using weighted KNN
  • Ablation study on Normalization
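
To make the weighted KNN idea concrete, here is a minimal pure-Python sketch of distance-weighted voting. It is not the project's code: the function name `knn_predict`, its parameters, and the toy data are hypothetical, and the 1/distance weighting is assumed to be the scheme meant by "weighted KNN" (it matches the behaviour of scikit-learn's `weights="distance"` option).

```python
from collections import defaultdict

def knn_predict(train_X, train_y, query, k=3, weighted=False, p=2):
    """Predict a label for `query` from its k nearest training points.

    p=1 gives Manhattan distance, p=2 gives Euclidean distance.
    With weighted=True each neighbour votes with weight 1/distance.
    """
    # Minkowski distance from the query to every training point
    dists = [
        (sum(abs(a - b) ** p for a, b in zip(x, query)) ** (1.0 / p), label)
        for x, label in zip(train_X, train_y)
    ]
    neighbours = sorted(dists, key=lambda t: t[0])[:k]

    # With distance weighting, an exact match decides the label outright
    if weighted:
        exact = [label for d, label in neighbours if d == 0.0]
        if exact:
            return exact[0]

    votes = defaultdict(float)
    for d, label in neighbours:
        votes[label] += (1.0 / d) if weighted else 1.0
    return max(votes, key=votes.get)

# Toy data (made up for illustration): one close "A" point, two farther "B" points
X = [(1.0, 1.0), (3.0, 3.0), (3.0, 3.5)]
y = ["A", "B", "B"]
q = (1.5, 1.5)

uniform = knn_predict(X, y, q, k=3, weighted=False)   # majority vote
weighted = knn_predict(X, y, q, k=3, weighted=True)   # closeness-weighted vote
print(uniform, weighted)
```

With uniform voting the two distant "B" points outvote the single nearby "A" point. With distance weighting the nearby point carries more weight, so the prediction can differ.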

What is being done?

  1. Loading the dataset and splitting it into train and test sets

  2. Normalization and training

  3. Running the model with its default configuration on the test data
    Looping over values of K to find the best accuracy and the corresponding K
    Plotting K values against accuracies

  4. Improving KNN with weighted KNN, using 3 schemes: Default, Manhattan and Euclidean
    Plotting K vs. accuracy for all 3 configurations with normalization

  5. Ablation study on normalization
    Repeating the experiment with the normalization step removed from the preprocessing pipeline
    Plotting accuracy against K for all 3 configurations (without normalization)
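
The steps above can be sketched roughly as follows with scikit-learn. This is an illustration under stated assumptions, not the notebook's actual code. Synthetic data from `make_classification` stands in for the Abalone features and 'Rings' labels. `StandardScaler` is assumed as the normalization step. The three configurations are assumed to be default KNN plus distance-weighted KNN with Manhattan and Euclidean metrics. The names `configs`, `k_values` and `sweep`, the range of K, and the 80/20 split are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Abalone features and 'Rings' labels (assumption)
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    n_classes=3, random_state=0,
)
# Stretch one feature so that scaling matters for the distance computation
X[:, 0] *= 100.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Three configurations: default KNN, and distance-weighted KNN with two metrics
configs = {
    "default": dict(),
    "weighted_manhattan": dict(weights="distance", p=1),
    "weighted_euclidean": dict(weights="distance", p=2),
}
k_values = list(range(1, 31, 2))

def sweep(normalize):
    """Return {config name: list of test accuracies, one per k}."""
    results = {}
    for name, params in configs.items():
        accs = []
        for k in k_values:
            knn = KNeighborsClassifier(n_neighbors=k, **params)
            # The ablation simply drops the scaler from the pipeline
            model = make_pipeline(StandardScaler(), knn) if normalize else knn
            model.fit(X_train, y_train)
            accs.append(model.score(X_test, y_test))
        results[name] = accs
    return results

with_norm = sweep(normalize=True)      # steps 2 to 4
without_norm = sweep(normalize=False)  # step 5, the ablation

for name in configs:
    best = int(np.argmax(with_norm[name]))
    print(f"{name}: best k={k_values[best]}, accuracy={with_norm[name][best]:.3f}")
```

Each accuracy list lines up with `k_values`, so it can be passed directly to Matplotlib, for example `plt.plot(k_values, with_norm["default"])`, to produce the K vs. accuracy plots. Wrapping the scaler in a pipeline keeps it fitted on the training split only, which avoids leaking test statistics into the normalization.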

Conclusion

  • Effect of normalization/standardization: normalization does not make the classifier reach high accuracy for any of the tested values of k. This applies to both uniform KNN and weighted KNN.
  • Without normalization, performance drops as k (and hence the neighbourhood size) increases. This is not observed with normalized data.
  • The different weighting schemes perform similarly.
  • Accuracy is low overall (below 30%), which suggests that more complex models might be needed to classify well in this domain.
