The code implements three machine learning classifiers, k-Nearest Neighbors (k-NN), Gaussian Naive Bayes, and Support Vector Machine (SVM), to classify images based on extracted features. It preprocesses and normalizes the data, then applies Principal Component Analysis (PCA) for dimensionality reduction. The code includes functions for calculating accuracy, precision, recall, F1 score, and the confusion matrix. Each classifier is evaluated on the test data, and k-NN is evaluated with both Euclidean distance and cosine similarity. The code also visualizes the results through line plots, 3D scatter plots, and confusion matrices, so the classifiers can be compared on the given dataset.
- Jupyter notebook
- Python
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Import the dataset containing image features and labels.
- Loads the images and labels from the source, preparing them for preprocessing.
- Clean and prepare the data for feature extraction.
- Converts images to a suitable format, handles missing values, and prepares data for feature extraction.
- Extract numerical features from images.
- Derives feature vectors from images for use in the classifiers.
- Scale the feature vectors.
- Normalizes features to ensure they are on a similar scale for effective model performance.
- Reduce the number of features while retaining variance.
- Applies PCA to transform and reduce dimensionality, simplifying the feature space.
- Divide data into training and testing sets.
- Splits the dataset into training and testing subsets to evaluate model performance.
- Choose the classifiers (k-NN, Gaussian Naive Bayes, SVM).
- Selects k-NN, Gaussian Naive Bayes, and SVM for comparison.
- Train the models on the training dataset.
- Fits the classifiers on the training data, learning from features.
- Evaluate the models using accuracy, precision, recall, F1 score, and confusion matrix.
- Assesses each model's performance on the test data with these metrics; k-NN is assessed with both the Euclidean and cosine measures.
- Visualize the performance and metrics of each model.
- Generates line plots, 3D scatter plots, and confusion matrices to interpret and compare classifier effectiveness.
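
The evaluation step above can be sketched in NumPy as follows. This is a minimal illustration, not the project's actual code: the function names `confusion_matrix` and `macro_scores` are hypothetical, labels are assumed to be integers from 0 to `n_classes - 1`, and macro averaging across classes is an assumption, since the averaging scheme is not stated here.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def macro_scores(cm):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurs
    # Classes that were never predicted (or never present) get a score of 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision.mean(), recall.mean(), f1.mean()

# Toy labels for illustration only.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy, precision, recall, f1 = macro_scores(cm)
```

Deriving every metric from one confusion matrix keeps the numbers consistent with each other, and the same matrix can be passed to a heatmap for the visualization step.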
- Supports both Euclidean and Cosine distance metrics.
- Evaluates different values of k for classification accuracy.
- Includes methods for accuracy, precision, recall, and F1 score evaluation.
- Assumes Gaussian distribution for features.
- Computes the likelihood of each class given the features.
- Implements SVM with gradient descent optimization.
- Suitable for high-dimensional data classification.
- Normalizes the dataset.
- Splits data into training and testing sets.
- Performs Principal Component Analysis (PCA) for dimensionality reduction.
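
The three preprocessing operations above might look like the sketch below. The names `preprocess_data` and `split_train_test` follow the function list in this document, but their bodies are assumptions: z-score normalization, a shuffled split where `train_size` is a fraction, the return order of the split, and the extra `seed` parameter are all illustrative choices. The `pca` helper is hypothetical and uses an eigendecomposition of the covariance matrix.

```python
import numpy as np

def preprocess_data(data):
    """Z-score each feature (an assumed choice of normalization)."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return (data - mean) / std

def split_train_test(data, labels, train_size, seed=0):
    """Shuffle, then keep the first `train_size` fraction for training."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    cut = int(train_size * len(data))
    train, test = order[:cut], order[cut:]
    return data[train], data[test], labels[train], labels[test]

def pca(data, n_components):
    """Project centered data onto the directions of largest variance."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    return centered @ eigvecs[:, top]

# Synthetic stand-in for the extracted image features.
rng = np.random.default_rng(42)
features = rng.normal(size=(100, 10))
labels = rng.integers(0, 3, size=100)

normalized = preprocess_data(features)
reduced = pca(normalized, n_components=3)
X_train, X_test, y_train, y_test = split_train_test(reduced, labels, train_size=0.8)
```

Reducing to three components matches the 3D scatter plots mentioned in this document; a larger `n_components` would retain more variance for the classifiers.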
- Plots sample images with predicted and actual labels.
- Visualizes the data in 3D space using PCA.
KNNClassifier Class
- The KNNClassifier class implements the k-Nearest Neighbors algorithm.
- __init__(self, k, distance_metric='euclidean'): Initializes the classifier with k neighbors and the specified distance metric (Euclidean or Cosine).
- fit(self, X_train, y_train): Fits the model to the training data.
- predict(self, X_test): Predicts the labels for the test data.
- calculate_distances(self, test_instance): Calculates distances between a test instance and all training instances.
- evaluate_accuracy(self, y_true, y_pred): Evaluates the accuracy of the predictions.
- evaluate_precision_recall_f1(self, y_true, y_pred): Evaluates precision, recall, and F1 score.
- preprocess_data(data): Normalizes the data.
- split_train_test(data, labels, train_size): Splits the data into training and testing sets.
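
As a rough illustration of the interface listed above, a k-NN classifier with both distance metrics could be sketched like this. The method names follow the list, but the bodies are assumptions: cosine distance is taken as one minus cosine similarity, and ties are resolved by whichever label `Counter` reports first. The evaluation methods are omitted for brevity.

```python
from collections import Counter
import numpy as np

class KNNClassifier:
    def __init__(self, k, distance_metric='euclidean'):
        self.k = k
        self.distance_metric = distance_metric

    def fit(self, X_train, y_train):
        # k-NN is a lazy learner: fitting only stores the training data.
        self.X_train = np.asarray(X_train, dtype=float)
        self.y_train = np.asarray(y_train)

    def calculate_distances(self, test_instance):
        if self.distance_metric == 'euclidean':
            return np.linalg.norm(self.X_train - test_instance, axis=1)
        if self.distance_metric == 'cosine':
            norms = np.linalg.norm(self.X_train, axis=1) * np.linalg.norm(test_instance)
            similarity = (self.X_train @ test_instance) / np.maximum(norms, 1e-12)
            return 1.0 - similarity  # assumed definition of cosine distance
        raise ValueError(f"unknown distance metric: {self.distance_metric}")

    def predict(self, X_test):
        predictions = []
        for test_instance in np.asarray(X_test, dtype=float):
            distances = self.calculate_distances(test_instance)
            nearest = np.argsort(distances)[:self.k]  # indices of the k closest points
            votes = Counter(self.y_train[nearest].tolist())
            predictions.append(votes.most_common(1)[0][0])  # majority label
        return np.array(predictions)

# Toy data for illustration only.
X_train = np.array([[1.0, 0.1], [2.0, 0.3], [0.1, 1.0], [0.2, 2.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[3.0, 0.2], [0.3, 3.0]])

predictions = {}
for metric in ('euclidean', 'cosine'):
    model = KNNClassifier(k=3, distance_metric=metric)
    model.fit(X_train, y_train)
    predictions[metric] = model.predict(X_test).tolist()
```

Euclidean distance is sensitive to feature magnitude, while cosine distance compares only direction, which is one reason the two can rank neighbors differently on normalized image features.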
GaussianNB Class
- The GaussianNB class implements the Gaussian Naive Bayes algorithm.
- fit(self, X_train, y_train): Trains the Gaussian Naive Bayes model.
- predict(self, X_test): Predicts labels for the test data based on the Gaussian distribution.
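
A minimal sketch of this interface is shown below, assuming per-class feature means and variances with a log-likelihood decision rule. The attribute names and the `VAR_SMOOTHING` constant are hypothetical; the constant is added only to avoid division by zero for zero-variance features.

```python
import numpy as np

VAR_SMOOTHING = 1e-9  # hypothetical constant; keeps variances strictly positive

class GaussianNB:
    def fit(self, X_train, y_train):
        X_train = np.asarray(X_train, dtype=float)
        y_train = np.asarray(y_train)
        self.classes = np.unique(y_train)
        groups = [X_train[y_train == c] for c in self.classes]
        # Per-class mean and variance of each feature, shape (n_classes, n_features).
        self.means = np.array([g.mean(axis=0) for g in groups])
        self.variances = np.array([g.var(axis=0) for g in groups]) + VAR_SMOOTHING
        # Log prior of each class from its relative frequency.
        self.log_priors = np.log(np.array([len(g) / len(X_train) for g in groups]))

    def predict(self, X_test):
        X_test = np.asarray(X_test, dtype=float)
        # Broadcast to shape (n_samples, n_classes, n_features).
        diff = X_test[:, None, :] - self.means[None, :, :]
        log_pdf = -0.5 * (np.log(2 * np.pi * self.variances) + diff ** 2 / self.variances)
        # Naive independence assumption: sum log-densities over features.
        log_posterior = log_pdf.sum(axis=2) + self.log_priors
        return self.classes[np.argmax(log_posterior, axis=1)]

# Toy data for illustration only.
X_train = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                    [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[1.1, 2.1], [5.1, 5.9]])

model = GaussianNB()
model.fit(X_train, y_train)
predicted = model.predict(X_test)
```

Working in log space avoids the numerical underflow that multiplying many small densities would cause on high-dimensional feature vectors.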
SVM Class
- The SVM class implements a Support Vector Machine using gradient descent optimization.
- __init__(self, learning_rate=0.001, lambda_param=0.01, n_iters=1000): Initializes the SVM with specified parameters.
- fit(self, X, y): Trains the SVM on the training data using gradient descent.
- predict(self, X): Predicts labels for the test data.
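
One way such a class could be written is sketched below. It assumes a linear, binary SVM trained on the L2-regularized hinge loss with per-sample gradient updates; the loss, the mapping of labels to -1 and +1, and the attribute names `w` and `b` are assumptions, and a multi-class dataset would need an extra scheme such as one-vs-rest that is not shown.

```python
import numpy as np

class SVM:
    def __init__(self, learning_rate=0.001, lambda_param=0.01, n_iters=1000):
        self.lr = learning_rate
        self.lambda_param = lambda_param
        self.n_iters = n_iters
        self.w = None
        self.b = 0.0

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Map labels to {-1, +1}: anything <= 0 becomes -1 (assumed convention).
        y_signed = np.where(np.asarray(y) <= 0, -1.0, 1.0)
        self.w = np.zeros(X.shape[1])
        self.b = 0.0
        for _ in range(self.n_iters):
            for x_i, y_i in zip(X, y_signed):
                margin = y_i * (x_i @ self.w + self.b)
                if margin >= 1:
                    # Correct side of the margin: only the regularizer contributes.
                    self.w -= self.lr * (2 * self.lambda_param * self.w)
                else:
                    # Margin violated: add the hinge-loss gradient.
                    self.w -= self.lr * (2 * self.lambda_param * self.w - y_i * x_i)
                    self.b += self.lr * y_i

    def predict(self, X):
        # Returns -1.0 or +1.0 (0.0 only if a point lies exactly on the boundary).
        return np.sign(np.asarray(X, dtype=float) @ self.w + self.b)

# Toy, linearly separable data for illustration only.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-2.0, -2.0], [-3.0, -3.0], [-2.5, -3.5]])
y = np.array([1, 1, 1, 0, 0, 0])

model = SVM()
model.fit(X, y)
predictions = model.predict(X)
```

Here `lambda_param` trades margin width against training errors: larger values shrink the weights more strongly on every step.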