
Machine learning model API deployed with GCP and DVC. Predicts a breast cancer diagnosis.


santiagoahl/breast-cancer-prediction


Breast Cancer Prediction

Machine learning classifier API built with FastAPI and Google Cloud. The model predicts whether a patient's breast mass is malignant or benign, based on the patient's clinical data.

fastapi google-cloud docker dvc angular scikit-learn kaggle pandas

Key Features  ·  How To Use  ·  Credits  ·  License

screenshot

Key Features

This machine learning model predicts a patient's diagnosis, classifying a breast mass as either malignant or benign. The data comes from the Breast Cancer Wisconsin (Diagnostic) Data Set. The key features of this project are:

  • The model is served by a backend API built with FastAPI. A POST request sends the patient's data as JSON, and the API returns the predicted diagnosis in the same format.

  • The dataset and the current model are tracked in a GCP (Google Cloud) bucket.

  • MLOps is handled with DVC (Data Version Control), which connects the data and model to GCP and updates the model through a training pipeline to support CI/CD.

  • The Dockerfile captures everything required to run the model on other machines in a container. Running initializer.sh is enough to start the whole system.

  • The src directory contains the scripts required to update the model parameters, using the data preparation and training pipelines mentioned above.

  • A testing pipeline is also implemented: every time the model is updated, it must pass a test to make sure it runs without bugs.

  • Attribute Information:

    • ID number
    • Diagnosis (M = malignant, B = benign)
  • Ten real-valued features are computed for each cell nucleus:

    • radius (mean of distances from center to points on the perimeter)
    • texture (standard deviation of gray-scale values)
    • perimeter
    • area
    • smoothness (local variation in radius lengths)
    • compactness (perimeter^2 / area - 1.0)
    • concavity (severity of concave portions of the contour)
    • concave points (number of concave portions of the contour)
    • symmetry
    • fractal dimension ("coastline approximation" - 1)
  • Dataset balancing with imblearn.under_sampling.RandomUnderSampler.

  • The model is built with Scikit-Learn modules and functions such as:

    • linear_model.LogisticRegression : Classification model.
    • model_selection.GridSearchCV : Hyperparameter optimization.
  • The model achieved an F1 score of 96.3% and an accuracy of 96.5%.

  • The confusion matrix is the following:

Confusion

  • The model is highly sensitive: there are few false negatives, which is a great result.
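As a rough illustration of how accuracy, F1 score, and sensitivity (recall) relate to a confusion matrix, here is a minimal sketch in plain Python. The function name `classification_metrics` and the counts passed to it are hypothetical placeholders, not the project's actual confusion matrix. Malignant is assumed to be the positive class.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive common metrics from binary confusion-matrix counts.

    Assumes malignant is the positive class, so a false negative (fn)
    is a malignant mass that was predicted as benign.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall is also called sensitivity: the share of malignant cases caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not the project's results).
metrics = classification_metrics(tp=40, fp=2, fn=1, tn=71)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Few false negatives keep recall close to 1, which is why a low false-negative count is the number to watch for a diagnostic model.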

Front-End Stack

The project is currently in the front-end phase. The front end is planned to be developed with the Angular framework (via Angular CLI) to consume the REST API. The source code is in the /static directory. Here is how it looks:

screenshot

How To Use

To clone and run this application, follow these steps:

# Clone this repository
$ git clone https://github.com/santiagoahl/breast-cancer-prediction.git

# Go into the repository
$ cd breast-cancer-prediction

# Install requirements

$ pip install -r requirements.txt
$ pip install -r requirements_test.txt
$ pip install -r api/requirements.txt

# Install Backend dependencies

$ pip install uvicorn
$ pip install fastapi

# Run the server

$ uvicorn api.main:app

# The server keeps running, so open this address in your browser:

http://127.0.0.1:8000 

# Click on the `POST` method

# Click on `Try it out`

# Replace the `Request Body` with the patient's data in JSON format. Here is an example:

{
  "radius_mean": 20.57,
  "texture_mean": 17.77,
  "perimeter_mean": 132.9,
  "area_mean": 1326,
  "smoothness_mean": 0.08474,
  "compactness_mean": 0.07864,
  "concavity_mean": 0.0869,
  "symmetry_mean": 0.1812,
  "fractal_dimension_mean": 0.05667,
  "radius_se": 0.5435,
  "texture_se": 0.7339,
  "perimeter_se": 3.398,
  "area_se": 74.08,
  "smoothness_se": 0.005225,
  "compactness_se": 0.01308,
  "concavity_se": 0.0186,
  "concave_points_se": 0.0134,
  "symmetry_se": 0.01389,
  "fractal_dimension_se": 0.003532,
  "texture_worst": 0.1238,
  "smoothness_worst": 0.1238,
  "compactness_worst": 0.1866,
  "concavity_worst": 0.2416,
  "concave_points_worst": 0.186,
  "symmetry_worst": 0.08902,
  "fractal_dimension_worst": 0.08902
}

# Click on `Execute` and view (or download) the results
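The same request can also be sent from a script instead of the browser. The following is a minimal sketch using only the Python standard library. It assumes the server is running at the address shown above. The route `/predict` is a hypothetical placeholder, since the actual path is not stated here, so check the interactive page for the real one. The helper names `build_request` and `predict` are illustrative as well.

```python
import json
import urllib.request

# Address used by `uvicorn api.main:app` in the steps above.
BASE_URL = "http://127.0.0.1:8000"
# Hypothetical route: replace it with the path shown for the POST method.
PREDICT_PATH = "/predict"


def build_request(patient: dict, base_url: str = BASE_URL,
                  path: str = PREDICT_PATH) -> urllib.request.Request:
    """Build a JSON POST request carrying the patient's data."""
    body = json.dumps(patient).encode("utf-8")
    return urllib.request.Request(
        url=base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(patient: dict) -> dict:
    """Send the request and decode the JSON response (server must be running)."""
    with urllib.request.urlopen(build_request(patient), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Truncated payload for illustration; a real call needs every field
# from the JSON example above.
sample = {"radius_mean": 20.57, "texture_mean": 17.77}
request = build_request(sample)
print(request.get_method(), request.full_url)
```

Calling `predict(sample)` only works while the server is up, so the sketch stops at building the request.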

Credits

This software uses the following data and packages:

License

MIT


Web Site santiagoal.super.site  ·  GitHub @santiagoahl  ·  Twitter @sahumadaloz
