This is my own work following the "Deploy Machine Learning Models with Django" tutorial by Piotr Płoński at deploymachinelearning.com.
The original tutorial briefly covers a large number of topics and has a few issues, so this version improves upon the original in a number of ways.
- Newer versions of various software packages are used over the ones that were available in 2019 when the original tutorial was written. Namely, Django 4.0.3 is used instead of the specified 2.2.4 from the tutorial. See
requirements.txtfor other software versions. - This project does not create a
backenddirectory for theserverDjango project as it appears in the tutorial that thebackenddirectory contains nothing other than the Django project itself. - The Django apps
endpointsandmlexist at the top of theserverDjango directory instead of creating a newappdirectory just to hold Django apps.
- The data training notebook
Data Training.ipynbuses anOrdinalEncoderto encode categorical data instead of theLabelEncoderused in the tutorial. The sklearn docs forLabelEncoderexplains why:This transformer should be used to encode target values, i.e.
y, and not the inputX. - The data training notebook
Data Training.ipynbtrains theOrdinalEncoderon the full set of inputsXinstead of only usingX_trainas in the tutorial. This solves the issue that occurs whenX_testcontains unique values that are not also found inX_train. So the encoder must be trained on the full set of possible values for all input features. - The data training notebook
Data Training.ipynbtakes one additional step after training the algorithms to evaluate their accuracy withsklearn.metrics.confusion_matrix.
- A number of
CharFieldattributes inendpoints/models.pywere changed toTextFieldwhich is more appropriate for strings of significant length, andmax_lengthparameters were removed fromMLAlgoithm.descriptionandMLAlgorithm.code. __str__methods were written in various model classes to improve readability on the Django site admin.- A
Metaclass was added to a number of models to improve readability in the generated pages. - Docstrings were added in various places to improve understanding.
- Replaced hardcoding of relative paths with
pathlib.Pathin places such asml.income_classifier.random_forest. - Replaced hardcoding of categorical features in
ml.income_classifierwithOrdinalEncoder.feature_names_in_to allow for more dynamic processing. - Instead of using a
RandomForestClassifierfromml.income_classifierand creating a new class for each new type of classifier algorithm, a generalIncomeClassifierclass was created to hold algorithm data for different income classier models. - The
MLRegistry.endpointsproperty was changed toMLRegistry.__endpoint_algorithmsto indicate it should not be directly manipulated outside the class. - A
get_algorithmmethod was added to theMLRegistrywhich finds and returns algorithm objects from the registry. This method is then used in thePredictViewto make requested predictions. - A
__str__method was added to theMLRegistryclass which returns a string representation of the endpoint algorithms dictionary. This method is used to test the absence (len == 2) or presence (len > 2) of algorithms in the registry. - The
MLRegistrywas reworked to strictly associate DB objects withIncomeClassifierinstances. The__init__method instantiates anIncomeClassifierfor eachMLAlgorithmobject in the database and adds it to the registry list. IncomeClassifiernow keeps track of which artifacts belong to which algorithm and no longer requires the file name upon instantiation.
- The
PredictViewnow consistently expects JSON input fromresponse.data. It usesjson.loadson the data before sending it to the classifier for prediction. - The
IncomeClassifiernow correctly fills in missing values in the input data with training mode values by usingDataFrame.fillna()with theinplace=Trueparameter. - The
MLRequestmodel has a new field calledpredictionto use as a disambiguous record of the prediction label that can be compared to thefeedbackfield to calculate accuracy. MLRequestobjects usejson.loadsinstead ofjson.dumpsto record input data since it is already expected to be a JSON string.- The
EndpointTests.test_predict_viewtest case now dumps the test data dict to a JSON string withjson.dumps()instead of posting the dict directly to the predict view.