This is a script that trains and deploys an image classification model on Amazon SageMaker.
I used the public Butterfly Image Classification dataset from Kaggle for training.
The model is SageMaker's built-in Image Classification algorithm (based on MXNet), trained with the following hyperparameters:
estimator.set_hyperparameters(
    num_layers=18,
    use_pretrained_model=1,
    num_classes=num_classes,
    mini_batch_size=32,
    epochs=10,
    learning_rate=0.001,
    precision_dtype='float32',
    num_training_samples=num_training_samples
)
The hyperparameters are common, deliberately small defaults rather than tuned values, since this project demonstrates the deployment flow more than accuracy. The number of classes and the number of training samples are calculated dynamically from the dataset.
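How those dynamic values are obtained is a detail of train_deploy.py; as a rough illustration only, the sketch below counts classes and training samples from a folder laid out as dataset/train/<class_name>/<image> (the path and layout are assumptions based on the restructuring step described further down).

# Sketch (not the repo's exact code): derive num_classes and
# num_training_samples from a per-class folder layout.
import os

train_dir = 'dataset/train'  # assumed location after restructuring
class_names = sorted(
    d for d in os.listdir(train_dir)
    if os.path.isdir(os.path.join(train_dir, d))
)
num_classes = len(class_names)
num_training_samples = sum(
    len(os.listdir(os.path.join(train_dir, c))) for c in class_names
)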
- Download the dataset from Kaggle and unzip it in the repo's folder
unzip /path/to/archive.zip -d dataset
- Create a virtual environment
source venv.sh
- Install dependencies
pip install -r requirements.txt
- Process the dataset's folder structure (sketched after these steps)
python restructure_dataset.py
- Create .lst files (sketched after these steps)
python generate_lst.py
- Run the training and deployment script, after updating the bucket name and other constants (sketched after these steps)
python train_deploy.py
- Invoke inference (e.g. with the AWS CLI; a boto3 sketch follows these steps)
aws sagemaker-runtime invoke-endpoint \
--endpoint-name "classify-butterfly" \
--body fileb://dataset/test/Image_999.jpg \
--content-type "image/jpeg" \
output_file.txt
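
For reference, here are minimal sketches of the scripted steps above. First, the folder restructuring: this is only an illustration of the idea, assuming the Kaggle archive contains a Training_set.csv with filename and label columns and a flat train/ image folder (the actual logic lives in restructure_dataset.py). It moves each image into a per-class subfolder, which is the layout the later steps rely on.

# Sketch of restructure_dataset.py's idea: group images into one folder per class.
# Assumes dataset/Training_set.csv with 'filename' and 'label' columns and the
# images in dataset/train/ -- adjust to the actual archive layout.
import csv
import os
import shutil

src_dir = 'dataset/train'
csv_path = 'dataset/Training_set.csv'

with open(csv_path, newline='') as f:
    for row in csv.DictReader(f):
        class_dir = os.path.join(src_dir, row['label'])
        os.makedirs(class_dir, exist_ok=True)
        shutil.move(os.path.join(src_dir, row['filename']),
                    os.path.join(class_dir, row['filename']))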
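
Next, the .lst files. The built-in Image Classification algorithm reads tab-separated .lst files whose lines contain an image index, a numeric class label, and a path relative to the channel root. A minimal sketch of what generate_lst.py has to produce (the output file name and any train/validation split are assumptions):

# Sketch: write train.lst lines of '<index>\t<class_id>\t<relative_path>'.
import os

train_dir = 'dataset/train'
class_names = sorted(
    d for d in os.listdir(train_dir)
    if os.path.isdir(os.path.join(train_dir, d))
)

with open('train.lst', 'w') as lst:
    index = 0
    for class_id, name in enumerate(class_names):
        for image in sorted(os.listdir(os.path.join(train_dir, name))):
            lst.write(f'{index}\t{class_id}\t{name}/{image}\n')
            index += 1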
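
The core of train_deploy.py then boils down to building the estimator, fitting it on the S3 channels, and deploying an endpoint. The sketch below uses placeholder bucket, role, prefix, and instance types, and assumes the images and .lst files have already been uploaded to the four channel prefixes; replace these with your own values.

# Sketch of the train-and-deploy flow; bucket, role, prefixes and instance
# types are placeholders.
import os
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = 'arn:aws:iam::123456789012:role/MySageMakerRole'  # placeholder
bucket = 'my-butterfly-bucket'                           # placeholder
prefix = 'butterfly'

# Count classes and samples locally, as in the earlier sketch.
train_dir = 'dataset/train'
class_names = sorted(d for d in os.listdir(train_dir)
                     if os.path.isdir(os.path.join(train_dir, d)))
num_classes = len(class_names)
num_training_samples = sum(len(os.listdir(os.path.join(train_dir, c)))
                           for c in class_names)

# Built-in Image Classification container for the current region.
image_uri = image_uris.retrieve('image-classification', session.boto_region_name)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type='ml.p3.2xlarge',  # GPU training instance (assumed)
    output_path=f's3://{bucket}/{prefix}/output',
    sagemaker_session=session,
)
estimator.set_hyperparameters(
    num_layers=18,
    use_pretrained_model=1,
    num_classes=num_classes,
    mini_batch_size=32,
    epochs=10,
    learning_rate=0.001,
    precision_dtype='float32',
    num_training_samples=num_training_samples,
)

# Images and .lst files are assumed to already sit under these prefixes.
channels = {
    name: TrainingInput(f's3://{bucket}/{prefix}/{name}',
                        content_type='application/x-image')
    for name in ('train', 'validation', 'train_lst', 'validation_lst')
}

estimator.fit(channels)
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',   # endpoint instance (assumed)
    endpoint_name='classify-butterfly',
)
print(predictor.endpoint_name)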
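
Finally, the endpoint answers with a JSON array of per-class probabilities (that is what lands in output_file.txt above). A boto3 sketch that invokes the endpoint and reports the most likely class index:

# Sketch: invoke the endpoint with boto3 and pick the highest-probability class.
import json
import boto3

runtime = boto3.client('sagemaker-runtime')
with open('dataset/test/Image_999.jpg', 'rb') as f:
    payload = f.read()

response = runtime.invoke_endpoint(
    EndpointName='classify-butterfly',
    ContentType='image/jpeg',
    Body=payload,
)
probabilities = json.loads(response['Body'].read())
best = max(range(len(probabilities)), key=probabilities.__getitem__)
print(f'class index {best} with probability {probabilities[best]:.3f}')

In these sketches the class index maps back to the sorted class-folder names used when generating the .lst files.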