Sunbird API Inference Service

This repository contains code for a Flask server that's containerized and deployed to Vertex AI on GCP and to Azure Machine Learning Studio (https://studio.azureml.net/).

The Flask server provides access to the Sunbird AI speech recognition (ASR) and translation models (see asr_inference and translate_inference below).

The process of deployment is as follows:

  • The models are pulled from HuggingFace. See asr_inference and translate_inference.
  • The Flask app exposes two endpoints, isalive and predict, as required by Vertex AI. The predict endpoint receives a list of inference requests, passes them to the model, and returns the results (see the sketch after this list).
  • A Docker container is built from this Flask app and pushed to the Google Container Registry (GCR).
  • On Vertex AI, a "model" is created from this container and then deployed to a Vertex endpoint. The process is similar on Azure.
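
The following is a minimal sketch of the isalive/predict pattern described above. The route names match what Vertex AI expects from a custom serving container, but the request/response fields and the HuggingFace model used here are illustrative assumptions, not the actual implementation in asr_inference and translate_inference.

```python
# app.py - minimal sketch of a Vertex AI-compatible Flask server.
from flask import Flask, jsonify, request
from transformers import pipeline

app = Flask(__name__)

# Hypothetical model for illustration; the real code loads the Sunbird models.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

@app.route("/isalive")
def isalive():
    # Vertex AI polls this health route; a 200 response means the container is up.
    return "", 200

@app.route("/predict", methods=["POST"])
def predict():
    # Vertex AI sends {"instances": [...]}; each instance is one inference request.
    instances = request.get_json()["instances"]
    predictions = [translator(inst["text"])[0]["translation_text"] for inst in instances]
    # Results are returned under a "predictions" key, as Vertex AI expects.
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```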

NOTE: Check out this article for a detailed tutorial on this process for GCP.
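
For reference, here is a hedged sketch of the model-creation and deployment step using the google-cloud-aiplatform SDK. The project, region, image URI, and machine/accelerator choices are placeholders, not the values used in the actual deployment.

```python
from google.cloud import aiplatform

# Placeholder project and region.
aiplatform.init(project="my-gcp-project", location="us-central1")

# Create a Vertex AI "model" from the container pushed to GCR, wiring up the
# /predict and /isalive routes exposed by the Flask app.
model = aiplatform.Model.upload(
    display_name="sunbird-inference",
    serving_container_image_uri="gcr.io/my-gcp-project/api-inference-server:latest",
    serving_container_predict_route="/predict",
    serving_container_health_route="/isalive",
    serving_container_ports=[8080],
)

# Deploy the model to an endpoint; the machine/accelerator choice is an assumption.
endpoint = model.deploy(
    machine_type="n1-standard-4",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
print(endpoint.resource_name)
```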

Check out https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-online-endpoints?view=azureml-api-2&tabs=azure-cli for how to deploy models to online endpoints on Azure.
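
The Azure side can be scripted with the azure-ai-ml Python SDK. The sketch below only shows the endpoint/deployment shape; workspace details, the image reference, and the VM size are placeholders, and model registration and route configuration are omitted (follow the linked guide for those).

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment, Environment
from azure.identity import DefaultAzureCredential

# Placeholder workspace details.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Create the online endpoint...
endpoint = ManagedOnlineEndpoint(name="sunbird-inference", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# ...and a deployment that runs the same inference container.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="sunbird-inference",
    environment=Environment(image="<registry>/api-inference-server:latest"),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```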

The resulting endpoint is then used in the main Sunbird AI API.
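
A hedged sketch of how the API might call the deployed Vertex endpoint; the endpoint ID and the instance payload fields are placeholder assumptions, since the real request schema is defined by the predict route of the Flask app.

```python
from google.cloud import aiplatform

# Placeholder endpoint resource name.
endpoint = aiplatform.Endpoint(
    "projects/<project-number>/locations/us-central1/endpoints/<endpoint-id>"
)

# Each instance is one inference request; field names here are illustrative.
response = endpoint.predict(
    instances=[{"source_language": "English", "target_language": "Luganda", "text": "Hello"}]
)
print(response.predictions)
```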

TODOs

  • Add TTS (this is already available for Azure).
  • Handle long audio files (this is already available for Azure).
  • Use a smaller base container; the current one (huggingface/transformers-pytorch-gpu) is quite heavy and may be unnecessary. A slimmer base image would yield a smaller artifact that takes up less memory.
  • Automate the deployment process for both the API and this inference service (using GitHub Actions or Terraform...or both?).
  • Come up with an end-to-end workflow from data ingestion to deployment (what tools are required for this?).
