Use cases:
- Input a protein and get a probability distribution over amino acids at each position (masked prediction).
  - Following from this, predict the most likely variant.
- Input a protein and get its latent representation (representation).
- Visualize the protein and the labeled variant.
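A minimal sketch of the "most likely variant" step, assuming the model returns one row of logits per sequence position over the 20 standard amino acids. The names `softmax` and `most_likely_variant`, the residue ordering, and the toy logits are all hypothetical. Real logits would come from the masked-prediction head.

```python
import math

# Assumed fixed ordering of the 20 standard residues; a real tokenizer
# defines its own vocabulary order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def most_likely_variant(sequence, logits_per_position):
    """Return (position, wild_type, mutant, probability) for the single
    substitution with the highest probability, skipping the wild type."""
    best = None
    for pos, (wild_type, logits) in enumerate(zip(sequence, logits_per_position)):
        probs = softmax(logits)
        for aa, p in zip(AMINO_ACIDS, probs):
            if aa == wild_type:
                continue
            if best is None or p > best[3]:
                best = (pos + 1, wild_type, aa, p)  # 1-based position
    return best


# Toy data: uniform logits everywhere except a strong preference for R
# at position 2.
sequence = "MKT"
logits = [[0.0] * 20 for _ in sequence]
logits[1][AMINO_ACIDS.index("R")] = 5.0

variant = most_likely_variant(sequence, logits)
print(f"{variant[1]}{variant[0]}{variant[2]}")  # prints K2R
```

The label format (wild type, position, mutant) is the same one the visualization could use for the labeled variant.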
Deployment:
graph TB
F["Frontend (Deployment)"] -- "Job Request" --> KM["Kubernetes Manager"]
KM -- "Create Job" --> K["Kubernetes Cluster"]
K -- "Run Job (Pod with ML script)" --> P["Pod with T4 GPU"]
F -- "Input File" --> P
P -- "Run ML Inference" --> R["Results"]
R -- "Return Results" --> F
- Frontend: Kubernetes Deployment
- Backend (Kubernetes Manager): Google Cloud Functions
- ML Inference: Kubernetes Job
- File communication: Google Cloud Storage buckets
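A rough sketch of what the inference Job could look like, written as a plain Python dict so it can be inspected without a cluster. The image name, bucket URIs, environment variable names, and the `build_inference_job` helper are placeholders, not decisions made in this project. The T4 node selector and the `nvidia.com/gpu` limit follow the usual GKE conventions.

```python
import json


def build_inference_job(job_name, image, input_uri, output_uri):
    """Build a batch/v1 Job manifest that runs one inference pod on a T4 GPU."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name},
        "spec": {
            "backoffLimit": 0,  # assumption: do not retry failed inference runs
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    # GKE node label used to schedule onto T4 nodes
                    "nodeSelector": {
                        "cloud.google.com/gke-accelerator": "nvidia-tesla-t4"
                    },
                    "containers": [
                        {
                            "name": "inference",
                            "image": image,
                            # Hypothetical env vars telling the ML script
                            # where to read input and write results
                            "env": [
                                {"name": "INPUT_URI", "value": input_uri},
                                {"name": "OUTPUT_URI", "value": output_uri},
                            ],
                            "resources": {"limits": {"nvidia.com/gpu": 1}},
                        }
                    ],
                }
            },
        },
    }


manifest = build_inference_job(
    job_name="esm-inference-demo",                         # placeholder
    image="gcr.io/example-project/esm-inference:latest",   # placeholder
    input_uri="gs://example-bucket/input.fasta",           # placeholder
    output_uri="gs://example-bucket/results/",             # placeholder
)
print(json.dumps(manifest, indent=2))
```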
Structure
- Architecture Diagram
Frontend
- frontend input
- [] Add a file icon to the input box once a FASTA file has been provided
- [] mock output for frontend
- [] probability distribution viz, protein 3d viz for frontend
- [] create new bucket with input files
- [] call Kubernetes Job Manager, include link to new bucket for application I/O
- [] upload preference inference data to viz
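One possible shape for the input side of these tasks: validate the uploaded FASTA text, then build the request sent to the Kubernetes Job Manager together with the bucket link. This is only a sketch. `parse_fasta`, `build_job_request`, the payload fields, and the bucket URI are hypothetical.

```python
import json


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def build_job_request(bucket_uri, records, task="masked_prediction"):
    """Serialize a job request; the field names are placeholders."""
    return json.dumps(
        {"bucket": bucket_uri, "task": task, "num_sequences": len(records)}
    )


fasta_text = ">p1 demo protein\nMKT\nAYI\n>p2\nGG\n"
records = parse_fasta(fasta_text)
request_body = build_job_request("gs://example-job-bucket", records)
print(request_body)
```

Parsing before upload lets the frontend reject malformed files early and show the file icon only for valid input.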
Inference
- test running ESMFold on GCP instance
- test running ESMFold with GPUs on GCP instance
- ESM optimization using sharding: not worth it, running into issues loading the sharded model. I think the model is already sharded from HF since it's 15GB
- packaging ESM into a dockerfile
- [] loading ESM dockerfile on GKE with reserved instances
- [] saving files from ESM GKE
- [] communicating files to frontend from GKE backend
- [] ? emailing users of results
- [] ESM masked probability task on jupyter notebook
- [] add ESM masked probability to Docker image
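For the masked probability task, one common approach is to mask each position in turn and run one forward pass per masked copy. The sketch below covers only the input preparation and never calls the model. It assumes `<mask>` as the mask token string, and `masked_inputs` is a hypothetical helper name.

```python
MASK_TOKEN = "<mask>"  # assumed mask token; check the tokenizer actually used


def masked_inputs(sequence):
    """Yield (position, tokens) with exactly one residue masked per copy.

    Positions are 1-based so they line up with mutation labels.
    """
    tokens = list(sequence)
    for i in range(len(tokens)):
        masked = tokens.copy()
        masked[i] = MASK_TOKEN
        yield i + 1, masked


for position, tokens in masked_inputs("MKT"):
    print(position, " ".join(tokens))
# prints:
# 1 <mask> K T
# 2 M <mask> T
# 3 M K <mask>
```

A protein of length L produces L masked copies, so batching them matters for GPU cost on longer sequences.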
Backend (Kubernetes Job Manager)
- [] read up on the Python Kubernetes client
- [] spin up test Kubernetes job manager
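A sketch of how a test job manager might submit a Job. The API object is passed in, so the logic can be dry-run locally with a fake. In a real deployment `batch_api` would presumably be a `kubernetes.client.BatchV1Api()` instance, whose `create_namespaced_job(namespace, body)` method this mirrors. Cluster authentication from Cloud Functions is not covered here. `submit_job`, `FakeBatchApi`, and the job name are hypothetical.

```python
def submit_job(batch_api, manifest, namespace="default"):
    """Create a Job through any object exposing create_namespaced_job."""
    if manifest.get("kind") != "Job":
        raise ValueError("manifest is not a Job")
    name = manifest["metadata"]["name"]
    batch_api.create_namespaced_job(namespace=namespace, body=manifest)
    return name


class FakeBatchApi:
    """Stand-in for the real batch API, used for a local dry run."""

    def __init__(self):
        self.created = []

    def create_namespaced_job(self, namespace, body):
        # Record the call instead of talking to a cluster
        self.created.append((namespace, body))


fake = FakeBatchApi()
manifest = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "esm-test-job"},  # placeholder name
    "spec": {},  # a real Job needs a pod template here
}
job_name = submit_job(fake, manifest)
print(job_name)  # prints esm-test-job
```

Keeping the client injectable means the manager logic can be tested before any cluster credentials are set up.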