LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Topics: llama, cuda-kernels, deepspeed, llm, fastertransformer, llm-inference, turbomind, internlm, llama2, codellama, llama3
Updated May 20, 2024 - Python