feat: For issue #706 Ray serve with Llama.cpp for CPU inference on Graviton #739
base: main
Conversation
…raviton

- ray-service-llamacpp.yaml -- Ray service YAML file
- llamacpp-serve.py -- Ray Serve Python class with a llama-cpp-python binding
- perf_benchmark.go -- benchmark script using goroutines
- prompts.txt -- example prompts
First off, thank you so much for adding this; it's a great example of using a different tool on a new instance type, and I think it's going to be a great addition!
I left a few comments about formatting and cleanup that will make reviewing this PR a lot easier. I'd also like to remove things like pulling from other repos or building Docker images if we can help it. Let's get those addressed and we can do another round.
num_cpus: 29
runtime_env:
  working_dir: "https://github.com/ddynwzh1992/ray-llm/archive/refs/heads/main.zip"
  pip: ["llama_cpp_python", "transformers==4.46.0"]
Please freeze this dependency (pin llama_cpp_python to an exact version, as was already done for transformers).
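As a minimal sketch, this is what the pinned list could look like if the runtime_env were declared in Python; the llama_cpp_python version below is an assumed placeholder, not a tested recommendation:

```python
# Sketch only: the same runtime_env with both dependencies pinned.
runtime_env = {
    "pip": [
        "llama_cpp_python==0.2.90",  # assumed version; pin whatever was actually tested
        "transformers==4.46.0",
    ]
}
```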
ray_actor_options:
  num_cpus: 29
  runtime_env:
    working_dir: "https://github.com/ddynwzh1992/ray-llm/archive/refs/heads/main.zip"
Instead of doing this, please create a ConfigMap from the llamacpp-serve.py file and add it to the head node pod for deployment. Please take a look at this PR for an example: https://github.com/awslabs/data-on-eks/pull/607/files
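Purely as an illustration of the suggestion (the blueprint itself would more likely ship a manifest, per the linked PR), such a ConfigMap could be created with the kubernetes Python client; the name llamacpp-serve-script and the default namespace are made-up placeholders:

```python
# Illustrative sketch, not the prescribed approach: wrap llamacpp-serve.py in a
# ConfigMap so it can be mounted into the Ray head pod instead of pulled from a zip.
from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig pointing at the cluster

with open("llamacpp-serve.py") as f:
    script_body = f.read()

configmap = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(name="llamacpp-serve-script"),  # placeholder name
    data={"llamacpp-serve.py": script_body},
)
client.CoreV1Api().create_namespaced_config_map(namespace="default", body=configmap)
```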
rayClusterConfig:
  rayVersion: '2.33.0'
  enableInTreeAutoscaling: true
  #rayVersion: 3.0.0.dev0
Please remove any commented-out code.
"io/ioutil" | ||
"net/http" | ||
"strings" | ||
"os" // Add this import |
Please format this file
# Get host CPU count
host_cpu_count = multiprocessing.cpu_count()

model = LLamaCPPDeployment.bind("host_cpu_count")
Add a newline here.
@@ -0,0 +1,102 @@
|
Please remove the leading whitespace.
What does this PR do?
🛑 Please open an issue first to discuss any significant work and flesh out details/direction. When we triage the issues, we will add labels to the issue like "Enhancement", "Bug" which should indicate to you that this issue can be worked on and we are looking forward to your PR. We would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.
Add an ML blueprint to support Ray Serve with the llama.cpp framework for model inference on AWS Graviton.
It includes the following files (a sketch of the serve class follows the list):
- ray-service-llamacpp.yaml -- creates a Ray service
- llamacpp-serve.py -- Ray Serve Python class with a llama-cpp-python binding
- perf_benchmark.go -- benchmark script using goroutines
- prompts.txt -- example prompts
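For orientation, here is a minimal sketch of the shape of llamacpp-serve.py. The class name LLamaCPPDeployment and the CPU-count bind come from the diffs above; the model path, constructor signature, and request schema are assumptions for illustration, not the PR's actual code:

```python
# Minimal sketch, assuming llama-cpp-python and Ray Serve; paths and request
# fields are illustrative placeholders rather than the PR's actual values.
import multiprocessing

from llama_cpp import Llama
from ray import serve
from starlette.requests import Request


@serve.deployment(ray_actor_options={"num_cpus": 29})
class LLamaCPPDeployment:
    def __init__(self, n_threads: int):
        # Assumption: a GGUF model is baked into the image or mounted into the pod.
        self.llm = Llama(model_path="/models/model.gguf", n_threads=n_threads)

    async def __call__(self, request: Request) -> dict:
        body = await request.json()
        out = self.llm(body["prompt"], max_tokens=128)
        return {"text": out["choices"][0]["text"]}


# Bind with the host CPU count (as an int) so llama.cpp can use every core.
host_cpu_count = multiprocessing.cpu_count()
model = LLamaCPPDeployment.bind(host_cpu_count)
```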
Motivation
Contribute to GenAI on EKS
More

- website/docs or website/blog section for this feature
- pre-commit run -a with this PR. Link for installing pre-commit locally

For Moderators

Additional Notes