Taking a long time to give a response (around 2 min) #1896
Below is the log, in case it helps.
The logs don't mention using the GPU, which is probably why it's slow. Something is wrong with the llama-cpp-python installation.
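A common cause is that a CPU-only wheel of llama-cpp-python was installed. A minimal sketch of forcing a source rebuild with CUDA enabled, assuming a Windows shell and a locally installed CUDA toolkit (the CMake flag name has changed across llama-cpp-python versions, e.g. older builds use `-DLLAMA_CUBLAS=on` and newer ones `-DGGML_CUDA=on`, so check the version you are installing):

```shell
:: Remove the existing (possibly CPU-only) build
pip uninstall -y llama-cpp-python

:: Ask CMake to compile with CUDA support, then rebuild from source
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
pip install --no-cache-dir --force-reinstall llama-cpp-python
```

Building from source on Windows also requires a working MSVC toolchain, which is consistent with the report below that reinstalling the VS Build Tools fixed it.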
Hello,
should I uninstall it or install a different version? This is the issue:
Now I did the new installation with
Below is the complete error log:
(.myh2o) C:\Users\Public\h2ogpt_Nov24>
Somehow it is working after reinstalling the VS Build Tools.
Hello,
I am running on the following machine:
CPU: 12th Gen Intel(R) Core(TM) i7-12700
RAM: 32 GB, speed: 4400 MT/s
GPU: NVIDIA RTX A2000 12 GB
The model is:
llama-2-7b-chat.Q6_K.gguf
It takes around 2 min to start giving a response.
Is that reasonable, or should it be faster?
The .bat command to start the bot:
While idle:
it is using 7 GB of GPU memory (stays the same while running a query)
24.4 GB of RAM (stays the same while running a query)
CPU utilization stays at 2 to 3%
While running a query, CPU utilization goes close to 100%, but the GPU stays at 1 to 2%, and it takes around 2 min to start giving a response.
It seems it is not utilizing the GPU at all.
Could you please see what I am doing wrong here? I want to get a faster response.
The CUDA version is:
Below is my pip list:
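One quick way to confirm whether the installed llama-cpp-python build can offload to the GPU at all, before digging into h2ogpt settings: newer versions of the package expose `llama_supports_gpu_offload()`. A minimal sketch, assuming that function is present in your installed version (it is wrapped in `getattr` below in case it is not):

```python
def gpu_offload_supported() -> bool:
    """Return True if the installed llama-cpp-python build can offload to GPU."""
    try:
        import llama_cpp
    except ImportError:
        # llama-cpp-python is not installed at all
        return False
    # Older versions of the package may not expose this function
    fn = getattr(llama_cpp, "llama_supports_gpu_offload", None)
    return bool(fn()) if fn is not None else False

print(gpu_offload_supported())
```

If this prints `False` on a machine with a CUDA GPU, the wheel was built CPU-only and needs to be reinstalled with CUDA enabled.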