Idle timeout for releasing GPU memory #1736
Labels: enhancement (New feature or request)
Comments
See a prototype implementation here: #624 (comment)
@wsxiaoys I'm not sure how I'm supposed to leverage this implementation. It doesn't seem to be the default behavior when using the Docker container. As far as I can tell, it's not even part of the image. 😞
Please describe the feature you want
I use my computer for running Stable Diffusion (ComfyUI) as well as Tabby. I eventually noticed that ComfyUI would run very slowly when Tabby was also running. It appears that Tabby reserves GPU memory and only releases it when the [Docker] container is stopped (not paused). It's a little inconvenient to have to stop Tabby completely to use my GPU for other things.
Tabby should implement a configurable idle timeout: after X seconds without requests, Tabby would release its GPU memory and only reserve it again when a new request comes in. From simple observation it doesn't seem to take more than a few seconds to start a TabbyML container and receive a response, so I wonder whether there's any purpose at all to keeping memory reserved.
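To illustrate the requested behavior, here is a minimal sketch of an idle-timeout wrapper: a resource is loaded lazily on first use and dropped again once it sits idle past a configurable timeout. All names (`IdleUnloader`, `acquire`, `reap`) are hypothetical for illustration; this is not Tabby's actual API, and a real implementation would drop the loaded model weights to free GPU memory.

```python
import threading
import time


class IdleUnloader:
    """Sketch: hold a lazily loaded resource and release it after
    `timeout` seconds without use (hypothetical, not Tabby's API)."""

    def __init__(self, loader, timeout):
        self._loader = loader        # callable that (re)loads the model
        self._timeout = timeout      # idle seconds before unloading
        self._lock = threading.Lock()
        self._model = None
        self._last_used = 0.0

    def acquire(self):
        """Return the model, loading it on demand, and reset the idle clock."""
        with self._lock:
            if self._model is None:
                self._model = self._loader()  # e.g. reserve GPU memory here
            self._last_used = time.monotonic()
            return self._model

    def reap(self):
        """Drop the model if idle longer than the timeout.

        Intended to be called periodically by a background thread;
        returns True if the model was unloaded on this call.
        """
        with self._lock:
            idle = time.monotonic() - self._last_used
            if self._model is not None and idle >= self._timeout:
                self._model = None  # dropping the reference frees the memory
                return True
            return False
```

A server would call `acquire()` at the start of each completion request and run `reap()` from a timer thread; the only cost of an expired timeout is the few seconds of model reload on the next request, which matches the container-restart latency observed above.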
Additional context
Tabby version: 0.9.1
Model: StarCoder-3B
Docker Desktop (Windows) version: 4.19.0 (106363)
Output from `nvidia-smi` while the Tabby container is idle (no requests for >60 seconds):
Output from `nvidia-smi` while the Tabby container is stopped:
Please reply with a 👍 if you want this feature.