
Idle timeout for releasing GPU memory #1736

Open
MalikKillian opened this issue Mar 28, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@MalikKillian

Please describe the feature you want

I use my computer for running Stable Diffusion (ComfyUI) as well as Tabby. I eventually noticed that ComfyUI would run very slowly while Tabby was also running. It appears that Tabby reserves GPU memory and only releases it when the Docker container is stopped (not paused). It's inconvenient to have to stop Tabby completely in order to use my GPU for other things.

Tabby should implement a configurable idle timeout: after X seconds without requests, Tabby would release its GPU memory and only reserve it again when a new request comes in. From simple observation it doesn't take more than a few seconds to start a TabbyML container and receive a response, so I wonder whether there's much benefit to keeping the memory reserved at all.
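The proposed behavior can be sketched roughly as below. This is only an illustration of the idea, not Tabby's actual code: `load_fn` and `unload_fn` are hypothetical callbacks standing in for however the server would allocate and free GPU memory, and the class name is invented.

```python
import threading
import time


class IdleUnloader:
    """Lazily loads a model on first request and unloads it after an idle timeout.

    Hypothetical sketch: `load_fn` returns a callable "model", `unload_fn`
    releases whatever resources it holds (e.g. GPU memory).
    """

    def __init__(self, load_fn, unload_fn, idle_seconds):
        self._load_fn = load_fn
        self._unload_fn = unload_fn
        self._idle_seconds = idle_seconds
        self._lock = threading.Lock()
        self._model = None
        self._last_used = 0.0
        self._timer = None

    def _schedule_unload(self):
        # Restart the idle timer on every request.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self._idle_seconds, self._maybe_unload)
        self._timer.daemon = True
        self._timer.start()

    def _maybe_unload(self):
        with self._lock:
            # Re-check the timestamp so a request that raced with the timer
            # doesn't get its model unloaded out from under it.
            idle_for = time.monotonic() - self._last_used
            if self._model is not None and idle_for >= self._idle_seconds:
                self._unload_fn(self._model)
                self._model = None

    def handle_request(self, prompt):
        with self._lock:
            if self._model is None:
                self._model = self._load_fn()  # reload on demand after idle
            self._last_used = time.monotonic()
            result = self._model(prompt)
            self._schedule_unload()
            return result
```

A request after the timeout simply pays the reload cost once, which matches the observation above that a cold start only takes a few seconds.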

Additional context

Tabby version: 0.9.1

Model: StarCoder-3B

Docker Desktop (Windows) version 4.19.0 (106363)

Output from nvidia-smi while Tabby container is idle (no requests for >60 seconds):

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.40.06              Driver Version: 551.23         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2060 ...    On  |   00000000:2B:00.0  On |                  N/A |
| 29%   31C    P8             17W /  175W |    4395MiB /   8192MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A         1      C   /tabby                                      N/A      |
|    0   N/A  N/A        32      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

Output from nvidia-smi while Tabby container is stopped:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.40.06              Driver Version: 551.23         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2060 ...    On  |   00000000:2B:00.0  On |                  N/A |
| 29%   30C    P8             12W /  175W |     960MiB /   8192MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        32      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

Please reply with a 👍 if you want this feature.

@MalikKillian MalikKillian added the enhancement New feature or request label Mar 28, 2024
@wsxiaoys
Member

See a prototype implementation here: #624 (comment)

@MalikKillian
Author

@wsxiaoys Not sure how I'm supposed to leverage this implementation. It doesn't seem to be the default behavior when using the Docker container. As far as I can tell it's not even part of the image. 😞
