
Idle timeout for releasing GPU memory #1736

Open
MalikKillian opened this issue Mar 28, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@MalikKillian

Please describe the feature you want

I use my computer for running Stable Diffusion (ComfyUI) as well as Tabby. I eventually noticed that ComfyUI would run very slowly while Tabby was also running. It appears that Tabby reserves GPU memory and only releases it when the Docker container is stopped (not paused). It's inconvenient to have to stop Tabby completely in order to use my GPU for other things.

Tabby should implement a configurable idle timeout: after X seconds without requests, Tabby would release its GPU memory and only reserve it again when a new request comes in. From simple observation it doesn't take more than a few seconds to start a TabbyML container and receive a response, so I wonder whether there's much benefit to keeping the memory reserved at all.
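The proposed behavior can be sketched roughly as below. This is only an illustration of the idea, not Tabby's actual code: `load_fn` and `unload_fn` are hypothetical callbacks standing in for however the server would allocate and free GPU memory, and the class name is invented.

```python
import threading
import time


class IdleUnloader:
    """Lazily loads a model on first request and unloads it after an idle timeout.

    Hypothetical sketch: `load_fn` returns a callable "model", `unload_fn`
    releases whatever resources it holds (e.g. GPU memory).
    """

    def __init__(self, load_fn, unload_fn, idle_seconds):
        self._load_fn = load_fn
        self._unload_fn = unload_fn
        self._idle_seconds = idle_seconds
        self._lock = threading.Lock()
        self._model = None
        self._last_used = 0.0
        self._timer = None

    def _schedule_unload(self):
        # Restart the idle timer on every request.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self._idle_seconds, self._maybe_unload)
        self._timer.daemon = True
        self._timer.start()

    def _maybe_unload(self):
        with self._lock:
            # Re-check the timestamp so a request that raced with the timer
            # doesn't get its model unloaded out from under it.
            idle_for = time.monotonic() - self._last_used
            if self._model is not None and idle_for >= self._idle_seconds:
                self._unload_fn(self._model)
                self._model = None

    def handle_request(self, prompt):
        with self._lock:
            if self._model is None:
                self._model = self._load_fn()  # reload on demand after idle
            self._last_used = time.monotonic()
            result = self._model(prompt)
            self._schedule_unload()
            return result
```

A request after the timeout simply pays the reload cost once, which matches the observation above that a cold start only takes a few seconds.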

Additional context

Tabby version: 0.9.1

Model: StarCoder-3B

Docker Desktop (Windows) version 4.19.0 (106363)

Output from nvidia-smi while Tabby container is idle (no requests for >60 seconds):

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.40.06              Driver Version: 551.23         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2060 ...    On  |   00000000:2B:00.0  On |                  N/A |
| 29%   31C    P8             17W /  175W |    4395MiB /   8192MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A         1      C   /tabby                                      N/A      |
|    0   N/A  N/A        32      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

Output from nvidia-smi while Tabby container is stopped:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.40.06              Driver Version: 551.23         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2060 ...    On  |   00000000:2B:00.0  On |                  N/A |
| 29%   30C    P8             12W /  175W |     960MiB /   8192MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        32      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

Please reply with a 👍 if you want this feature.

@MalikKillian MalikKillian added the enhancement New feature or request label Mar 28, 2024
@wsxiaoys
Member

See a prototype implementation here: #624 (comment)

@MalikKillian
Author

@wsxiaoys Not sure how I'm supposed to leverage this implementation. It doesn't seem to be the default behavior when using the Docker container. As far as I can tell it's not even part of the image. 😞
