
Out of vram and reboot #153

Open

tdzz1102 opened this issue Oct 28, 2023 · 2 comments

@tdzz1102 (Contributor) commented Oct 28, 2023

Machine info

  • Hardware
    • 64 GB RAM
    • 32 vCPUs
    • 2× RTX 3090 GPUs
  • Software
    • Ubuntu Server 20.04
    • Python 3.11.2
    • nvidia-driver 545
    • CUDA 12.3
    • jax 0.4.19

When I set up the environment and called the FlaxWhisperPipline('openai/whisper-xxx') method to load the model, the server rebooted without any error. Only 'openai/whisper-tiny' loads correctly; the machine crashes when loading 'openai/whisper-small' or anything larger. I've tried XLA_PYTHON_CLIENT_PREALLOCATE=false as mentioned in issue #7, but it didn't work.
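For reference, a minimal sketch of how I load the model ('openai/whisper-small' stands in for the larger checkpoints that crash; note the preallocation flag only takes effect if it is set before JAX is imported):

```python
import os

# Disable XLA's up-front GPU memory grab; must be set before jax is imported.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

from whisper_jax import FlaxWhisperPipline

# whisper-tiny loads fine; whisper-small and anything larger reboots the machine.
pipeline = FlaxWhisperPipline("openai/whisper-small")
```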

The image below shows the vRAM usage of my machine; gaps in the data are where the machine rebooted.

[Screenshot 2023-10-28 16:58:43: vRAM usage over time]

Is there any way to prevent Linux from rebooting automatically when vRAM usage is high?

@sanchit-gandhi (Owner) commented

Hey @tdzz1102 and sorry for the late reply! Could you try XLA_PYTHON_CLIENT_MEM_FRACTION=.XX (where .XX is the fraction of GPU memory to preallocate, e.g. .50 for 50%) to reduce the preallocation? The docs say this should help combat OOMs that occur when the programme starts: https://jax.readthedocs.io/en/latest/gpu_memory_allocation.html

You might need to play with your value of .XX, e.g. incrementally reducing from .75 to .00
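E.g. one way to set it from Python (a sketch; .50 is just a starting value, and the variable must be set before JAX is first imported):

```python
import os

# Cap XLA's GPU memory preallocation at 50%; lower the value if OOMs persist.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = ".50"

import jax  # imported only after the flag is set, so it takes effect
```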

@tdzz1102 (Contributor, Author) commented

@sanchit-gandhi I solved this by downgrading the nvidia-driver and CUDA versions (but I forgot exactly which versions 😢). The server has since expired, so I can't try this solution any longer. faster-whisper has helped me a lot in the meantime, and thank you anyway!
