VRAM requirement to load ControlNet for inference? #99
Comments
Did you find a way to run inference with 24 GB of VRAM? @yuchen1984
Nope. I ended up renting an A40 node on vast.ai at the time. The peak VRAM usage was about 27.5 GB.
Actually, it seems possible to make a small code change in xflux_pipeline.py so that the ControlNet can be offloaded to the CPU in --lowvram mode. This brings the peak VRAM below 24 GB. I will create a PR a bit later.
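A minimal sketch of the kind of change being described (not the actual PR): keep the ControlNet on CPU when offloading is enabled and move it to the GPU only for its forward pass. The function and attribute names here (run_controlnet_step, self.controlnet, self.offload, self.device) are illustrative assumptions, not the real xflux_pipeline.py API.

```python
import torch

def run_controlnet_step(self, img, controlnet_cond, timesteps, guidance, **kwargs):
    if self.offload:
        # Bring the ControlNet onto the GPU only while it is actually needed.
        self.controlnet = self.controlnet.to(self.device)
    block_res_samples = self.controlnet(
        img=img,
        controlnet_cond=controlnet_cond,
        timesteps=timesteps,
        guidance=guidance,
        **kwargs,
    )
    if self.offload:
        # Park it back on CPU so the main transformer has the VRAM to itself.
        self.controlnet = self.controlnet.to("cpu")
        torch.cuda.empty_cache()
    return block_res_samples
```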
Thanks for your PR. I solved it via sequential offload: about 2 GB of VRAM required, but inference time doubled (transformer quantized to nf4). How much does your solution slow down the pipeline?
A slight slow-down, but definitely not as much as sequential offload, I believe (though it of course needs a lot more than 2 GB of VRAM). I was running everything in fp8; peak VRAM is about 21 GB.
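For reference, a rough sketch of the sequential-offload approach mentioned two comments above, using the Hugging Face diffusers Flux pipeline rather than this repo's xflux_pipeline.py. The model IDs are illustrative assumptions (the XLabs ControlNet checkpoints may need diffusers-format weights), and the nf4 quantization step the commenter used is omitted here for brevity.

```python
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline

controlnet = FluxControlNetModel.from_pretrained(
    "XLabs-AI/flux-controlnet-depth-v3",  # placeholder; may require a diffusers-format checkpoint
    torch_dtype=torch.bfloat16,
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
)
# Streams each submodule to the GPU only while it executes, so peak VRAM drops
# to a few GB at the cost of roughly doubled inference time.
pipe.enable_sequential_cpu_offload()
```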
I was trying to load XLabs-AI/flux-controlnet-depth-v3 for inference, using the flux-dev-fp8 checkpoint with the "offload" switch, at an image size of 1024x512.
It still gives CUDA OOM on an RTX 4090 (24 GB VRAM). What is the minimal VRAM requirement to load a ControlNet for inference? Is there an FP8 version of the ControlNets, or is there any caveat to getting it to work? It feels outrageous to have to use an A100 just for running inference...
NB: without loading the ControlNet, inference is possible with 24 GB of VRAM; the observed peak VRAM usage is only about 14 GB.
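One way to verify the peak-VRAM figures quoted in this thread is to wrap the generation call with torch.cuda's memory statistics; the `pipe(...)` call below is a stand-in for whatever pipeline you are actually running.

```python
import torch

torch.cuda.reset_peak_memory_stats()
# result = pipe(prompt=..., controlnet_image=..., width=1024, height=512)
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM: {peak_gb:.1f} GB")
```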