
VRAM requirement to load ControlNet for inference? #99

Open · yuchen1984 opened this issue Sep 4, 2024 · 6 comments

yuchen1984 commented Sep 4, 2024

I was trying to load XLabs-AI/flux-controlnet-depth-v3 for inference, using the flux-dev-fp8 checkpoint with the "offload" switch. Image size 1024x512.

It still gives a CUDA OOM on an RTX 4090 (24GB VRAM). What is the minimal VRAM requirement to load a ControlNet for inference? Is there an FP8 version of the ControlNets, or is there any caveat to getting this to work? It feels outrageous to need an A100 just to run inference...

NB: without loading the ControlNet, inference is possible with 24GB VRAM; the observed peak VRAM usage is only about 14GB.
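(For anyone comparing numbers: peak figures like the ones above can be checked with PyTorch's built-in memory counters. A generic sketch, not code from this repo.)

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the text-to-image / ControlNet sampling call here ...

# max_memory_allocated tracks live tensors; max_memory_reserved is closer to what
# nvidia-smi (and CUDA OOM errors) report, since it includes the caching allocator.
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
print(f"peak reserved:  {torch.cuda.max_memory_reserved() / 2**30:.1f} GiB")
```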

Oguzhanercan commented Nov 21, 2024

Did you find a way to run inference with 24GB VRAM? @yuchen1984

yuchen1984 (Author) commented

> Did you find a way to run inference with 24GB VRAM? @yuchen1984

Nope. I ended up renting an A40 node on vast.ai at the time. The peak VRAM usage is about 27.5GB.

yuchen1984 (Author) commented

> Did you find a way to run inference with 24GB VRAM? @yuchen1984

Actually, it seems possible to make a small code change in xflux_pipeline.py so that the ControlNet can be offloaded to the CPU in --lowvram mode. This brings the peak VRAM below 24GB. I will create a PR a bit later.
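(Illustrative only: a minimal PyTorch sketch of the offload idea described above, with a placeholder module standing in for the actual ControlNet. This is not the code from the PR.)

```python
import torch
import torch.nn as nn

def offloaded_controlnet_forward(controlnet: nn.Module, x: torch.Tensor,
                                 device: str = "cuda") -> torch.Tensor:
    """Move the ControlNet to the GPU only for its forward pass, then park it
    back on the CPU so its weights do not stay resident in VRAM alongside the
    main transformer. Costs a host<->device copy per call, lowers the peak."""
    controlnet.to(device)
    try:
        with torch.no_grad():
            out = controlnet(x.to(device))
    finally:
        controlnet.to("cpu")
        torch.cuda.empty_cache()  # release the cached blocks that held the weights
    return out

if __name__ == "__main__" and torch.cuda.is_available():
    dummy = nn.Linear(64, 64)  # stand-in for the real ControlNet
    hints = offloaded_controlnet_forward(dummy, torch.randn(2, 64))
    print(hints.shape, f"{torch.cuda.max_memory_allocated() / 2**20:.1f} MiB peak")
```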

yuchen1984 (Author) commented

#138

Oguzhanercan commented Nov 22, 2024

Thanks for your PR. I solved it via sequential offload: 2GB VRAM required, but inference time doubled. How much does this solution slow down the pipeline? (transformer quantized to NF4)

yuchen1984 (Author) commented

> Thanks for your PR. I solved it via sequential offload: 2GB VRAM required, but inference time doubled. How much does this solution slow down the pipeline? (transformer quantized to NF4)

A slight slow-down, but definitely not as much as sequential offload, I believe (it does, of course, need a lot more than 2GB VRAM). I was running everything in FP8; peak VRAM is about 21GB.
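(For reference, a rough diffusers-based sketch of the sequential-offload route discussed above, rather than this repo's xflux_pipeline.py. The checkpoint id, the control_image argument, and diffusers' Flux ControlNet support are assumptions; the original XLabs-AI/flux-controlnet-depth-v3 checkpoint may need its diffusers-format release, and the NF4 quantization step the commenter used is left out here.)

```python
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline
from diffusers.utils import load_image

# Assumed diffusers-format depth ControlNet checkpoint.
controlnet = FluxControlNetModel.from_pretrained(
    "XLabs-AI/flux-controlnet-depth-diffusers", torch_dtype=torch.bfloat16)

pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet,
    torch_dtype=torch.bfloat16)

# Stream submodules through the GPU one at a time: a few GB of VRAM,
# but markedly slower than keeping the models resident.
pipe.enable_sequential_cpu_offload()

depth = load_image("depth_map.png")  # hypothetical precomputed depth map
image = pipe("a mountain cabin at dusk", control_image=depth,
             width=1024, height=512, num_inference_steps=28,
             guidance_scale=3.5).images[0]
image.save("out.png")
```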
