
The GPU memory usage is too high #3

Open
peki12345 opened this issue Sep 13, 2024 · 10 comments

@peki12345

About 60 GB? That's far too much. Can it be optimized?
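For reference, the peak usage is easy to confirm with plain PyTorch (nothing repo-specific):

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run the pipeline here ...
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GiB")
```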

@8600862

8600862 commented Sep 19, 2024

torch.cuda.OutOfMemoryError: CUDA out of memory.

@microbenh

How can I run the pipeline on several GPUs, e.g. 4×4090?

@JPlin
Collaborator

JPlin commented Sep 23, 2024

You can try this to run on two GPUs: https://huggingface2.notion.site/How-to-split-Flux-transformer-and-run-inference-aa1583ad23ce47a78589a79bb9309ab0
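Roughly, the idea from that post as a minimal sketch (the base model ID, prompt, and step count below are placeholders, not taken from this repo): diffusers can shard the pipeline's components across all visible GPUs with `device_map="balanced"`.

```python
import torch
from diffusers import FluxPipeline

# Spread the components (transformer, text encoders, VAE) across
# the visible GPUs instead of loading everything onto one card.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # assumed base checkpoint
    torch_dtype=torch.bfloat16,
    device_map="balanced",           # e.g. split over cuda:0 and cuda:1
)

image = pipe("a photo of a cat", num_inference_steps=28).images[0]
image.save("out.png")
```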

@c-steve-wang

> You can try this to run on two GPUs: https://huggingface2.notion.site/How-to-split-Flux-transformer-and-run-inference-aa1583ad23ce47a78589a79bb9309ab0

Could you kindly provide a script showing how this method works with main.py?

@microbenh

> You can try this to run on two GPUs: https://huggingface2.notion.site/How-to-split-Flux-transformer-and-run-inference-aa1583ad23ce47a78589a79bb9309ab0

It does not work for me: the transformer needs 24 GB and the ControlNet 4 GB, and they have to sit on the same GPU.
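If they really do have to sit on one card, a fallback worth trying (my assumption, not something from this repo; the checkpoint IDs are illustrative) is diffusers' model CPU offload, which keeps only the currently active component on the GPU at the cost of speed:

```python
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline

controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny",  # illustrative ControlNet checkpoint
    torch_dtype=torch.bfloat16,
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",          # assumed base checkpoint
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
)
# Components are moved onto the GPU only while they run, so the text
# encoders never have to share VRAM with the transformer + ControlNet.
pipe.enable_model_cpu_offload()
```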

@Nomination-NRB

> > You can try this to run on two GPUs: https://huggingface2.notion.site/How-to-split-Flux-transformer-and-run-inference-aa1583ad23ce47a78589a79bb9309ab0
>
> It does not work for me: the transformer needs 24 GB and the ControlNet 4 GB, and they have to sit on the same GPU.

It worked for me: I got results with two 3090s, but the inpainting quality was poor, and it followed the redraw prompt badly.

@JPlin
Collaborator

JPlin commented Nov 4, 2024

#27 fixed some bugs; the pipeline now needs 28 GB of VRAM.

@xhinker

xhinker commented Nov 28, 2024

I can successfully run it with torchao quantization in about 20 GB of VRAM, and the results are great.
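Since the exact script wasn't shared, here is a guess at that kind of setup (the model ID is an assumption): torchao's weight-only int8 quantization shrinks the transformer's weights without ever touching gradients.

```python
import torch
from diffusers import FluxPipeline
from torchao.quantization import quantize_, int8_weight_only

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # assumed base checkpoint
    torch_dtype=torch.bfloat16,
)
# Quantize only the large transformer's weights to int8, in place;
# activations stay in bf16, and no backward pass is involved.
quantize_(pipe.transformer, int8_weight_only())
pipe.to("cuda")
```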

@shantzhou

> I can successfully run it with torchao quantization in about 20 GB of VRAM, and the results are great.

Can you share the demo with me?

@timmerscher

timmerscher commented Dec 6, 2024

> I can successfully run it with torchao quantization in about 20 GB of VRAM, and the results are great.

How? I'm trying quantization with bitsandbytes, but it seems that gradients are required for the transformer, which makes the int8 version unusable!
I know time is precious, but I would really appreciate some guidance!
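One thing worth ruling out (my assumption, not a confirmed fix): if the pipeline call runs with autograd enabled, the quantized layers may be asked for gradients they cannot provide. Wrapping inference in `torch.inference_mode()` guarantees none are requested (`pipe` as set up in the torchao sketch above):

```python
import torch

# No gradients are recorded or requested inside this block, so
# weight-only quantized layers never need a backward implementation.
with torch.inference_mode():
    image = pipe("a photo of a cat", num_inference_steps=28).images[0]
```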
