
Colab dreambooth notebook fail #252

Open
andrewssdd opened this issue Dec 15, 2023 · 21 comments
Labels
bug Something isn't working

Comments

@andrewssdd

Describe the bug

The Dreambooth Colab notebook fails at the training stage. It appears to be an issue with bitsandbytes failing its CUDA setup.

Reproduction

Run the Dreambooth Colab notebook. It fails at training.

https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb

Exception: CUDA SETUP: Setup Failed!

Logs

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/paths.py:105: UserWarning: /usr/lib64-nvidia did not contain libcudart.so as expected! Searching further paths...
  warn(
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events')}
  warn(
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('8013'), PosixPath('//172.28.0.1'), PosixPath('http')}
  warn(
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-t4-s-3a2kk3hhilbsk --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true'), PosixPath('--logtostderr --listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https')}
  warn(
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/datalab/web/pyright/typeshed-fallback/stdlib,/usr/local/lib/python3.10/dist-packages')}
  warn(
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
  warn(
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}
  warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 122
CUDA SETUP: TODO: compile library for specific version: libbitsandbytes_cuda122.so
CUDA SETUP: Defaulting to libbitsandbytes.so...
CUDA SETUP: CUDA detection failed. Either CUDA driver not installed, CUDA not installed, or you have multiple conflicting CUDA libraries!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
Traceback (most recent call last):
  File "/content/train_dreambooth.py", line 869, in <module>
    main(args)
  File "/content/train_dreambooth.py", line 571, in main
    import bitsandbytes as bnb
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 41, in <module>
    lib = CUDALibrary_Singleton.get_instance().lib
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 37, in get_instance
    cls._instance.initialize()
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 27, in initialize
    raise Exception('CUDA SETUP: Setup Failed!')
Exception: CUDA SETUP: Setup Failed!

System Info

Google Colab
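The key lines in the log are "Detected CUDA version 122" and the fall-back to libbitsandbytes.so: older bitsandbytes releases map the detected CUDA version to a per-version shared library name, and if the installed wheel does not ship that file, setup fails. A minimal sketch of that mapping (illustrative only, not the library's actual code):

```python
def expected_bnb_library(cuda_version: str) -> str:
    """Mimic how pre-0.42 bitsandbytes derives the CUDA library filename.

    "12.2" becomes "122", so the loader looks for libbitsandbytes_cuda122.so;
    if that file is missing from the wheel it falls back to libbitsandbytes.so
    and then raises "CUDA SETUP: Setup Failed!".
    """
    major, minor = cuda_version.split(".")[:2]
    return f"libbitsandbytes_cuda{major}{minor}.so"

print(expected_bnb_library("12.2"))  # libbitsandbytes_cuda122.so
```

This is why upgrading (or rebuilding) bitsandbytes for the CUDA version Colab actually ships is the usual fix.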

andrewssdd added the bug label on Dec 15, 2023
@andrewssdd
Author

Here's the requirement cell that works.

!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/examples/dreambooth/train_dreambooth.py
!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/scripts/convert_diffusers_to_original_stable_diffusion.py
%pip install git+https://github.com/ShivamShrirao/diffusers
%pip install -U --pre triton
%pip install transformers ftfy bitsandbytes gradio natsort safetensors xformers torch==2.1.0+cu121 accelerate
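One thing worth checking when this cell stops working: the torch pin requests a specific CUDA build (cu121), while the log in the issue shows bitsandbytes detecting CUDA 12.2 from the system toolkit. A mismatch between those two is one frequent cause of CUDA SETUP failures. A small stdlib sketch of how to compare the two tags (the helper names here are made up for illustration):

```python
def wheel_cuda_tag(pin: str) -> str:
    # "torch==2.1.0+cu121" -> "cu121" (the CUDA build the wheel was compiled for)
    return pin.split("+", 1)[1]

def runtime_cuda_tag(version: str) -> str:
    # "12.2" (as bitsandbytes reports it) -> "cu122"
    major, minor = version.split(".")[:2]
    return f"cu{major}{minor}"

print(wheel_cuda_tag("torch==2.1.0+cu121"))  # cu121
print(runtime_cuda_tag("12.2"))              # cu122
```

When the two disagree, either bump the torch pin to match the runtime or install a bitsandbytes build for the detected CUDA version.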

@VladAdushev

Here's the requirement cell that works.

This still returns an error for me.

@TragicXxBoNeSxX

Still returns "Exception: CUDA SETUP: Setup Failed!"

@vadar007

I was able to execute it successfully with the modified code. However, when attempting to use the generated model with Stable Diffusion, I got the following error:

*** Error verifying pickled file from D:\l......*.ckpt
*** The file may be malicious, so the program is not going to read it.
*** You can skip this check with --disable-safe-unpickle commandline argument.

Adding the recommended command-line argument allowed Stable Diffusion to use the model.

@TragicXxBoNeSxX

Changing this line: %pip install transformers ftfy bitsandbytes gradio natsort safetensors xformers torch==2.1.0+cu121 accelerate

To this: %pip install transformers ftfy bitsandbytes gradio natsort safetensors xformers torch==2.1.0+cu121 accelerate kaleido cohere openai tiktoken

Got it working for me again.

@andrewssdd
Author

Was able to successfully execute with modified code. However, when attempting to use the generated model with Stable Diffusion was getting the following error:

*** Error verifying pickled file from D:\l......*.ckpt *** The file may be malicious, so the program is not going to read it. *** You can skip this check with --disable-safe-unpickle commandline argument.

Adding recommended command line argument allowed Stable Diffusion to utilize the model

You need to save the checkpoint as safetensors.
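A minimal sketch of one way to do that conversion, assuming torch and safetensors are installed. The helper names (`safetensors_path`, `convert`) are hypothetical, not part of either library, and note that `torch.load` still unpickles the file, so only run this on checkpoints you created yourself:

```python
from pathlib import Path

def safetensors_path(ckpt_path: str) -> str:
    """Derive the output filename: model.ckpt -> model.safetensors."""
    return str(Path(ckpt_path).with_suffix(".safetensors"))

def convert(ckpt_path: str) -> str:
    """Re-save a pickled .ckpt as .safetensors (sketch, trusted files only)."""
    import torch
    from safetensors.torch import save_file

    state = torch.load(ckpt_path, map_location="cpu")
    # Stable Diffusion .ckpt files usually nest the weights under "state_dict"
    state = state.get("state_dict", state)
    # safetensors can only store tensors, so drop any non-tensor entries
    tensors = {k: v for k, v in state.items() if isinstance(v, torch.Tensor)}
    out = safetensors_path(ckpt_path)
    save_file(tensors, out)
    return out
```

With a .safetensors file, the web UI's safe-unpickle check no longer applies, so `--disable-safe-unpickle` is not needed.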

@Al-Rien

Al-Rien commented Dec 22, 2023

To this: %pip install transformers ftfy bitsandbytes gradio natsort safetensors xformers torch==2.1.0+cu121 accelerate kaleido cohere openai tiktoken

Still returns CUDA SETUP: Setup Failed.

@TianyiPeng

Has anyone gotten it to work yet?

@domingosl

Same issue here after trying all of the suggestions: CUDA SETUP: Setup Failed.

@abc123desygn

I have the same issue. Can you please fix?

@tibor

tibor commented Jan 2, 2024

I'm not sure, but I think this is a problem with the bitsandbytes library. I have opened a ticket here: bitsandbytes-foundation/bitsandbytes#950

@VladAdushev

Has anyone managed to launch it?

@deveshruttala

Same issue for me.

@tibor

tibor commented Jan 16, 2024

Yeah, it works with the fix at bitsandbytes-foundation/bitsandbytes#950

@Kategus

Kategus commented Feb 5, 2024

Good day. It has broken again. If anyone has a working version, please share it.

@jackiter

Good day. It has broken again. If anyone has a working version, please share it.

Yes please

@chchchadzilla

I'd even settle for someone just explaining why it's broken so I can try to fix it myself. I've gotten it to work several times by installing different versions of torch with CUDA, xformers, triton, torchtext, torchaudio, torchvision, and torchdata, and also by installing kaleido, pycairo, tiktoken, and openai, but the problem is I was just throwing things at the wall and hoping something stuck. I fundamentally don't understand what's happening and just happened to hit pay dirt, so replicating it has proven difficult; impossible, actually, in the last week specifically. I'm not sure whether another update broke a different module, but it's frustrating. I've tried to learn kohya_ss and I'm very, very bad at it; no matter how closely I follow tutorials, it never works, or maybe I'm just stupid. Either way, there's no user-friendly (in the lowest sense of the word) choice except this Shivam Colab, which in and of itself took an ungodly amount of trial and error to get working in a way that makes sense to me and gets me good results.

Now, though, it seems like no one cares because it's outdated technology; with LoRA training, Stable Cascade, and Stable Diffusion 3 right around the corner for public release, I'm afraid we won't see a fix. As a hobbyist rather than a professional, I find that all the discussion around it doesn't provide clearly defined, easy-to-follow solutions. It's all assumptive, predicated on already knowing what everyone is talking about, not a step-by-step, idiot-proof guide, which I feel so many of us need but are too embarrassed to ask for, afraid we'll be made fun of or reprimanded for asking stupid questions. The whole thing is elitist, and it doesn't do anyone any good.

It turns regular quasi-nerds like me away from diving into this world head first, and you never know: you could be turning away the next visionary who would have written code or developed something that changed the game. That's a long shot, but I think you get my point. It's frustrating that there's so little information on this, and the solutions that are out there are half-baked, written with the assumption that you're already a Python developer, and we're not. We're regular people who use this for fun and hobbies; some of us used to make money training models for others, or used it for our day jobs. And look, I get it: it's forced evolution. Figure it out, or stop complaining and stop using it. But for something that seems like it should be so easy to fix, I don't understand why no one even wants to try to help. It's disheartening. Sorry for the rant; tonight has been really frustrating and I'm no closer to getting pending work done. I've got new characters to train into a model so I can finish a comic book series I've had to back-burner for the last three months because of this, and I promise that if someone helps me fix it, I'll never use the software again and stop bugging everyone. Thanks.

@Kategus

Kategus commented Mar 4, 2024

Bravo, great speech, but I'm afraid it won't bear fruit. Personally, I don't consider this outdated technology: alongside LoRA, Dreambooth still gives very good results in terms of likeness, and I haven't seen the same likeness from the newer approaches. It seems the author simply got cut off from the internet or was taken into the army).

@Olivier-aka-Raiden

If anyone is interested, I successfully trained my model by installing the requirements as follows:
%pip install transformers ftfy bitsandbytes gradio natsort safetensors xformers torch==2.2.1 accelerate kaleido cohere openai tiktoken

@Kategus

Kategus commented Apr 9, 2024

If anyone is interested, I successfully trained my model by installing the requirements as follows: %pip install transformers ftfy bitsandbytes gradio natsort safetensors xformers torch==2.2.1 accelerate kaleido cohere openai tiktoken

Thanks for the hint! But this fix did not last long; it is giving an error again. Perhaps someone more expert can fix it?

@Baconwrappedfriedpickles

Thanks for the hint! But this fix did not last long; it is giving an error again. Perhaps someone more expert can fix it?

Someone on another site suggested adding this to the requirements and it's working for me. Hope it helps.
%pip install "jax[cuda12_local]==0.4.23" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
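If the jax pin above helps, it is likely because it keeps jax and jaxlib on matching, CUDA-compatible versions (Colab preinstalls its own jax build, which a later pip install can put out of sync). A quick stdlib check that the two installed versions agree; `versions_match` is a made-up helper, not part of any library:

```python
from importlib import metadata

def versions_match(pkg_a: str = "jax", pkg_b: str = "jaxlib") -> bool:
    """True if both packages are installed with the same version string."""
    try:
        return metadata.version(pkg_a) == metadata.version(pkg_b)
    except metadata.PackageNotFoundError:
        # Either package missing counts as a mismatch
        return False
```

Running `versions_match()` in a fresh notebook cell after the installs gives a fast sanity check before kicking off training.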
