
Colab dreambooth notebook fail #252

Open
andrewssdd opened this issue Dec 15, 2023 · 21 comments
Labels
bug Something isn't working

Comments

@andrewssdd

Describe the bug

The Dreambooth Colab notebook fails at the training stage. It appears to be an issue with bitsandbytes failing its CUDA setup.

Reproduction

Run the Dreambooth Colab notebook. It fails at training.

https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb

Exception: CUDA SETUP: Setup Failed!

Logs

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/paths.py:105: UserWarning: /usr/lib64-nvidia did not contain libcudart.so as expected! Searching further paths...
  warn(
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events')}
  warn(
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('8013'), PosixPath('//172.28.0.1'), PosixPath('http')}
  warn(
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-t4-s-3a2kk3hhilbsk --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true'), PosixPath('--logtostderr --listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https')}
  warn(
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/datalab/web/pyright/typeshed-fallback/stdlib,/usr/local/lib/python3.10/dist-packages')}
  warn(
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
  warn(
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}
  warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 122
CUDA SETUP: TODO: compile library for specific version: libbitsandbytes_cuda122.so
CUDA SETUP: Defaulting to libbitsandbytes.so...
CUDA SETUP: CUDA detection failed. Either CUDA driver not installed, CUDA not installed, or you have multiple conflicting CUDA libraries!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
Traceback (most recent call last):
  File "/content/train_dreambooth.py", line 869, in <module>
    main(args)
  File "/content/train_dreambooth.py", line 571, in main
    import bitsandbytes as bnb
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 41, in <module>
    lib = CUDALibrary_Singleton.get_instance().lib
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 37, in get_instance
    cls._instance.initialize()
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 27, in initialize
    raise Exception('CUDA SETUP: Setup Failed!')
Exception: CUDA SETUP: Setup Failed!

System Info

Google Colab
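The key lines in the log are "Detected CUDA version 122" and the fall-back to libbitsandbytes.so: older bitsandbytes releases map the detected CUDA version to a per-version shared library name, and if the installed wheel does not ship that file, setup fails. A minimal sketch of that mapping (illustrative only, not the library's actual code):

```python
def expected_bnb_library(cuda_version: str) -> str:
    """Mimic how pre-0.42 bitsandbytes derives the CUDA library filename.

    "12.2" becomes "122", so the loader looks for libbitsandbytes_cuda122.so;
    if that file is missing from the wheel it falls back to libbitsandbytes.so
    and then raises "CUDA SETUP: Setup Failed!".
    """
    major, minor = cuda_version.split(".")[:2]
    return f"libbitsandbytes_cuda{major}{minor}.so"

print(expected_bnb_library("12.2"))  # libbitsandbytes_cuda122.so
```

This is why upgrading (or rebuilding) bitsandbytes for the CUDA version Colab actually ships is the usual fix.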

andrewssdd added the bug label on Dec 15, 2023
@andrewssdd
Author

Here's the requirement cell that works.

!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/examples/dreambooth/train_dreambooth.py
!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/scripts/convert_diffusers_to_original_stable_diffusion.py
%pip install git+https://github.com/ShivamShrirao/diffusers
%pip install -U --pre triton
%pip install transformers ftfy bitsandbytes gradio natsort safetensors xformers torch==2.1.0+cu121 accelerate
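One thing worth checking when this cell stops working: the torch pin requests a specific CUDA build (cu121), while the log in the issue shows bitsandbytes detecting CUDA 12.2 from the system toolkit. A mismatch between those two is one frequent cause of CUDA SETUP failures. A small stdlib sketch of how to compare the two tags (the helper names here are made up for illustration):

```python
def wheel_cuda_tag(pin: str) -> str:
    # "torch==2.1.0+cu121" -> "cu121" (the CUDA build the wheel was compiled for)
    return pin.split("+", 1)[1]

def runtime_cuda_tag(version: str) -> str:
    # "12.2" (as bitsandbytes reports it) -> "cu122"
    major, minor = version.split(".")[:2]
    return f"cu{major}{minor}"

print(wheel_cuda_tag("torch==2.1.0+cu121"))  # cu121
print(runtime_cuda_tag("12.2"))              # cu122
```

When the two disagree, either bump the torch pin to match the runtime or install a bitsandbytes build for the detected CUDA version.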

@VladAdushev

Here's the requirement cell that works.

This still returns an error for me.

@TragicXxBoNeSxX

Still returns "Exception: CUDA SETUP: Setup Failed!"

@vadar007

I was able to execute it successfully with the modified code. However, when attempting to use the generated model with Stable Diffusion, I got the following error:

*** Error verifying pickled file from D:\l......*.ckpt
*** The file may be malicious, so the program is not going to read it.
*** You can skip this check with --disable-safe-unpickle commandline argument.

Adding the recommended command-line argument allowed Stable Diffusion to use the model.

@TragicXxBoNeSxX

Changing this line: %pip install transformers ftfy bitsandbytes gradio natsort safetensors xformers torch==2.1.0+cu121 accelerate

To this: %pip install transformers ftfy bitsandbytes gradio natsort safetensors xformers torch==2.1.0+cu121 accelerate kaleido cohere openai tiktoken

Got it working for me again.

@andrewssdd
Author

Was able to successfully execute with modified code. However, when attempting to use the generated model with Stable Diffusion was getting the following error:

*** Error verifying pickled file from D:\l......*.ckpt *** The file may be malicious, so the program is not going to read it. *** You can skip this check with --disable-safe-unpickle commandline argument.

Adding recommended command line argument allowed Stable Diffusion to utilize the model

You need to save the checkpoint as safetensors.
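A minimal sketch of one way to do that conversion, assuming torch and safetensors are installed. The helper names (`safetensors_path`, `convert`) are hypothetical, not part of either library, and note that `torch.load` still unpickles the file, so only run this on checkpoints you created yourself:

```python
from pathlib import Path

def safetensors_path(ckpt_path: str) -> str:
    """Derive the output filename: model.ckpt -> model.safetensors."""
    return str(Path(ckpt_path).with_suffix(".safetensors"))

def convert(ckpt_path: str) -> str:
    """Re-save a pickled .ckpt as .safetensors (sketch, trusted files only)."""
    import torch
    from safetensors.torch import save_file

    state = torch.load(ckpt_path, map_location="cpu")
    # Stable Diffusion .ckpt files usually nest the weights under "state_dict"
    state = state.get("state_dict", state)
    # safetensors can only store tensors, so drop any non-tensor entries
    tensors = {k: v for k, v in state.items() if isinstance(v, torch.Tensor)}
    out = safetensors_path(ckpt_path)
    save_file(tensors, out)
    return out
```

With a .safetensors file, the web UI's safe-unpickle check no longer applies, so `--disable-safe-unpickle` is not needed.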

@Al-Rien

Al-Rien commented Dec 22, 2023

To this: %pip install transformers ftfy bitsandbytes gradio natsort safetensors xformers torch==2.1.0+cu121 accelerate kaleido cohere openai tiktoken

Still returns CUDA SETUP: Setup Failed.

@TianyiPeng

Has anyone gotten it to work yet?

@domingosl

Same issue here after trying all of the suggestions: CUDA SETUP: Setup Failed.

@abc123desygn

I have the same issue. Can you please fix?

@tibor

tibor commented Jan 2, 2024

I'm not sure, but I think this is a problem with the bitsandbytes library. I have opened a ticket here: bitsandbytes-foundation/bitsandbytes#950

@VladAdushev

Has anyone managed to launch it?

@deveshruttala

Same issue for me.

@tibor

tibor commented Jan 16, 2024

Yeah, it works with the fix at bitsandbytes-foundation/bitsandbytes#950

@Kategus

Kategus commented Feb 5, 2024

Good day. It has broken again. If anyone has a working version, please share it.

@jackiter

Good day. It has broken again. If anyone has a working version, please share it.

Yes please

@chchchadzilla

I'd even settle for someone just explaining why it's broken so I can try to fix it myself. I've gotten it to work several times by installing different versions of torch with CUDA, xformers, triton, torchtext, torchaudio, torchvision, and torchdata, and also by installing kaleido, pycairo, tiktoken, and openai, but the problem is I was just throwing things at the wall and hoping something stuck. I fundamentally don't understand what's happening and just happened to hit pay dirt, so replicating it has proven difficult; impossible, actually, in the last week specifically. I'm not sure whether another update broke a different module, but it's frustrating. I've tried to learn kohya_ss and I'm very, very bad at it; no matter how closely I follow tutorials, it never works, or maybe I'm just stupid. Either way, there's no user-friendly (in the lowest sense of the word) choice except this Shivam Colab, which in and of itself took an ungodly amount of trial and error to get working in a way that makes sense to me and gets me good results.

Now, though, it seems like no one cares because it's outdated technology; with LoRA training, Stable Cascade, and Stable Diffusion 3 right around the corner for public release, I'm afraid we won't see a fix. As a hobbyist rather than a professional, I find that all the discussion around it doesn't provide clearly defined, easy-to-follow solutions. It's all assumptive, predicated on already knowing what everyone is talking about, not a step-by-step, idiot-proof guide, which I feel so many of us need but are too embarrassed to ask for, afraid we'll be made fun of or reprimanded for asking stupid questions. The whole thing is elitist, and it doesn't do anyone any good.

It turns regular quasi-nerds like me away from diving into this world head first, and you never know: you could be turning away the next visionary who would have written code or developed something that changed the game. That's a long shot, but I think you get my point. It's frustrating that there's so little information on this, and the solutions that are out there are half-baked, written with the assumption that you're already a Python developer, and we're not. We're regular people who use this for fun and hobbies; some of us used to make money training models for others, or used it for our day jobs. And look, I get it: it's forced evolution. Figure it out, or stop complaining and stop using it. But for something that seems like it should be so easy to fix, I don't understand why no one even wants to try to help. It's disheartening. Sorry for the rant; tonight has been really frustrating and I'm no closer to getting pending work done. I've got new characters to train into a model so I can finish a comic book series I've had to back-burner for the last three months because of this, and I promise that if someone helps me fix it, I'll never use the software again and stop bugging everyone. Thanks.

@Kategus

Kategus commented Mar 4, 2024

Bravo, great speech, but I'm afraid it won't bear fruit. Personally, I don't consider this outdated technology: alongside LoRA, Dreambooth still gives very good results in terms of likeness, and I haven't seen the same likeness from the newer approaches. It seems the author simply got cut off from the internet or was taken into the army).

@Olivier-aka-Raiden

If anyone is interested, I successfully trained my model by installing the requirements as follows:
%pip install transformers ftfy bitsandbytes gradio natsort safetensors xformers torch==2.2.1 accelerate kaleido cohere openai tiktoken

@Kategus

Kategus commented Apr 9, 2024

If anyone is interested, I successfully trained my model by installing the requirements as follows: %pip install transformers ftfy bitsandbytes gradio natsort safetensors xformers torch==2.2.1 accelerate kaleido cohere openai tiktoken

Thanks for the hint! But this fix did not last long; it is giving an error again. Perhaps someone more expert can fix it?

@Baconwrappedfriedpickles

Thanks for the hint! But this fix did not last long; it is giving an error again. Perhaps someone more expert can fix it?

Someone on another site suggested adding this to the requirements and it's working for me. Hope it helps.
%pip install "jax[cuda12_local]==0.4.23" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
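If the jax pin above helps, it is likely because it keeps jax and jaxlib on matching, CUDA-compatible versions (Colab preinstalls its own jax build, which a later pip install can put out of sync). A quick stdlib check that the two installed versions agree; `versions_match` is a made-up helper, not part of any library:

```python
from importlib import metadata

def versions_match(pkg_a: str = "jax", pkg_b: str = "jaxlib") -> bool:
    """True if both packages are installed with the same version string."""
    try:
        return metadata.version(pkg_a) == metadata.version(pkg_b)
    except metadata.PackageNotFoundError:
        # Either package missing counts as a mismatch
        return False
```

Running `versions_match()` in a fresh notebook cell after the installs gives a fast sanity check before kicking off training.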
