LoHa/LoKr with conv. Error with tensor size mismatch #182

Closed

rockerBOO opened this issue May 15, 2024 · 5 comments
@rockerBOO
Contributor

rockerBOO commented May 15, 2024

I was testing conv models and hit a tensor size mismatch error with LoHa. This is with Kohya's sd-scripts.

Commit: daa559f on dev branch

Relevant parts of the config:

pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"
mixed_precision="fp16"
sdpa=true
network_dim=16
network_alpha=8
network_module = "lycoris.kohya"
network_args=[ 
  "algo=loha",
  "preset=unet-convblock-only",
  # "preset=unet-transformer-only", # Comparision without conv
  "dora_wd=true", # tested without and same error
  "rs_lora=true", # tested without and same error
  "dropout=0.5",
  "rank_dropout=0.25",
  "module_dropout=0.25"
]
Traceback (most recent call last):
  File "/mnt/900/builds/sd-scripts/train_network.py", line 1154, in <module>
    trainer.train(args)
  File "/mnt/900/builds/sd-scripts/train_network.py", line 896, in train
    noise_pred = self.call_unet(
  File "/mnt/900/builds/sd-scripts/train_network.py", line 126, in call_unet
    noise_pred = unet(noisy_latents, timesteps, text_conds).sample
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1589, in forward
    sample, res_samples = downsample_block(
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1018, in forward
    hidden_states = torch.utils.checkpoint.checkpoint(create_custom_forward(resnet), hidden_states, temb)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 482, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 261, in forward
    outputs = run_function(*args)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1014, in custom_forward
    return module(*inputs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 481, in forward
    output_tensor = input_tensor + hidden_states
RuntimeError: The size of tensor a (20) must match the size of tensor b (16) at non-singleton dimension 3
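
For reference, the failing line is the ResNet block's skip connection (output_tensor = input_tensor + hidden_states). A minimal toy sketch of how any wrapped conv that changes the spatial size triggers exactly this kind of error (a hypothetical wrapper for illustration, not LyCORIS's actual code):

import torch
import torch.nn as nn

class ShrinkingConvWrapper(nn.Module):
    """Hypothetical wrapper that adds an unpadded 3x3 conv, shrinking H/W by 2."""
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv = conv
        self.extra = nn.Conv2d(conv.out_channels, conv.out_channels, 3, padding=0)

    def forward(self, x):
        return self.extra(self.conv(x))

conv = ShrinkingConvWrapper(nn.Conv2d(4, 4, 3, padding=1))
x = torch.randn(1, 4, 16, 16)
hidden = conv(x)   # shape (1, 4, 14, 14) instead of (1, 4, 16, 16)
out = x + hidden   # RuntimeError: size mismatch at non-singleton dimension 3 (16 vs 14)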

Not a big deal, since I'm only doing this to make a conv version of LoHa for analysis purposes.

Thank you!

@rockerBOO
Contributor Author

Same error type on LoKr, it seems.

pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"
mixed_precision="fp16"
sdpa=true
network_dim=16
network_alpha=8
network_module = "lycoris.kohya"
network_args=[ 
  "algo=lokr",
  "preset=unet-convblock-only",
  # "preset=unet-transformer-only", # Comparision without conv
  "dora_wd=true", # tested without and same error
  "rs_lora=true", # tested without and same error
  "dropout=0.5",
  "rank_dropout=0.25",
  "module_dropout=0.25"
]
Traceback (most recent call last):
  File "/mnt/900/builds/sd-scripts/train_network.py", line 1115, in <module>
    trainer.train(args)
  File "/mnt/900/builds/sd-scripts/train_network.py", line 864, in train
    noise_pred = self.call_unet(
  File "/mnt/900/builds/sd-scripts/train_network.py", line 126, in call_unet
    noise_pred = unet(noisy_latents, timesteps, text_conds).sample
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1589, in forward
    sample, res_samples = downsample_block(
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1018, in forward
    hidden_states = torch.utils.checkpoint.checkpoint(create_custom_forward(resnet), hidden_states, temb)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 482, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 261, in forward
    outputs = run_function(*args)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1014, in custom_forward
    return module(*inputs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 481, in forward
    output_tensor = input_tensor + hidden_states
RuntimeError: The size of tensor a (72) must match the size of tensor b (70) at non-singleton dimension 3

@KohakuBlueleaf
Owner

@rockerBOO Does this problem still exist in the latest dev?
I completely reconstructed the whole library recently.

@rockerBOO
Contributor Author

rockerBOO commented May 31, 2024

I realized afterward that you were reconstructing it. I tested with commit 7880753 and kohya dev commit kohya-ss/sd-scripts@0d96e10 and got the following:

Traceback (most recent call last):
  File "/mnt/900/builds/sd-scripts/train_network.py", line 1143, in <module>
    trainer.train(args)
  File "/mnt/900/builds/sd-scripts/train_network.py", line 887, in train
    noise_pred = self.call_unet(
  File "/mnt/900/builds/sd-scripts/train_network.py", line 126, in call_unet
    noise_pred = unet(noisy_latents, timesteps, text_conds).sample
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1589, in forward
    sample, res_samples = downsample_block(
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1018, in forward
    hidden_states = torch.utils.checkpoint.checkpoint(create_custom_forward(resnet), hidden_states, temb)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
    return fn(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
    return fn(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 487, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 262, in forward
    outputs = run_function(*args)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1014, in custom_forward
    return module(*inputs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 481, in forward
    output_tensor = input_tensor + hidden_states
RuntimeError: The size of tensor a (128) must match the size of tensor b (126) at non-singleton dimension 3
network_module = "lycoris.kohya"
network_args = [ 
  "algo=loha", 
  "preset=unet-convblock-only", 
  "dora_wd=true",
  "rs_lora=true",
  "dropout=0.3",
  "rank_dropout=0.15",
  "module_dropout=0.15",
]
pip list | grep lycoris
lycoris-lora              3.0.0.dev6   /mnt/900/builds/sd-scripts/LyCORIS

I can give a full config if you can't reproduce it. Thanks!

@KohakuBlueleaf
Owner

This is a bug on the kohya side.
Dim 3 here is the width.

It could be a problem with the bucket resolution steps.
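
One quick way to sanity-check that hypothesis is a sketch like the one below. It assumes SD v1.x wants pixel resolutions divisible by 64 (the 8x VAE downscale times the UNet's further 8x downsampling); the bucket list is made up for illustration:

# Flag bucket resolutions that are not multiples of the assumed step.
def check_bucket_resolutions(buckets, step=64):
    for width, height in buckets:
        ok = width % step == 0 and height % step == 0
        status = "ok" if ok else f"NOT a multiple of {step}"
        print(f"{width}x{height}: {status}")

# Made-up example buckets; 560x464 would be flagged.
check_bucket_resolutions([(512, 512), (576, 448), (560, 464)])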

@rockerBOO
Contributor Author

I'm trying to figure out what could be causing this, since it only happens on the dev version of this repo, and only for LoHa/LoKr with convolution. Using Kohya with conv works, so I'm not sure how to isolate what dev does differently here. I can poke around to find the cause, but is there any particular place I should look?

Maybe I could list out the dimensions of my dataset files after processing? That might help indicate whether it's a bucket-related issue. Or maybe I could make a non-bucketed dataset to compare against. I will try to address some of these in a few days.
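
A small sketch of what that listing could look like (assumes a folder of images; the path is a placeholder):

from collections import Counter
from pathlib import Path

from PIL import Image

def summarize_image_sizes(dataset_dir):
    """Count the pixel dimensions of every image so unusual sizes stand out."""
    sizes = Counter()
    for path in Path(dataset_dir).rglob("*"):
        if path.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}:
            with Image.open(path) as img:
                sizes[img.size] += 1
    for (w, h), count in sizes.most_common():
        print(f"{w}x{h}: {count} image(s)")

summarize_image_sizes("/path/to/dataset")  # placeholder path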
