
An error when using LoRA for s3prl frontend. #5721

Open
simpleoier opened this issue Mar 27, 2024 · 1 comment
Labels: Bug (bug should be fixed)

Comments

@simpleoier
Collaborator

Describe the bug
When applying LoRA fine-tuning to the s3prl frontend (e.g., hubert_base), the frontend output has no gradients. More specifically, I simply used the last layer instead of the multi-layer weighted sum.

Basic environments:

  • OS information: Linux 4.18.0-372.32.1.el8_6.x86_64 #1 SMP Fri Oct 28 15:56:52 EDT 2022 x86_64
  • python version: 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
  • espnet version: espnet 202402
  • pytorch version: pytorch 2.1.0
  • Git hash: ec9760b22654dc04eeecd37e2659ebda0325a786
    • Commit date: Sat Mar 2 01:25:58 2024 -0500

Environments from torch.utils.collect_env:
e.g.,

Collecting environment information...
PyTorch version: 2.1.0
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Springdale Open Enterprise Linux 8.6 (Modena) (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10)
Clang version: Could not collect
CMake version: version 3.28.3
Libc version: glibc-2.28

Python version: 3.9.18 (main, Sep 11 2023, 13:41:44)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.18.0-372.32.1.el8_6.x86_64-x86_64-with-glibc2.28
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  2
Core(s) per socket:  12
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
Stepping:            2
CPU MHz:             3100.000
CPU max MHz:         3100.0000
CPU min MHz:         1200.0000
BogoMIPS:            4600.34
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            30720K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts md_clear flush_l1d

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] pytorch-ranger==0.1.1
[pip3] torch==2.1.0
[pip3] torch-complex==0.4.3
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==2.1.0
[pip3] triton==2.0.0
[conda] blas                      1.0                         mkl  
[conda] mkl                       2023.1.0         h213fc3f_46344  
[conda] mkl-service               2.4.0            py39h5eee18b_1  
[conda] mkl_fft                   1.3.8            py39h5eee18b_0  
[conda] mkl_random                1.2.4            py39hdb19cb5_0  
[conda] numpy                     1.23.5           py39hf6e8229_1  
[conda] numpy-base                1.23.5           py39h060ed82_1  
[conda] pytorch                   2.1.0           py3.9_cuda11.8_cudnn8.7.0_0    pytorch
[conda] pytorch-cuda              11.8                 h7e8668a_5    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] pytorch-ranger            0.1.1                    pypi_0    pypi
[conda] torch-complex             0.4.3                    pypi_0    pypi
[conda] torch-optimizer           0.3.0                    pypi_0    pypi
[conda] torchaudio                2.1.0                py39_cu118    pytorch
[conda] triton                    2.0.0                    pypi_0    pypi

Task information:

  • Task: ASR
  • Recipe: librispeech_100
  • ESPnet2

To Reproduce
Steps to reproduce the behavior:

  1. move to a recipe directory, e.g., cd egs2/librispeech_100/asr1
  2. run stage 11 with the following config
frontend: s3prl
frontend_conf:
    frontend_conf:
        upstream: hubert_base  # Note: If the upstream is changed, please change the input_size in the preencoder.
    download_dir: ./hub
    multilayer_feature: false

preencoder: linear
preencoder_conf:
    input_size: 768  # Note: If the upstream is changed, please change this value accordingly.
    output_size: 128

encoder: e_branchformer
encoder_conf:
    output_size: 256
    attention_heads: 4
    attention_layer_type: rel_selfattn
    pos_enc_layer_type: rel_pos
    rel_pos_type: latest
    cgmlp_linear_units: 1024
    cgmlp_conv_kernel: 31
    use_linear_after_conv: false
    gate_activation: identity
    num_blocks: 12
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.1
    input_layer: conv2d2
    layer_drop_rate: 0.0
    linear_units: 1024
    positionwise_layer_type: linear
    use_ffn: true
    macaron_ffn: true
    merge_conv_kernel: 31

decoder: null

use_adapter: true
adapter: lora
adapter_conf:
    rank: 8
    alpha: 16
    dropout_rate: 0.05
    target_modules: ["q_proj", "k_proj", "v_proj", "out_proj"]

model_conf:
    ctc_weight: 1.0
    lsm_weight: 0.1
    length_normalized_loss: false

seed: 2022
num_workers: 4
batch_type: numel
batch_bins: 1000000
accum_grad: 1
num_iters_per_epoch: 2
max_epoch: 3
patience: none
init: none
best_model_criterion:
-   - valid
    - acc
    - max
keep_nbest_models: 10
use_amp: true

optim: adam
optim_conf:
    lr: 0.002
    weight_decay: 0.000001
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 15000
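
For reference, the adapter_conf above follows the standard LoRA formulation: the effective weight is W + (alpha / rank) * B A, where only the low-rank matrices A and B are trained and the base weight W stays frozen. Below is a minimal, self-contained sketch of a LoRA-wrapped linear layer with these hyperparameters (illustrative only, not ESPnet's actual adapter code):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank update (illustrative sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16, dropout_rate: float = 0.05):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # only the LoRA matrices are trained
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank
        self.dropout = nn.Dropout(dropout_rate)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W x + (alpha / rank) * B A x
        return self.base(x) + self.dropout(x) @ self.lora_A.T @ self.lora_B.T * self.scaling

layer = LoRALinear(nn.Linear(768, 768), rank=8, alpha=16, dropout_rate=0.05)
y = layer(torch.randn(4, 768))
print(y.requires_grad)  # True: gradients flow through the LoRA branch

This only helps when the wrapped module's forward() is actually called; the reply below explains why that does not happen inside the s3prl attention layers.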

Error logs

Traceback (most recent call last):
  File "/miniconda/envs/espnet/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/miniconda/envs/espnet/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/espnet/espnet2/bin/asr_train.py", line 23, in <module>
    main()
  File "/espnet/espnet2/bin/asr_train.py", line 19, in main
    ASRTask.main(cmd=cmd)
  File "/espnet/espnet2/tasks/abs_task.py", line 1132, in main
    cls.main_worker(args)
  File "/espnet/espnet2/tasks/abs_task.py", line 1447, in main_worker
    cls.trainer.run(
  File "/espnet/espnet2/train/trainer.py", line 317, in run
    all_steps_are_invalid = cls.train_one_epoch(
  File "/espnet/espnet2/train/trainer.py", line 684, in train_one_epoch
    scaler.scale(loss).backward()
  File "/miniconda/envs/espnet/lib/python3.9/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/miniconda/envs/espnet/lib/python3.9/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
simpleoier added the Bug (bug should be fixed) label on Mar 27, 2024
@Stanwang1210
Contributor

Sorry for the late reply.
This bug is caused by the attention implementation in the s3prl frontend. Specifically, it uses torch.nn.functional.multi_head_attention_forward, which reads the projection weights directly as tensors (only q_proj.weight is used), so the LoRA branches that wrap the projection modules never run and no trainable parameters enter the computation graph.
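
To illustrate the failure mode, here is a simplified, self-contained reconstruction (not the exact s3prl/fairseq code path): the functional attention call receives raw weight tensors, so even if the projection layers were replaced by LoRA-wrapped modules, their forward() (and hence the trainable LoRA branch) would never execute, and with the base weights frozen the output carries no grad_fn.

import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim, num_heads = 768, 12
q_proj = nn.Linear(embed_dim, embed_dim)
k_proj = nn.Linear(embed_dim, embed_dim)
v_proj = nn.Linear(embed_dim, embed_dim)
out_proj = nn.Linear(embed_dim, embed_dim)

# Freeze the base weights, as LoRA fine-tuning does.
for m in (q_proj, k_proj, v_proj, out_proj):
    for p in m.parameters():
        p.requires_grad = False

# A LoRA wrapper would add its trainable low-rank update inside q_proj.forward(),
# but the functional attention path below never calls q_proj.forward(): it only
# reads the raw (frozen) weight tensors, so the LoRA update never enters the graph.
x = torch.randn(10, 2, embed_dim)  # (seq_len, batch, embed_dim)
out, _ = F.multi_head_attention_forward(
    x, x, x,
    embed_dim_to_check=embed_dim,
    num_heads=num_heads,
    in_proj_weight=None,
    in_proj_bias=torch.cat([q_proj.bias, k_proj.bias, v_proj.bias]),
    bias_k=None,
    bias_v=None,
    add_zero_attn=False,
    dropout_p=0.0,
    out_proj_weight=out_proj.weight,
    out_proj_bias=out_proj.bias,
    use_separate_proj_weight=True,
    q_proj_weight=q_proj.weight,
    k_proj_weight=k_proj.weight,
    v_proj_weight=v_proj.weight,
)
print(out.requires_grad, out.grad_fn)  # False None -> backward() fails as in the log above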

So currently, please avoid using LoRA with the s3prl frontend. An alternative to LoRA could be the Houlsby adapter (a rough sketch is included below).
We will send a PR to prevent users from using LoRA with the s3prl frontend soon.
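
For context, a Houlsby-style adapter inserts a small bottleneck MLP with a residual connection after a (frozen) sub-layer, so its trainable parameters always sit on the forward path regardless of how the attention weights are consumed internally. A minimal sketch (illustrative only, not ESPnet's adapter implementation):

import torch
import torch.nn as nn

class HoulsbyAdapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        # Near-identity initialization so training starts from the pretrained behavior.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# Applied to a sub-layer output, so gradients reach the adapter parameters
# even when every surrounding transformer weight is frozen.
x = torch.randn(10, 2, 768)
y = HoulsbyAdapter(768)(x)
print(y.requires_grad)  # True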
Thank you for reporting this bug.
