Reproduce the model #4

Open
yikehuaili opened this issue Feb 24, 2025 · 4 comments

@yikehuaili

Hello, thank you for your great contribution.
I am new to AI modeling. When reproducing the model, I encountered the error below. Could you advise how to resolve it?
Executed command: GPU=0 bash scripts/run_exp_pipe.sh pepbench_codesign configs/pepbench/autoencoder/train_codesign.yaml configs/pepbench/ldm/train_codesign.yaml configs/pepbench/ldm/setup_latent_guidance.yaml configs/pepbench/test_codesign.yaml
Error message:
100%|██████████| 170/170 [01:25<00:00, 1.98it/s, loss=0, version=0]

0%| | 0/4157 [00:00<?, ?it/s]
100%|██████████| 4157/4157 [00:00<00:00, 1097138.29it/s]
2025-02-21 21:34:14::INFO::validating ...

0%| | 0/6 [00:00<?, ?it/s]
0%| | 0/6 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/dengxj/yanlin/deeplearning/PepGLAD/train.py", line 78, in
main(args, opt_args)
File "/home/dengxj/yanlin/deeplearning/PepGLAD/train.py", line 71, in main
trainer.train(args.gpus, args.local_rank)
File "/home/dengxj/yanlin/deeplearning/PepGLAD/trainer/abs_trainer.py", line 253, in train
self._valid_epoch(device)
File "/home/dengxj/yanlin/deeplearning/PepGLAD/trainer/abs_trainer.py", line 155, in _valid_epoch
metric = self.valid_step(batch, self.valid_global_step)
File "/home/dengxj/yanlin/deeplearning/PepGLAD/trainer/ldm_trainer.py", line 52, in valid_step
loss, loss_dict = self.model(**batch)
ValueError: not enough values to unpack (expected 2, got 1)
cat: ./exps/pepbench_fixseq/LDM/version_0/checkpoint/topk_map.txt: No such file or directory
usage: setup_latent_guidance.py [-h] --config CONFIG --ckpt CKPT [--gpu GPU]
setup_latent_guidance.py: error: argument --ckpt: expected one argument
usage: generate.py [-h] --config CONFIG --ckpt CKPT [--save_dir SAVE_DIR]
[--gpu GPU] [--n_cpu N_CPU]
generate.py: error: argument --ckpt: expected one argument
Traceback (most recent call last):
File "/home/dengxj/yanlin/deeplearning/PepGLAD/cal_metrics.py", line 228, in
main(parse())
File "/home/dengxj/yanlin/deeplearning/PepGLAD/cal_metrics.py", line 153, in main
with open(args.results, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: './exps/pepbench_fixseq/LDM/version_0/results/results.jsonl'

@kxz18 (Collaborator) commented Feb 24, 2025

Hi, could you share the complete logs? I found some mismatches in the logs provided so far: the executed command specifies training in codesign mode, but the later logs seem to look for checkpoints in folders with the "fixseq" suffix. Also, messages like "Training Autoencoder with config xxx" or "Using Autoencoder checkpoint" do not appear, which makes it hard to track which stage the run is in.

@yikehuaili (Author)

> Hi, could you share the complete logs? I found some mismatches in the logs provided so far: the executed command specifies training in codesign mode, but the later logs seem to look for checkpoints in folders with the "fixseq" suffix. Also, messages like "Training Autoencoder with config xxx" or "Using Autoencoder checkpoint" do not appear, which makes it hard to track which stage the run is in.

OK, thank you so much!

Attachment: nohup.txt

@kxz18 (Collaborator) commented Feb 24, 2025

Hi, it looks like you need to either use a GPU with more memory (at least 24 GB) or reduce the dynamic batch size here.
The logs keep throwing CUDA out-of-memory warnings at every step, which is likely also happening during validation. The forward function is wrapped with an OOM decorator that returns an OOM signal when CUDA memory runs out. This is not expected during validation and therefore not handled by the validation loop, which is why the run ends with "ValueError: not enough values to unpack (expected 2, got 1)".
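To make the failure mode concrete, here is a minimal, hypothetical sketch of the pattern described above. The names (oom_guard, OOM_SIGNAL, model_forward, valid_step) are illustrative, not PepGLAD's actual identifiers: a forward pass wrapped in an OOM guard returns a one-element signal instead of (loss, loss_dict), so the validation step's two-way unpack raises exactly this ValueError.

import torch

# Hypothetical sentinel returned instead of (loss, loss_dict) when the GPU runs out of memory.
OOM_SIGNAL = ('OOM',)  # a one-element tuple, so a two-way unpack fails

def oom_guard(forward_fn):
    # Wrap a forward pass and swallow CUDA OOM errors (illustrative only).
    def wrapper(*args, **kwargs):
        try:
            return forward_fn(*args, **kwargs)
        except RuntimeError as err:
            if 'out of memory' in str(err):
                torch.cuda.empty_cache()
                return OOM_SIGNAL  # the training loop knows to skip this batch
            raise
    return wrapper

@oom_guard
def model_forward(batch):
    # Simulate the model running out of GPU memory on a large validation batch;
    # normally this would return (loss, loss_dict).
    raise RuntimeError('CUDA out of memory (simulated)')

def valid_step(batch):
    # The validation loop assumes the forward pass always succeeds,
    # so the one-element OOM signal cannot be unpacked into two values.
    loss, loss_dict = model_forward(batch)
    return loss

valid_step({})  # ValueError: not enough values to unpack (expected 2, got 1)

Reducing the dynamic batch size (or using a GPU with more memory) avoids triggering the OOM guard in the first place.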

@yikehuaili (Author)

> Hi, it looks like you need to either use a GPU with more memory (at least 24 GB) or reduce the dynamic batch size here. The logs keep throwing CUDA out-of-memory warnings at every step, which is likely also happening during validation. The forward function is wrapped with an OOM decorator that returns an OOM signal when CUDA memory runs out. This is not expected during validation and therefore not handled by the validation loop, which is why the run ends with "ValueError: not enough values to unpack (expected 2, got 1)".

Thank you very much for your answer!
