Not able to run the pre-trained model #20

Open
Rohith-coder1 opened this issue Dec 16, 2022 · 34 comments

Comments

@Rohith-coder1

Rohith-coder1 commented Dec 16, 2022

Hi sir, I cloned the repo and installed all dependencies, but when I try to run it, it throws an error saying "unrecognized arguments".

@Rohith-coder1
Author

[Screenshot 2022-12-16 083437]

@UttaranB127
Owner

Please note that the command line takes named arguments, so you have to use the format python main_v2.py --<arg name> <arg val>.
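The "unrecognized arguments" message is what Python's argparse prints when it receives tokens that match no defined argument. A minimal sketch, assuming the script parses its options with argparse (the two options below are a hypothetical subset, not the repository's full parser), of how named arguments are read and how parse_known_args can reveal which tokens are being rejected:

```python
import argparse

# Hypothetical two-option subset of the script's options.
parser = argparse.ArgumentParser(description="named-argument demo")
parser.add_argument("--batch-size", type=int, default=32)
parser.add_argument("--dataset-s2ag", type=str, default="ted_db")

# Named arguments: every value directly follows its --<arg name>.
args = parser.parse_args(["--batch-size", "512", "--dataset-s2ag", "ted_db"])
print(args.batch_size, args.dataset_s2ag)  # 512 ted_db (dashes become underscores)

# parse_args() would exit with an "unrecognized arguments" error here;
# parse_known_args() instead returns the leftover tokens for inspection.
known, unknown = parser.parse_known_args(["--batch-size", "512", "--foo", "1"])
print(unknown)
```

Printing the leftover list for the real command is a quick way to see exactly which flag or value the parser does not accept.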

@Rohith-coder1
Author

Yes, I have given it in that format.

@Rohith-coder1
Author

[Screenshot: parameter]

@UttaranB127
Owner

I re-checked the code and it runs correctly on my machine. I made a slight change in the path for parse_args.py; try pulling it, as it could help with your issue. Also:

  1. Have you tried running with the default arguments (i.e., not providing any explicit command line arguments)?
  2. Can you share your command in a text format so I can try to test it if needed?

@Rohith-coder1
Author

Command:
python main_v2.py --dataset-s2ag ted_db --dataset-test ted_db -c True --frame-drop 2 --train-s2ag False --use-multiple-gpus T --s2ag-load-last-best True --batch-size 512 --num-worker 4 --s2ag-start-epoch 290 --s2ag-num-epoch 500 --base-tr 1 --step 0.5 --lr-s2ag-decay 0.999 --gradient-clip 0.1 --nesterov True --momentum 0.9 --weight-decay 9.591 --upper-body-weight 1 --affs-reg 0.8 --quat-norm-reg 0.1 --quat-reg 1.2 --recons-reg 1.2 --val-interval 1 --log-interval 200 --save-interval 10 --no-cuda False --pavi-log False --print-log True --save-log True

@UttaranB127
Owner

Thanks for sharing, I'll take a look when I get some time. Meanwhile, I have checked that the code works without any of the arguments except for --config (where you have to provide the path to a .yml file), so let me know if that also works for you.

@UttaranB127
Owner

Looking into your command-line call, I noticed a few errors:

  1. The "-c"/"--config" argument takes in the path to a .yml file, not a boolean.
  2. The arguments "--no-cuda", "--pavi-log", "--print-log", and "--save-log" do not take any values; you just include those flags if you need to perform the relevant actions.
  3. The argument "--nesterov" did not take a value (like the ones in the previous point), but it should. I've fixed the argument parsing so it now takes a boolean value. Please make sure you pull the latest code to reflect these changes on your end.

Here is a corrected version of your command-line call:

python main_v2.py --dataset-s2ag ted_db --dataset-test ted_db -c config/multimodal_context_v2.yml --frame-drop 2 --train-s2ag False --use-multiple-gpus T --s2ag-load-last-best True --batch-size 512 --num-worker 4 --s2ag-start-epoch 290 --s2ag-num-epoch 500 --base-tr 1 --step 0.5 --lr-s2ag-decay 0.999 --gradient-clip 0.1 --nesterov True --momentum 0.9 --weight-decay 9.591 --upper-body-weight 1 --affs-reg 0.8 --quat-norm-reg 0.1 --quat-reg 1.2 --recons-reg 1.2 --val-interval 1 --log-interval 200 --save-interval 10 --no-cuda --pavi-log --print-log --save-log
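The difference between switch-style flags (point 2) and value-style booleans (point 3) can be illustrated with argparse. This is a sketch, not the repository's parser: action="store_true" is standard argparse, while str2bool is a hypothetical helper, needed because type=bool would turn any non-empty string, including "False", into True. Passing a value to a switch, as in --no-cuda False, leaves "False" unconsumed, which argparse reports as an unrecognized argument.

```python
import argparse


def str2bool(value):
    """Parse textual booleans; plain bool("False") would return True."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
# Switches: present -> True, absent -> False; they never take a value.
parser.add_argument("--no-cuda", action="store_true")
parser.add_argument("--save-log", action="store_true")
# Value-style boolean, parsed with the hypothetical helper above.
parser.add_argument("--nesterov", type=str2bool, default=False)

args = parser.parse_args(["--no-cuda", "--nesterov", "False"])
print(args.no_cuda, args.save_log, args.nesterov)  # True False False
```

With a converter like this, short forms such as T or F would also be accepted for value-style booleans.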

@Rohith-coder1
Author

Thank you so much. Will execute and update.

@Rohith-coder1
Author

I tried running the updated command on a Windows machine with no GPU.
I am again getting some path issues. I set my project and data paths correctly in main_v2.py. Is there any other place where I should set the paths?

@Rohith-coder1
Author

[Screenshot 2022-12-20 131051]

@UttaranB127
Owner

Yes, please make sure the paths are correct in both the main python file and the .yml config file.

@Rohith-coder1
Author

[Screenshot 2022-12-20 131448]

@UttaranB127
Owner

The paths in loader_v2.py come from the .yml file, so please make sure those are accurate.
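Since the loader reads its paths from the .yml file, one quick check is to load the config and test each path before running. A minimal sketch, assuming the config has already been parsed into a dictionary (for example with PyYAML's yaml.safe_load); the key names in the example are placeholders, not the actual keys of the repository's config. Relative paths are resolved against the current working directory, so launching the script from a different folder can make correct-looking paths fail.

```python
from pathlib import Path


def find_missing_paths(config, keys):
    """Return {key: value} for the entries in `keys` that are absent or
    point to a location that does not exist."""
    missing = {}
    for key in keys:
        value = config.get(key)
        # Relative paths are checked against the current working directory.
        if value is None or not Path(value).expanduser().exists():
            missing[key] = value
    return missing


# Hypothetical keys; with PyYAML the dict would come from yaml.safe_load.
example = {"data_path": "data/ted_db", "model_save_path": "models"}
print(find_missing_paths(example, ["data_path", "model_save_path"]))
```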

@Rohith-coder1
Author

I changed all the paths, and now the code is running, but I am getting a few errors related to CUDA. I set the default of --no-cuda to True.
Is that the correct procedure, or is there another place where I have to change the CUDA settings?

@Rohith-coder1
Author

[Screenshot 2022-12-23 153252]

@UttaranB127
Owner

Try without the multiple-GPU flag, i.e., use just a single GPU. The parallelization code may have some issues due to PyTorch versioning, which will require separate debugging.

@Rohith-coder1
Author

Okay, but I don't have a GPU in my system. So the code won't work on systems without a GPU?

@UttaranB127
Owner

If you check line 93 in processor_v2.py (previous commit), the code automatically switches to CPU if no GPU is available. I have made this more explicit in the code so it pre-emptively follows the --no-cuda argument when the argument is present and made a new commit. You can pull the latest changes or just copy lines 93 to 105 in processor_v2.py.
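The device logic described here, including the single-GPU advice above, can be summarised as a small pure function. This is a hedged sketch of the general pattern, not the code in processor_v2.py; the function name and parameters are hypothetical, and in PyTorch the last two arguments would typically come from torch.cuda.is_available() and torch.cuda.device_count().

```python
def plan_devices(no_cuda, use_multiple_gpus, cuda_available, device_count):
    """Return (device_name, use_data_parallel) for the given settings."""
    # Honour --no-cuda first, then fall back to CPU when no GPU is usable.
    if no_cuda or not cuda_available or device_count < 1:
        return "cpu", False
    # Parallelise only when asked to and more than one GPU is present.
    # use_multiple_gpus is assumed to already be a real boolean here.
    return "cuda", bool(use_multiple_gpus) and device_count > 1


print(plan_devices(no_cuda=True, use_multiple_gpus=True, cuda_available=True, device_count=2))
# ('cpu', False)
```

The returned name can be passed to torch.device and the model moved with model.to(device); the second value indicates whether wrapping in torch.nn.DataParallel makes sense. On a CPU-only machine, a checkpoint that was saved from GPU tensors also has to be loaded with torch.load(..., map_location="cpu"), otherwise deserialization fails.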

@Rohith-coder1
Author

Still getting the same error

@Rohith-coder1
Author

I tried running the code on a system with a GPU and everything works fine, but I get an error at "Caching test data 0/26245".

@Rohith-coder1
Author

[Screenshot 2022-12-29 181513]

@UttaranB127
Owner

Check the file path. The error simply says that the file path is incorrect. If you don't have the preprocessed dataset, do not set the -dap flag.

@UttaranB127
Owner

Can you report the error stack trace when running on CPU? I am not able to replicate the error on my machine.

@Rohith-coder1
Author

Rohith-coder1 commented Jan 3, 2023

Hi,
I fixed the previous error, but now I am getting a "model not found" error. When I download the .pth file from the link you provided, it arrives as a .pth.tar file, but when I extract it I do not get a .pth file; it is just a normal folder containing archive, data.pkl and version.

Is this the correct way of extracting a .pth.tar file? I even tried keeping the .pth.tar model file in the models folder and giving its path in the code, but it still shows the same "model not found" error.

@Rohith-coder1
Author

> Can you report the error stack trace when running on CPU? I am not able to replicate the error on my machine.

Yes, I will retry and update you.

@UttaranB127
Owner

UttaranB127 commented Jan 4, 2023

> Hi,
> I fixed the previous error, but now I am getting a "model not found" error. When I download the .pth file from the link you provided, it arrives as a .pth.tar file, but when I extract it I do not get a .pth file; it is just a normal folder containing archive, data.pkl and version.
>
> Is this the correct way of extracting a .pth.tar file? I even tried keeping the .pth.tar model file in the models folder and giving its path in the code, but it still shows the same "model not found" error.

  • Keep the .pth.tar file as is; there is no need to extract anything.
  • In the directory where you're keeping the .pth.tar file, has the code created a log file (it should create a log file automatically)? If it has no log files, as a quick fix, create an empty log.txt file and keep it there. Then the .pth.tar file should load correctly. Essentially, the code validates the model directory by looking for the presence of the log file.
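The folder with data.pkl and version described in the question matches the layout of the zip-based format that torch.save has used by default since PyTorch 1.6, which is why the file should not be extracted: torch.load reads the archive directly, regardless of the .pth.tar extension. A small standard-library sketch (the function name is hypothetical) for checking a downloaded file without PyTorch installed:

```python
import zipfile


def describe_checkpoint(path):
    """Classify a downloaded checkpoint file without needing PyTorch."""
    if not zipfile.is_zipfile(path):
        return "not a zip archive"
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    # Zip-based torch.save files hold <folder>/data.pkl and <folder>/version.
    if any(name.endswith("/data.pkl") for name in names):
        return "zip-based torch.save checkpoint: load it with torch.load, do not extract"
    return "zip archive without a data.pkl entry"
```

If it reports a zip-based checkpoint, the unextracted file can be passed straight to torch.load(path, map_location="cpu").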

@Rohith-coder1
Author

I tried creating a log.txt file, but the "model not found" error still persists.

@UttaranB127
Owner

Can you try debugging the code on your machine to make sure the model path is being read correctly? Can you check which return call of the method get_epoch_and_loss in processor_v2.py (line 53) is getting activated? If you cannot determine any apparent cause for why the model loading should fail, could you please report the full stack trace of the error?
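To narrow down which condition rejects the model directory, the checks mentioned in this thread (the directory exists, it contains a log file, and it contains a checkpoint) can be tested one at a time. This is a hypothetical debugging aid, not the logic of get_epoch_and_loss; the default name log.txt comes from the earlier comment, while the *.pth.tar pattern is an assumption.

```python
from pathlib import Path


def diagnose_model_dir(model_dir, log_name="log.txt", pattern="*.pth.tar"):
    """Report the first failing check for a model directory (hypothetical)."""
    path = Path(model_dir)
    # resolve() shows the absolute path actually being checked.
    if not path.is_dir():
        return f"directory not found: {path.resolve()}"
    if not (path / log_name).is_file():
        return f"missing {log_name} in {path.resolve()}"
    checkpoints = sorted(path.glob(pattern))
    if not checkpoints:
        return f"no files matching {pattern} in {path.resolve()}"
    return f"found {len(checkpoints)} checkpoint(s), e.g. {checkpoints[-1].name}"
```

Printing the result for the configured model path shows whether the problem is the path itself, the log file, or the checkpoint name.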

@Rohith-coder1
Author

There is an error while caching the test data: the folder is created, but the 000000.npz file is not generated.

@Rohith-coder1
Author

[Screenshot]

@UttaranB127
Owner

Could you please copy-paste the command line and the text of the stack trace instead of a screenshot? Text lets me copy-paste and saves a lot of time when running searches or trying to reproduce errors.

@Rohith-coder1
Author

Command line:
python main_v2.py --dataset-s2ag ted_db --dataset-test ted_db -c config/multimodal_context_v2.yml --frame-drop 2 --train-s2ag False --use-multiple-gpus T --s2ag-load-last-best True --batch-size 512 --num-worker 4 --s2ag-start-epoch 290 --s2ag-num-epoch 500 --base-tr 1 --step 0.5 --lr-s2ag-decay 0.999 --gradient-clip 0.1 --nesterov True --momentum 0.9 --weight-decay 9.591 --upper-body-weight 1 --affs-reg 0.8 --quat-norm-reg 0.1 --quat-reg 1.2 --recons-reg 1.2 --val-interval 1 --log-interval 200 --save-interval 10 --no-cuda --pavi-log --print-log --save-log

Reading data 'data\ted_db\lmdb_test_s2ag_v2_cache_mfcc_14'...
Found the cache data\ted_db\lmdb_test_s2ag_v2_cache_mfcc_14_s2ag_v2_cache_mfcc_14
building a language model...
loaded from data\ted_db\vocab_models_s2ag\vocab_cache.pkl
Total s2ag testing data: 26245 (100.00%)
Caching test data 0/26245.
Traceback (most recent call last):
  File "main_v2.py", line 128, in <module>
    pr = processor.Processor(base_path, args, s2ag_config_args, data_loader, pose_dim, coords, audio_sr)
  File "C:\Users\sandy1902\Speech2Gestures\speech2affective_gestures\processor_v2.py", line 209, in __init__
    self.save_cache('test', test_dir_name)
  File "C:\Users\sandy1902\Speech2Gestures\speech2affective_gestures\processor_v2.py", line 328, in save_cache
    vid_indices=vid_indices_all[k])
  File "<__array_function__ internals>", line 6, in savez_compressed
  File "C:\Users\sandy1902\anaconda3\envs\S2D\lib\site-packages\numpy\lib\npyio.py", line 687, in savez_compressed
    _savez(file, args, kwds, True)
  File "C:\Users\sandy1902\anaconda3\envs\S2D\lib\site-packages\numpy\lib\npyio.py", line 713, in _savez
    zipf = zipfile_factory(file, mode="w", compression=compression)
  File "C:\Users\sandy1902\anaconda3\envs\S2D\lib\site-packages\numpy\lib\npyio.py", line 112, in zipfile_factory
    return zipfile.ZipFile(file, *args, **kwargs)
  File "C:\Users\sandy1902\anaconda3\envs\S2D\lib\zipfile.py", line 1240, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'Speech2Gestures/speech2affective_gestures\data/ted_db\ted_db\npz\test\test\000000.npz'

@UttaranB127
Owner

I've fixed the pathing issue. Could you try one more time with the new code?
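The path in the traceback repeats ted_db and test, which suggests that path components were joined twice, so numpy tried to write into a directory that had never been created. A minimal sketch of building such a cache path with pathlib and creating its parent directory before saving; the data/<dataset>/npz/<split> layout and the function name are assumptions modelled on the traceback, not necessarily what the fixed code does.

```python
from pathlib import Path


def npz_cache_path(base_dir, dataset, split, index):
    """Return the .npz cache path for one sample, creating its folder first."""
    # Each component appears once, joined with the OS-specific separator.
    directory = Path(base_dir) / dataset / "npz" / split
    # Create the whole chain of folders; do nothing if they already exist.
    directory.mkdir(parents=True, exist_ok=True)
    # Zero-padded file names such as 000000.npz, as in the traceback.
    return directory / f"{index:06d}.npz"
```

The returned path can then be passed to numpy.savez_compressed, which accepts either a string or a pathlib.Path.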
