
data_loader self.num_total_samples = 0 #5

Closed
zf223669 opened this issue Feb 9, 2022 · 19 comments

@zf223669

zf223669 commented Feb 9, 2022

Hi,
In processor_v2.py, I found that self.num_total_samples = 0. When I debugged, the variable n_samples in self.data_loader['train_data_s2ag'], ['eval_data_s2ag'], and ['test_data_s2ag'] was zero. What is the problem?
Thanks!

@zf223669
Author

zf223669 commented Feb 9, 2022

Second, when I tried to train the model, it showed:
FileNotFoundError: [Errno 2] No such file or directory: 'outputs/trimodal_gen.pth.tar'

@UttaranB127
Owner

Is your data loaded correctly into the code, or are there any warnings there? Also, does your "outputs" folder exist? Otherwise, the network may not be able to create the "trimodal_gen.pth.tar" file.
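If the folder is missing, creating it up front avoids the failure. A minimal sketch, assuming the working directory is the repository root:

```python
import os

# Ensure the outputs folder exists before the code reads or writes
# outputs/trimodal_gen.pth.tar.
os.makedirs('outputs', exist_ok=True)
```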

@UttaranB127
Owner

UttaranB127 commented Feb 11, 2022

Following up on the "trimodal_gen.pth.tar" not found error, we do have the trained weights available. However, we do not own the code for generating these weights and the results should be verified with the original source.

@Amir3022

Greetings, I still have the same problem of self.num_total_samples = 0, and the train, eval, and test samples are all zero. The only warning I get when trying to run the model is "Warning : load_model does not return WordVectorModel or SupervisedModel any more, but a FastText object which is very similar.", which, as far as I know, is a deprecation warning and shouldn't cause any problems.
I have the ted_db data and the fasttext 'crawl_300d_2M_subword.bin' in the right locations, although I downloaded the fasttext bin file and the 'NRC_VAD_Lexicon' from sources not provided in this repo.
Any help with this issue?

@UttaranB127
Owner

Are the folders lmdb_train_s2ag_v2_cache_mfcc_14, lmdb_val_s2ag_v2_cache_mfcc_14, and lmdb_test_s2ag_v2_cache_mfcc_14 generated? Each of these folders should contain two files, data.mdb and lock.mdb. The size of each lock.mdb is 8 KB and the data.mdb files have sizes of a few GBs (the one for 'train' is a few hundred GBs). If these files are present but the sizes do not match, please delete them and rerun the code to re-generate them.
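A quick way to verify this, as a sketch (paths assumed relative to the data/ted_db folder):

```python
import os

# Report the size of each expected LMDB cache file, or flag it as missing.
for split in ('train', 'val', 'test'):
    folder = 'lmdb_%s_s2ag_v2_cache_mfcc_14' % split
    for name in ('data.mdb', 'lock.mdb'):
        path = os.path.join(folder, name)
        size = os.path.getsize(path) if os.path.isfile(path) else 'missing'
        print(path, size)
```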

@zf223669
Author

Hello: I did follow your suggestion; however, I got another error:
/home/zf223669/Mount/anaconda3/envs/s2ag/bin/python3.7 /home/zf223669/Mount/s2ag/s2ag/main_v2.py -c /home/zf223669/Mount/s2ag/s2ag/config/multimodal_context_v2.yml
../data/NRC-VAD-Lexicon-Aug2018Release/NRC-VAD-Lexicon.txt
../data
Reading data '../data/ted_db/lmdb_train'...
Found the cache ../data/ted_db/lmdb_train_s2ag_v2_cache_mfcc_14
Reading data '../data/ted_db/lmdb_val'...
Found the cache ../data/ted_db/lmdb_val_s2ag_v2_cache_mfcc_14
Reading data '../data/ted_db/lmdb_test'...
Found the cache ../data/ted_db/lmdb_test_s2ag_v2_cache_mfcc_14
building a language model...
loaded from ../data/ted_db/vocab_models_s2ag/vocab_cache.pkl
++++++++++++++0 +++0 +++0
Training s2ag with batch size: 512
Loading train cache took 1 seconds.
Loading eval cache took 1 seconds.
Traceback (most recent call last):
  File "/home/zf223669/Mount/s2ag/s2ag/main_v2.py", line 132, in <module>
    pr.train()
  File "/home/zf223669/Mount/s2ag/s2ag/processor_v2.py", line 979, in train
    self.trimodal_generator.load_state_dict(trimodal_checkpoint['trimodal_gen_dict'])
  File "/home/zf223669/Mount/anaconda3/envs/s2ag/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PoseGeneratorTriModal:
  size mismatch for text_encoder.embedding.weight: copying a param with shape torch.Size([29460, 300]) from checkpoint, the shape in current model is torch.Size([26619, 300]).

@Amir3022

> Are the folders lmdb_train_s2ag_v2_cache_mfcc_14, lmdb_val_s2ag_v2_cache_mfcc_14, and lmdb_test_s2ag_v2_cache_mfcc_14 generated? Each of these folders should contain two files, data.mdb and lock.mdb. The size of each lock.mdb is 8 KB and the data.mdb files have sizes of a few GBs (the one for 'train' is a few hundred GBs). If these files are present but the sizes do not match, please delete them and rerun the code to re-generate them.

Those files are generated in the "ted_db" folder, but each has a size of around 16 KB (the whole folder, including data.mdb and lock.mdb). Granted, I only have 50 GB of free space on the hard drive; on the first run the code complained when I had only 40 GB, so I increased it to 50 GB and the code ran properly until I hit this problem. So how much free space does the model need to run properly?

I was hoping to use the pre-trained model, as I don't need to retrain it and would like to run inference directly for output. However, there is no --train option among the arguments of main_v2.py. There is a similar one, --train-s2ag, but setting it to False still requires the data to be present, and I end up with the same error as described before. So is it possible to use the model without retraining it, using the provided pre-trained model?

@UttaranB127
Owner

> Those files are generated in the "ted_db" folder, but each has a size of around 16 KB (the whole folder, including data.mdb and lock.mdb). Granted, I only have 50 GB of free space on the hard drive; on the first run the code complained when I had only 40 GB, so I increased it to 50 GB and the code ran properly until I hit this problem. So how much free space does the model need to run properly?
>
> I was hoping to use the pre-trained model, as I don't need to retrain it and would like to run inference directly for output. However, there is no --train option among the arguments of main_v2.py. There is a similar one, --train-s2ag, but setting it to False still requires the data to be present, and I end up with the same error as described before. So is it possible to use the model without retraining it, using the provided pre-trained model?

I have debugged some command-line argument issues to make sure the code does not require the full training data to be loaded if you just want to test the network. However, you still need the lmdb_train, lmdb_val, and lmdb_test folders with the data.mdb and lock.mdb files, as they contain relevant metadata. You do not need the additional cache or npz folders; see the sketch below.
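For reference, a minimal sketch of the required layout (folder names taken from the messages above; everything lives under data/ted_db, with the optional cache and npz folders omitted):

```
data/ted_db/
├── lmdb_train/
│   ├── data.mdb
│   └── lock.mdb
├── lmdb_val/
│   ├── data.mdb
│   └── lock.mdb
└── lmdb_test/
    ├── data.mdb
    └── lock.mdb
```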

@UttaranB127
Owner

> Hello: I did follow your suggestion; however, I got another error: [...] RuntimeError: Error(s) in loading state_dict for PoseGeneratorTriModal: size mismatch for text_encoder.embedding.weight: copying a param with shape torch.Size([29460, 300]) from checkpoint, the shape in current model is torch.Size([26619, 300]).

I am actually unable to replicate this error as my trimodal_generator matches all keys successfully. I have re-uploaded the file trimodal_gen.pth.tar from a different local address. Maybe try downloading it again?
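To see which side carries the stale vocabulary, you can compare the embedding shape stored in the checkpoint against the current model. A minimal sketch, using the checkpoint key from the traceback above:

```python
import torch

# Inspect the vocabulary size baked into the released checkpoint.
ckpt = torch.load('outputs/trimodal_gen.pth.tar', map_location='cpu')
emb = ckpt['trimodal_gen_dict']['text_encoder.embedding.weight']
print('checkpoint vocab size:', emb.shape[0])  # 29460 per the error above
```

If the re-downloaded checkpoint still reports more rows than the local model's 26619, the local vocab_cache.pkl was likely built from a different subset of the data, and deleting it so it is rebuilt may be worth trying.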

@Amir3022

> I have debugged some command-line argument issues to make sure the code does not require the full training data to be loaded if you just want to test the network. However, you still need the lmdb_train, lmdb_val, and lmdb_test folders with the data.mdb and lock.mdb files, as they contain relevant metadata. You do not need the additional cache or npz folders.

Okay, I will try the model again tomorrow and see if I can get it to run inference without retraining. I will keep you updated with my results and any remaining issues.

@zf223669
Author

May I suggest that you describe the installation, configuration, and running process in more detail, such as which directory each downloaded dataset should be placed in, how to configure the running parameters, etc.? Thank you!

@UttaranB127
Owner

The running parameters are straightforward and easily available from the command-line argument descriptors as well as the main paper. The readme already contains details on where each download should be placed; I will add more details based on the issues that are being raised and resolved. Meanwhile, let me know if your current issue is resolved.

@zf223669
Author

Hi, I have downloaded the Trinity Gesture dataset; however, it contains many files. Which ones should I load?

@zf223669
Author

Hello, I re-cloned the whole project, modified the base path, created the data folder, and put the
fasttext,
GENEA_Challenge_2000_data_release,
NRC_VAD_Lexicon_Aug2018Release, and
ted_db
datasets in it, ready to run main_v2.py.
I typed the command python3 main_v2.py -c /home/zf223669/Mount/speech2affective_gestures/config/multimodal_context_v2.yml and ran it.
However, it showed an error about the mfcc, shown below:
/home/zf223669/Mount/anaconda3/envs/s2ag/bin/python3.7 /home/zf223669/Mount/speech2affective_gestures/main_v2.py -c /home/zf223669/Mount/speech2affective_gestures/config/multimodal_context_v2.yml
Reading data '/home/zf223669/Mount/speech2affective_gestures/data/ted_db/lmdb_train'...
Creating the dataset cache...
Traceback (most recent call last):
  File "/home/zf223669/Mount/speech2affective_gestures/main_v2.py", line 121, in <module>
    train_data_ted, val_data_ted, test_data_ted = loader.load_ted_db_data(data_path, s2ag_config_args, args.train_s2ag)
  File "/home/zf223669/Mount/speech2affective_gestures/loader_v2.py", line 580, in load_ted_db_data
    remove_word_timing=(config_args.input_context == 'text')
  File "/home/zf223669/Mount/speech2affective_gestures/loader_v2.py", line 482, in __init__
    data_sampler.run()
  File "/home/zf223669/Mount/speech2affective_gestures/utils/data_preprocessor.py", line 56, in run
    filtered_result = self._sample_from_clip(vid, clip)
  File "/home/zf223669/Mount/speech2affective_gestures/utils/data_preprocessor.py", line 140, in _sample_from_clip
    sample_mfcc_combined = get_mfcc_features(sample_audio, sr=16000, num_mfcc=self.num_mfcc)
  File "/home/zf223669/Mount/speech2affective_gestures/utils/common.py", line 342, in get_mfcc_features
    mfcc_features = mfcc(audio, sr=sr, n_mfcc=num_mfcc) / 1000.
TypeError: mfcc() takes 0 positional arguments but 1 positional argument (and 2 keyword-only arguments) were given

What is the problem?
:(
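The TypeError indicates that mfcc only accepts keyword arguments; newer librosa releases made y and sr keyword-only in librosa.feature.mfcc. A minimal sketch of an adjusted get_mfcc_features, assuming the mfcc called in utils/common.py is librosa.feature.mfcc:

```python
import librosa

def get_mfcc_features(audio, sr=16000, num_mfcc=14):
    # librosa >= 0.10 rejects positional arguments here, so pass everything by keyword.
    return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=num_mfcc) / 1000.
```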

@zf223669
Author

I have tried many times; however, it keeps showing some weird problems. May I recommend that you clone your project in another place and try to fix the bug?
I am now stuck at lang_model = pickle.load(f), shown below:
/home/zf223669/Mount/anaconda3/envs/s2ag/bin/python3.7 /home/zf223669/Mount/speech2affective_gestures/main_v2.py -c /home/zf223669/Mount/speech2affective_gestures/config/multimodal_context_v2.yml
Reading data '/home/zf223669/Mount/speech2affective_gestures/data/ted_db/lmdb_train'...
Found the cache /home/zf223669/Mount/speech2affective_gestures/data/ted_db/lmdb_train_s2ag_v2_cache_mfcc_14
Reading data '/home/zf223669/Mount/speech2affective_gestures/data/ted_db/lmdb_val'...
Found the cache /home/zf223669/Mount/speech2affective_gestures/data/ted_db/lmdb_val_s2ag_v2_cache_mfcc_14
Reading data '/home/zf223669/Mount/speech2affective_gestures/data/ted_db/lmdb_test'...
Found the cache /home/zf223669/Mount/speech2affective_gestures/data/ted_db/lmdb_test_s2ag_v2_cache_mfcc_14
building a language model...
loaded from /home/zf223669/Mount/speech2affective_gestures/data/ted_db/vocab_models_s2ag/vocab_cache.pkl
Traceback (most recent call last):
  File "/home/zf223669/Mount/speech2affective_gestures/main_v2.py", line 121, in <module>
    train_data_ted, val_data_ted, test_data_ted = loader.load_ted_db_data(data_path, s2ag_config_args, args.train_s2ag)
  File "/home/zf223669/Mount/speech2affective_gestures/loader_v2.py", line 611, in load_ted_db_data
    config_args.wordembed_dim)
  File "/home/zf223669/Mount/speech2affective_gestures/utils/vocab_utils.py", line 29, in build_vocab
    lang_model = pickle.load(f)
ModuleNotFoundError: No module named 'model'

@UttaranB127
Owner

Let me look into these errors for you. These errors seem a bit unusual and I don't recall coming across them myself. Might be a case of version mismatches or some missing files, but I will try to replicate your errors and update the repo and readme accordingly. It might take some time though.
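On the ModuleNotFoundError specifically: the vocab cache was pickled while its language-model class lived in a module named model, so unpickling fails when no such module is importable. One possible workaround is to remap the module name during unpickling; a sketch, where the target module utils.vocab_utils is only an assumption about where the class lives now:

```python
import pickle

class ModuleRenameUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Redirect classes pickled under the old module name 'model' to their
        # assumed current location; adjust to wherever the class actually lives.
        if module == 'model':
            module = 'utils.vocab_utils'
        return super().find_class(module, name)

with open('data/ted_db/vocab_models_s2ag/vocab_cache.pkl', 'rb') as f:
    lang_model = ModuleRenameUnpickler(f).load()
```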

@zf223669
Author

Thank you!! :)

@UttaranB127
Owner

Hi, it turns out that some of the packages have been deprecated without backward compatibility since we released our code. As a result, preprocessing the data and later running training/inference requires different versions of basic packages such as numpy. This would require strict versioning and modularization of our codebase, which is beyond our scope at the moment. To circumvent the issue, I am uploading the preprocessed dataset in a single folder that you can download and use directly for training/inference. Please keep the entire contents of the download in the folder data/ted_db and point the variable data_path in main_v2.py to the data folder. I have also revised the code to make data loading faster. Let me know if you still face issues.
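For example, a sketch (the exact line and default value in main_v2.py may differ):

```python
data_path = '../data'  # parent folder that contains ted_db/ with the preprocessed download
```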

@UttaranB127
Owner

Also, I am closing this issue, as it has become more general than the original title suggests. Please post any follow-ups in issue #7.
