Questions about the sampled data. #17

Open
Meteor-Stars opened this issue Nov 10, 2020 · 3 comments

@Meteor-Stars

Hello, I hope I'm not disturbing you. My questions are as follows.

First, I notice that when the global step reaches 'plot_gap', the model samples some '.bvh' files. How can I find the corresponding audio files? I assumed they would be in the 'visualization_dev' or 'visualization_test' folder, but the number of audio files in both of these folders differs from the number of output '.bvh' files. How can I find the audio clips corresponding to the output '.bvh' files?

Second, I find that the 'test_input_20fps.npz' and 'test_output_20fps.npz' files, which are produced by 'prepare_datasets.py', are not used in 'trinity.py', and I cannot find them being used anywhere else either. I assume they are needed somewhere I have overlooked. Can you give me some guidance to help me resolve this confusion?

I would be grateful for any help, and I look forward to hearing from you!

Best wishes!

@ghenter
Collaborator

ghenter commented Nov 15, 2020

I think only @simonalexanderson knows how to answer these questions, and I hope he can find the time to help.

@Meteor-Stars
Author

Thank you. I might add that this is just like the synthesized Obama gesture video you provided: the Obama audio file was divided into many clips, and the model sampled many output '.bvh' files. One must first find the audio clip corresponding to each output '.bvh' file, and then the audio clip can be synchronized with the gesture. Finally, the synthesized Obama gesture video, which presents the audio clip and the corresponding gesture motion at the same time, can be produced. I am still not sure how to find the audio clips corresponding to the output '.bvh' files, and I am looking forward to your advice and guidance.
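
For illustration, here is a minimal sketch of how one might pair each output '.bvh' file with its source audio clip by shared basename. The folder names 'results' and 'visualization_test' and the naming convention are my assumptions, not something confirmed by the code:

```python
from pathlib import Path

# Hypothetical locations; adjust to wherever the sampled '.bvh' files
# and the audio clips actually live in your setup.
BVH_DIR = Path("results")
AUDIO_DIR = Path("visualization_test")

# Index the audio clips by their filename stem (name without extension).
audio_by_stem = {p.stem: p for p in AUDIO_DIR.glob("*.wav")}

# Match each '.bvh' file to an audio clip whose stem appears in its name.
for bvh in sorted(BVH_DIR.glob("*.bvh")):
    match = next((audio_by_stem[s] for s in audio_by_stem if s in bvh.stem), None)
    if match is not None:
        print(f"{bvh.name} <-> {match.name}")
    else:
        print(f"{bvh.name}: no matching audio clip found")
```

Once a pair is known, the clip and the rendered motion can be muxed into a video with any external tool.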

@simonalexanderson
Owner

Hi @Meteor-Stars,
I have now restructured the code and added a script called 'prepare_gesture_testdata.py' to facilitate synthesis from arbitrary wav sources. The process is:

  1. Resample the wav files to 48 kHz and place them in the data/GENEA/source/test_audio folder (a resampling sketch follows this list).
  2. Run 'python prepare_gesture_testdata.py'.
  3. Modify hparams/<some_params>.json to point at the data/GENEA/processed/test file and add the pretrained model.
  4. Run 'python train_moglow.py hparams/<some_params>.json trinity'.
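
For step 1, a minimal resampling sketch in Python, assuming the librosa and soundfile packages are installed; the source folder 'my_wavs' is a placeholder for wherever your original recordings are:

```python
import librosa
import soundfile as sf
from pathlib import Path

SRC = Path("my_wavs")                       # placeholder: your original wav files
DST = Path("data/GENEA/source/test_audio")  # target folder from step 1
DST.mkdir(parents=True, exist_ok=True)

for wav in SRC.glob("*.wav"):
    # librosa resamples on load when an explicit target rate is given.
    audio, sr = librosa.load(wav, sr=48000)
    sf.write(DST / wav.name, audio, sr)
```

Any external tool that writes 48 kHz wav files (e.g. ffmpeg or sox) works just as well.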

Hope this helps.
