Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: the lengths of the features after FACodecEncoderV2 is not match #188

Open
Mahaotian1 opened this issue Apr 19, 2024 · 1 comment
Open
Assignees
Labels
bug Something isn't working

Comments

@Mahaotian1
Copy link

bug of FACodecEncoderV2

I have extracted prosody_feature and encoder_output from FACodecEncoderV2. It raise wrong when I use fa_decoder_v2 to extract vq codecs becaucse the lengths of prosody_feature(torch.Size([1, 20, 281])) and encoder_output(torch.Size([1, 256, 282])) is not same.

my code

wav_b = librosa.load(wav_b, sr=16000)[0]
wav_b = torch.from_numpy(wav_b).float()
wav_b = wav_b.unsqueeze(0).unsqueeze(0)
enc_out_b = fa_encoder_v2(wav_b)
prosody_b = fa_encoder_v2.get_prosody_feature(wav_b)
vq_post_emb_b, vq_id_b, _, quantized, spk_embs_b = fa_decoder_v2(
enc_out_b, prosody_b, eval_vq=False, vq=True
)

bug

File "/home/data/mahaotian/Amphion/models/codec/ns3_codec/inference_codc.py", line 129, in
vq_post_emb_a, vq_id_a, _, quantized, spk_embs_a = fa_decoder_v2(
File "/home/data/mahaotian/anaconda3/envs/vallex/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/data/mahaotian/anaconda3/envs/vallex/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/data/mahaotian/Amphion/models/codec/ns3_codec/facodec.py", line 1086, in forward
outs, qs, commit_loss, quantized_buf = self.quantize(
File "/home/data/mahaotian/Amphion/models/codec/ns3_codec/facodec.py", line 1048, in quantize
outs += out
RuntimeError: The size of tensor a (281) must match the size of tensor b (282) at non-singleton dimension 2

@HeCheng0625
Copy link
Collaborator

Hi, you need padding your wav length to multiples of 200 (hopsize)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants