After just using VAE reconstruct a audio, I only get noise #19

Open
SuperiorDtj opened this issue May 31, 2023 · 5 comments

Comments

@SuperiorDtj

SuperiorDtj commented May 31, 2023

Here is my code. Is there something wrong with the way I am using the VAE?

`def recon_vae(self, filename):
        """ recon audio only by vae """
        with torch.no_grad():
            waveform, sample_rate = torchaudio.load(filename)
            waveform = torchaudio.functional.resample(waveform, orig_freq=sample_rate, new_freq=16000)[0]
            waveform = waveform - torch.mean(waveform)
            waveform = waveform / (torch.max(torch.abs(waveform)) + 1e-8)
            waveform = 0.5 * waveform
            waveform = waveform / torch.max(torch.abs(waveform))
            waveform = 0.5 * waveform

            #waveform = 0.5 * waveform[0:int(len(waveform)*1)]

            audio = torch.unsqueeze(waveform, 0)
            audio = torch.nan_to_num(torch.clip(audio, -1, 1))
            audio = torch.autograd.Variable(audio, requires_grad=False)
            melspec, log_magnitudes_stft, energy = self.stft.mel_spectrogram(audio)
            melspec = melspec.transpose(1, 2)
            melspec = melspec.unsqueeze(1)
            truth_lattent = self.vae.get_first_stage_encoding(self.vae.encode_first_stage(melspec))

            mel_recon = self.vae.decode_first_stage(truth_lattent)
            wave = self.vae.decode_to_waveform(mel_recon)
        return wave[0], waveform`
@SuperiorDtj SuperiorDtj changed the title After just using VAE reconstruct a audio, I only get noise? After just using VAE reconstruct a audio, I only get noise May 31, 2023
@deepanwayx
Collaborator

Can you try the following:

import torch
import torchaudio
from tango import Tango
from tools.torch_tools import wav_to_fbank

filename = ... 

device = "cuda:0"
tango = Tango("declare-lab/tango", device)
tango.vae.eval()
tango.stft.eval()

duration = 10
target_length = int(duration * 102.4)

with torch.no_grad():
    mel, _, waveform = wav_to_fbank([filename], target_length, tango.stft)
    mel = mel.unsqueeze(1).to(device)
    latent = tango.vae.get_first_stage_encoding(tango.vae.encode_first_stage(mel))
    reconstructed_mel = tango.vae.decode_first_stage(latent)
    reconstructed_waveform = tango.vae.decode_to_waveform(reconstructed_mel)[0]

@SuperiorDtj
Author

SuperiorDtj commented Jun 5, 2023

Thanks for your code! Now I can reconstruct the audio, but only when the number of audio frames is a multiple of four (a 3.6 s clip works, a 3.7 s clip does not).
Is this a common issue with the VAE model?
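
A quick arithmetic check of the formula `target_length = int(duration * 102.4)` from the suggested snippet, as a sketch only: a 3.6 s clip gives a frame count divisible by four, while a 3.7 s clip does not, which matches the behaviour reported here. The helper names `frames_for` and `round_up_to_multiple` are hypothetical, and rounding the frame count up to the next multiple of four is an assumed workaround, not a fix confirmed by the maintainers.

```python
def frames_for(duration_s, frames_per_second=102.4):
    """Frame count from the formula used in this thread: int(duration * 102.4)."""
    return int(duration_s * frames_per_second)


def round_up_to_multiple(n, multiple=4):
    """Hypothetical helper: smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


# Compare the clip lengths mentioned in this thread.
for duration in (3.6, 3.7, 10):
    frames = frames_for(duration)
    # Columns: duration, frame count, remainder mod 4, padded frame count.
    print(duration, frames, frames % 4, round_up_to_multiple(frames))
```

For 3.6 s the formula yields 368 frames (remainder 0); for 3.7 s it yields 378 frames (remainder 2), which would round up to 380.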

@deepanwayx
Collaborator

What is the exact issue when reconstructing a 3.7s audio? Does it generate noise for the entire 3.7s or the last 0.1s?

@SuperiorDtj
Author

When the VAE reconstructs a 3.7 s audio clip, it generates noise for the entire 3.7 s.

@ikm565

ikm565 commented Jul 29, 2023

I have run into the same problem. Has it been solved? I tried reconstructing the same audio sample several times, and the result is always noise. The results also vary greatly from one reconstruction to the next.

Is the only solution to set the duration like this?
target_length = int(duration * 102.4)
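
One possible reason for the run-to-run differences, offered only as an assumption: a VAE encoder outputs a distribution, and if the latent is drawn by sampling rather than taken as the mean, each call gives a different latent unless the random seed is fixed. Whether `get_first_stage_encoding` samples in this repository is not confirmed in this thread. The toy sketch below uses only the Python standard library; `sample_latent` and its input values are hypothetical and it does not call Tango.

```python
import random


def sample_latent(mean, std, rng):
    """Toy reparameterised draw: z = mean + std * eps, with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


mean = [0.5, -1.0, 2.0]  # hypothetical posterior mean
std = [0.1, 0.2, 0.3]    # hypothetical posterior standard deviation

# Two consecutive draws from the same generator differ from each other.
rng = random.Random(0)
first = sample_latent(mean, std, rng)
second = sample_latent(mean, std, rng)

# A fresh generator with the same seed reproduces the first draw.
repeat = sample_latent(mean, std, random.Random(0))

print(first != second, first == repeat)
```

In PyTorch the analogous step would be calling `torch.manual_seed` before encoding. This would only explain small differences between runs; it would not by itself explain output that is pure noise.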
