When `win_duration` is greater than or equal to the length of the audio (`audio_signal.signal_duration`), the model fails to decompress the audio when calling `model.decompress()`. This error appears in only some samples, and I'm not sure why. Below is some code to reproduce the error:
from audiotools import AudioSignal
import dac
from datasets import load_dataset
model_type = "24khz"
model_path = dac.utils.download(model_type=model_type)
model = dac.DAC.load(model_path)
ds = load_dataset("DynamicSuperb/SourceSeparation_libri2Mix_test", split="test")
ds = ds.with_format("torch")
# find the index of the failing case
fname = "data/test/1089-134686-0008_8230-279154-0012.wav"
for i, d in enumerate(ds):
    if d["file"] == fname:
        break
# Load audio signal
a = AudioSignal(ds[i]["audio"]["array"], sample_rate=ds[i]["audio"]["sampling_rate"].item())
# Run model
z = model.compress(a, win_duration=5.0)
synth = model.decompress(z) # <--- Error
This gives the following error and traceback:
Traceback (most recent call last):
  File "/home/.idea/dac_codec_fail.py", line 24, in <module>
    synth = model.decompress(z) # <--- Error
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/dac/model/base.py", line 289, in decompress
    recons.audio_data = recons.audio_data.reshape(
RuntimeError: shape '[-1, 1, 58240]' is invalid for input of size 58234
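To make the numbers in the error message concrete, here is a small pure-Python sketch of the size check that a reshape with one `-1` dimension has to pass. The helper `reshape_is_valid` is hypothetical and written only for illustration; it is not part of `dac` or `torch`. It suggests the reconstructed audio is 6 samples shorter than the length the reshape expects.

```python
from math import prod


def reshape_is_valid(numel, shape):
    """Hypothetical helper: mimic the size check for a reshape with at most one -1."""
    # Product of all explicitly given dimensions (the -1 is inferred).
    known = prod(d for d in shape if d != -1)
    if -1 in shape:
        # The inferred dimension must be a whole number.
        return known != 0 and numel % known == 0
    return numel == known


expected_length = 58240  # last dimension requested in the traceback
actual_numel = 58234     # number of elements actually available

print(reshape_is_valid(actual_numel, (-1, 1, expected_length)))  # False
print(expected_length - actual_numel)                            # 6
```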
Please note: setting `win_duration=1.0` resolves this issue, but the model is substantially slower.
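As a stopgap, one could fall back to the smaller window only for clips that are not longer than the preferred window, so that longer clips keep the faster setting. This is a sketch based purely on the behaviour described above, not a confirmed fix. `pick_win_duration` is a hypothetical helper, and the 5.0 and 1.0 values are simply the two settings tried in this report.

```python
def pick_win_duration(signal_duration, preferred=5.0, fallback=1.0):
    """Hypothetical helper: use `preferred` only when it is strictly shorter than the clip."""
    if preferred < signal_duration:
        return preferred
    # Assumption from this report: a window >= the clip length may trigger the failure,
    # while the smaller window worked for the failing sample.
    return fallback


print(pick_win_duration(12.0))  # 5.0
print(pick_win_duration(3.0))   # 1.0

# Intended use with the reproduction script (not executed here):
# z = model.compress(a, win_duration=pick_win_duration(a.signal_duration))
```

For clips shorter than the fallback window, the window is again at least as long as the audio, so this does not rule out the failure in that case.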