Rhythmic Artifacts in Piano Audio After DAC Compression and Reconstruction #92

Open
fff-ttt opened this issue Sep 14, 2024 · 0 comments


Hello,

I hope this message finds you well. This is an amazing project!
However, I've encountered an issue while working with it, and I'd appreciate your insights.

Issue Description

When processing a piano performance recording with DAC, I noticed consistent rhythmic artifacts in the compressed and reconstructed audio. The artifacts are present regardless of the sampling rate used (16k, 24k, or 44k).

The artifacts are particularly noticeable at the beginning of the piano sound in the processed audio. They are most prominent in the 16k version but can be heard in all versions to some extent. They are also visible in the spectrum:
[Image: spectrum of the processed audio showing the artifacts]

Steps to Reproduce

  1. Input a high-quality piano performance audio file.
  2. Process the audio using DAC with various sampling rates (16k, 24k, 44k).
  3. Listen to the output, paying particular attention to the beginning of piano sounds.
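Step 3 relies on listening. As a rough complement, the mismatch between the original and the reconstruction can be measured per frame, so that the positions of the artifacts show up as peaks. The following is a minimal sketch using only NumPy. It assumes both signals are mono arrays at the same sample rate and roughly time-aligned. The function names, window size, and hop size are illustrative choices and are not part of DAC.

```python
import numpy as np

def stft_magnitude(x, n_fft=1024, hop=256):
    """Magnitude STFT with a Hann window. Returns an array of shape (frames, bins)."""
    x = np.asarray(x, dtype=np.float64)
    if len(x) < n_fft:
        # Pad very short inputs so that at least one frame exists
        x = np.pad(x, (0, n_fft - len(x)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def framewise_log_spectral_distance(ref, test, n_fft=1024, hop=256, eps=1e-8):
    """RMS difference in dB between two magnitude spectra, one value per frame."""
    n = min(len(ref), len(test))  # compare only the overlapping part
    ref_db = 20.0 * np.log10(stft_magnitude(ref[:n], n_fft, hop) + eps)
    test_db = 20.0 * np.log10(stft_magnitude(test[:n], n_fft, hop) + eps)
    return np.sqrt(np.mean((ref_db - test_db) ** 2, axis=1))
```

Frame `i` covers samples `i * hop` to `i * hop + n_fft`, so a peak at frame `i` points to roughly `i * hop / sample_rate` seconds. If the peaks recur at a regular interval, that interval is a useful detail to include in a report like this one.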

You can download the original piano audio file and the files processed by DAC from Google Drive:
https://drive.google.com/file/d/1FyzoRfjviTFLmsX_7x9_a_MlXMdSfm-L/view?usp=drive_link
or from Baidu Drive:
https://pan.baidu.com/s/1kC2wnsl_dl9mY0zKLJz5Jw?pwd=iycc (extraction code: iycc)

Code Used

Here's the code I used for processing:

import dac
from audiotools import AudioSignal
import torch

def process_audio(input_file, output_file, target_sr=44100, target_channels=1, use_cuda=False):
    model_path = dac.utils.download(model_type="16khz")
    model = dac.DAC.load(model_path)
    device = 'cuda' if use_cuda and torch.cuda.is_available() else 'cpu'
    model.to(device)
    # Load the input file as an AudioSignal
    signal = AudioSignal(input_file)
    # audio_data has shape (batch, channels, samples), so read the channel count from num_channels
    print(f"Original audio - Sample rate: {signal.sample_rate}, Channels: {signal.num_channels}")
    # Resample and convert the channel layout only when the input differs from the target
    if signal.sample_rate != target_sr or signal.num_channels != target_channels:
        signal = signal.resample(target_sr).to_mono() if target_channels == 1 else signal.resample(target_sr).to_stereo()
    print(f"Processed audio - Sample rate: {signal.sample_rate}, Channels: {signal.num_channels}")
    signal = signal.to(device)
    x = model.compress(signal)
    y = model.decompress(x)
    y.write(output_file)
    print(f"Processed audio saved to {output_file}")

input_file = '../WavTokenizer/demo_mp4/yundi.wav'
output_file = './infer_out/yundi_dac_16k.wav'
target_sr = 16000
target_channels = 1
use_cuda = False

process_audio(input_file, output_file, target_sr, target_channels, use_cuda)
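One thing worth noting about the script above: `process_audio` always downloads the `"16khz"` model, while `target_sr` is a separate argument that defaults to 44100. The run shown here passes 16000, so the two agree, but the 24k and 44k runs would need the model type changed by hand. A small helper could keep them in sync. This is only a sketch: `model_type_for` and `DAC_MODEL_TYPES` are hypothetical names, and the mapping assumes the model type strings `"16khz"`, `"24khz"`, and `"44khz"` correspond to 16000, 24000, and 44100 Hz.

```python
# Hypothetical mapping from target sample rate (Hz) to the model_type string
# passed to dac.utils.download. The rate-to-name pairing is an assumption.
DAC_MODEL_TYPES = {
    16000: "16khz",
    24000: "24khz",
    44100: "44khz",
}

def model_type_for(sample_rate):
    """Return the model type string for a sample rate, or raise if none matches."""
    try:
        return DAC_MODEL_TYPES[sample_rate]
    except KeyError:
        supported = ", ".join(str(sr) for sr in sorted(DAC_MODEL_TYPES))
        raise ValueError(
            f"No model type mapped for {sample_rate} Hz (mapped rates: {supported})"
        ) from None
```

Inside `process_audio`, the download line would then become `dac.utils.download(model_type=model_type_for(target_sr))`, so an unsupported rate fails early instead of silently pairing a resampled signal with a model for a different rate.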

Additional Information

  • I've uploaded audio samples demonstrating the issue. The artifacts are most noticeable in the 16k version.
  • The original audio is a high-quality recording of a piano performance.
  • The issue persists across different sampling rates (16k, 24k, 44k).

Questions

  1. Is this a known issue with DAC when processing piano audio?
  2. Are there any recommended settings or preprocessing steps to mitigate these artifacts?
  3. Could this be related to the model used or the compression settings?

I appreciate any guidance or insights you can provide on this matter. Thank you for your time and assistance.

Best regards,
Tao
