Rhythmic Artifacts in Piano Audio After DAC Compression and Reconstruction #92

fff-ttt · 2024-09-14T16:40:43Z

Hello,

I hope this message finds you well. This is an amazing project!
However, I've encountered an issue while working with it, and I'd appreciate your insights.

Issue Description

When processing a piano performance recording with DAC, I noticed consistent rhythmic artifacts in the compressed and reconstructed audio. These artifacts are present regardless of the sampling rate used (16k, 24k, or 44k).

The artifacts are particularly noticeable at the beginning of the piano sounds in the processed audio. They are most prominent in the 16k version but can be heard in all versions to some extent. They can also be seen in the spectrogram of the processed audio.
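One way to inspect the spectral artifacts numerically is to subtract the log-magnitude spectrogram of the original from that of the reconstruction. The following is a minimal sketch using only NumPy. The helper names `log_spectrogram` and `spectral_difference`, the FFT size, and the hop length are illustrative assumptions and are not taken from DAC.

```python
import numpy as np


def log_spectrogram(x, n_fft=1024, hop=256):
    """Return a (frames, bins) log-magnitude spectrogram in dB.

    n_fft and hop are arbitrary analysis settings (assumptions),
    not parameters of the codec.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0) on silent frames.
    return 20.0 * np.log10(magnitude + 1e-8)


def spectral_difference(original, reconstructed, n_fft=1024, hop=256):
    """Return reconstructed minus original, in dB, per frame and bin.

    Both inputs are 1-D mono arrays at the same sample rate; the longer
    one is truncated so the two spectrograms have the same shape.
    """
    n = min(len(original), len(reconstructed))
    a = log_spectrogram(original[:n], n_fft, hop)
    b = log_spectrogram(reconstructed[:n], n_fft, hop)
    return b - a
```

Large positive values in the result mark time-frequency regions where the reconstruction contains energy that the original does not. Both signals need to be mono and at the same sample rate before they are compared.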

Steps to Reproduce

1. Input a high-quality piano performance audio file.
2. Process the audio using DAC with various sampling rates (16k, 24k, 44k).
3. Listen to the output, paying particular attention to the beginning of piano sounds.
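To focus the listening comparison on the beginning of each piano sound, the attack regions can be located automatically. The following is a minimal sketch of an energy-rise detector in NumPy. The function name `find_onsets`, the frame sizes, and the thresholds are assumptions chosen for illustration; this is not a robust onset detector.

```python
import numpy as np


def find_onsets(x, sr, frame=512, hop=256, threshold_db=10.0, min_gap_s=0.1):
    """Return onset times in seconds for a 1-D mono signal.

    An onset is reported wherever the frame energy rises by more than
    threshold_db from one frame to the next. Detections closer together
    than min_gap_s are merged into the first one.
    """
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame) // hop
    energy = np.array(
        [np.mean(x[i * hop : i * hop + frame] ** 2) for i in range(n_frames)]
    )
    energy_db = 10.0 * np.log10(energy + 1e-10)
    rise = np.diff(energy_db)
    # rise[i] compares frame i+1 with frame i, so shift the index by one.
    candidates = np.where(rise > threshold_db)[0] + 1

    kept = []
    for i in candidates:
        t = i * hop / sr
        if not kept or t - kept[-1] >= min_gap_s:
            kept.append(t)
    return np.array(kept)
```

With the onset times in hand, a short slice such as `x[int(t * sr) : int((t + 0.2) * sr)]` can be cut from both the original and the reconstructed signal for a side-by-side comparison.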

You can download the original piano audio file and the DAC-processed files from Google Drive:
https://drive.google.com/file/d/1FyzoRfjviTFLmsX_7x9_a_MlXMdSfm-L/view?usp=drive_link
or from Baidu Netdisk (extraction code: iycc):
https://pan.baidu.com/s/1kC2wnsl_dl9mY0zKLJz5Jw?pwd=iycc

Code Used

Here's the code I used for processing:

import dac
from audiotools import AudioSignal
import torch

def process_audio(input_file, output_file, target_sr=44100, target_channels=1, use_cuda=False):
    # Load the pretrained model that matches the target sample rate
    model_type = {16000: "16khz", 24000: "24khz", 44100: "44khz"}[target_sr]
    model_path = dac.utils.download(model_type=model_type)
    model = dac.DAC.load(model_path)
    device = 'cuda' if use_cuda and torch.cuda.is_available() else 'cpu'
    model.to(device)
    # Load the audio file; audio_data has shape (batch, channels, samples)
    signal = AudioSignal(input_file)
    print(f"Original audio - Sample rate: {signal.sample_rate}, Channels: {signal.audio_data.shape[1]}")
    # Resample and convert the channel layout only when the input differs from the target
    if signal.sample_rate != target_sr or signal.audio_data.shape[1] != target_channels:
        signal = signal.resample(target_sr)
        signal = signal.to_mono() if target_channels == 1 else signal.to_stereo()
    print(f"Processed audio - Sample rate: {signal.sample_rate}, Channels: {signal.audio_data.shape[1]}")
    signal = signal.to(device)
    # Encode to the compressed representation, then decode back to audio
    x = model.compress(signal)
    y = model.decompress(x)
    y.write(output_file)
    print(f"Processed audio saved to {output_file}")

input_file = '../WavTokenizer/demo_mp4/yundi.wav'
output_file = './infer_out/yundi_dac_16k.wav'
target_sr = 16000
target_channels = 1
use_cuda = False

process_audio(input_file, output_file, target_sr, target_channels, use_cuda)

Additional Information

I've uploaded audio samples demonstrating the issue. The artifacts are most noticeable in the 16k version.
The original audio is a high-quality recording of a piano performance.
The issue persists across different sampling rates (16k, 24k, 44k).

Questions

Is this a known issue with DAC when processing piano audio?
Are there any recommended settings or preprocessing steps to mitigate these artifacts?
Could this be related to the model used or the compression settings?
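A possible way to narrow down the last question is to measure whether the artifacts repeat at a fixed period. A period that stays constant in seconds regardless of the music would hint at something in the processing (for example a fixed chunk length, which is only a hypothesis here) rather than at the performance itself. The following is a minimal sketch that estimates the dominant repetition period of a frame-level envelope by autocorrelation. The function name `dominant_period` and its parameters are assumptions, and the envelope is expected to come from a separate measurement such as per-frame artifact energy.

```python
import numpy as np


def dominant_period(envelope, frame_rate, min_period_s=0.05):
    """Estimate the dominant repetition period (seconds) of an envelope.

    envelope     : 1-D sequence with one value per analysis frame
    frame_rate   : number of envelope frames per second
    min_period_s : shortest period considered, to skip the trivial
                   peak around lag zero
    """
    env = np.asarray(envelope, dtype=float)
    env = env - env.mean()
    # One-sided autocorrelation: index k holds the value at a lag of k frames.
    ac = np.correlate(env, env, mode="full")[len(env) - 1 :]
    min_lag = max(1, int(round(min_period_s * frame_rate)))
    max_lag = len(env) // 2
    if max_lag <= min_lag:
        raise ValueError("envelope too short for the requested minimum period")
    best_lag = min_lag + int(np.argmax(ac[min_lag:max_lag]))
    return best_lag / frame_rate
```

The estimate is only meaningful when the envelope covers several repetitions, and it reports the strongest period even if the envelope is not actually periodic, so the autocorrelation peak should be checked by eye before drawing conclusions.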

I appreciate any guidance or insights you can provide on this matter. Thank you for your time and assistance.

Best regards,
Tao
