Rhythmic Artifacts in Piano Audio After DAC Compression and Reconstruction

Open: fff-ttt opened this issue 1 year ago • 1 comment

Hello,

I hope this message finds you well. This is an amazing project! However, I've encountered an issue while working with it, and I'd appreciate your insights.

Issue Description

When processing a piano performance recording with DAC, I've noticed consistent rhythmic artifacts in the compressed and reconstructed audio. These artifacts are present regardless of the sampling rate used (16k, 24k, or 44k).

The artifacts are particularly noticeable at the onset of each piano sound in the processed audio. They are most prominent in the 16k version, but can be heard in all versions to some extent. They are also visible in the spectrogram (image: 7821726664333_ pic).
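One way to check whether the artifacts really are periodic is to look at the residual between the original and the reconstruction. The following is only a minimal sketch, not part of DAC: the helper names (`frame_rms`, `dominant_period`, `residual_period`) are hypothetical, and it assumes both signals are mono, sample-aligned, at the same sample rate, and already loaded as plain lists of floats.

```python
import math

def frame_rms(samples, frame_len):
    """Return the RMS of each non-overlapping frame of `samples`."""
    n_frames = len(samples) // frame_len
    return [
        math.sqrt(sum(s * s for s in samples[i * frame_len:(i + 1) * frame_len]) / frame_len)
        for i in range(n_frames)
    ]

def dominant_period(envelope, min_lag=1):
    """Return the lag (in frames) with the highest autocorrelation.

    Returns None when the envelope is too short or completely flat.
    """
    n = len(envelope)
    if n < 2:
        return None
    mean = sum(envelope) / n
    centered = [e - mean for e in envelope]
    energy = sum(c * c for c in centered)
    if energy == 0:
        return None
    best_lag, best_score = None, -1.0
    for lag in range(min_lag, n // 2):
        # Autocorrelation at this lag, normalized by the zero-lag energy.
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def residual_period(original, reconstructed, frame_len):
    """Estimate the repetition period (in frames) of the reconstruction error."""
    residual = [o - r for o, r in zip(original, reconstructed)]
    return dominant_period(frame_rms(residual, frame_len))
```

The returned lag converts to seconds as `lag * frame_len / sample_rate`. If that period stays constant across different recordings, the cause is more likely in the processing than in the music itself; if it follows the note onsets, it is more likely tied to how transients are reconstructed.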

Steps to Reproduce

  1. Input a high-quality piano performance audio file.
  2. Process the audio using DAC with various sampling rates (16k, 24k, 44k).
  3. Listen to the output, paying particular attention to the beginning of piano sounds.

You can download the original piano audio file and the files processed by DAC from Google Drive: https://drive.google.com/file/d/1FyzoRfjviTFLmsX_7x9_a_MlXMdSfm-L/view?usp=drive_link or from Baidu Netdisk: https://pan.baidu.com/s/1kC2wnsl_dl9mY0zKLJz5Jw?pwd=iycc (extraction code: iycc)

Code Used

Here's the code I used for processing:

import dac
from audiotools import AudioSignal
import torch

def process_audio(input_file, output_file, target_sr=44100, target_channels=1, use_cuda=False):
    # NOTE: the model type is fixed to "16khz" here; it should match target_sr
    # ("16khz", "24khz", or "44khz").
    model_path = dac.utils.download(model_type="16khz")
    model = dac.DAC.load(model_path)
    device = 'cuda' if use_cuda and torch.cuda.is_available() else 'cpu'
    model.to(device)
    # Load the audio file. audio_data has shape (batch, channels, samples),
    # so the channel count is read from num_channels rather than shape[0].
    signal = AudioSignal(input_file)
    print(f"Original audio - Sample rate: {signal.sample_rate}, Channels: {signal.num_channels}")
    if signal.sample_rate != target_sr or signal.num_channels != target_channels:
        signal = signal.resample(target_sr).to_mono() if target_channels == 1 else signal.resample(target_sr).to_stereo()
    print(f"Processed audio - Sample rate: {signal.sample_rate}, Channels: {signal.num_channels}")
    signal = signal.to(device)
    x = model.compress(signal)
    y = model.decompress(x)
    y.write(output_file)
    print(f"Processed audio saved to {output_file}")

input_file = '../WavTokenizer/demo_mp4/yundi.wav'
output_file = './infer_out/yundi_dac_16k.wav'
target_sr = 16000
target_channels = 1
use_cuda = False

process_audio(input_file, output_file, target_sr, target_channels, use_cuda)
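Because the script passes the model type and the target sample rate separately, it is worth ruling out a mismatch between the two when testing the 24k and 44k cases. A minimal sketch of one way to keep them in sync is shown below; `model_type_for` and `DAC_MODEL_TYPES` are hypothetical names, not part of the DAC package, and the mapping only covers the three model types mentioned in this issue.

```python
# Model type names as passed to dac.utils.download, keyed by sample rate in Hz.
DAC_MODEL_TYPES = {16000: "16khz", 24000: "24khz", 44100: "44khz"}

def model_type_for(sample_rate):
    """Return the DAC model type string matching `sample_rate`.

    Raises ValueError for sample rates without a matching model type.
    """
    try:
        return DAC_MODEL_TYPES[sample_rate]
    except KeyError:
        supported = ", ".join(str(sr) for sr in sorted(DAC_MODEL_TYPES))
        raise ValueError(
            f"No DAC model type for {sample_rate} Hz; supported rates: {supported}"
        ) from None
```

Inside `process_audio`, the download call could then read `dac.utils.download(model_type=model_type_for(target_sr))`, so the model always matches the rate the audio was resampled to.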

Additional Information

  • I've uploaded audio samples demonstrating the issue. The artifacts are most noticeable in the 16k version.
  • The original audio is a high-quality recording of a piano performance.
  • The issue persists across different sampling rates (16k, 24k, 44k).

Questions

  1. Is this a known issue with DAC when processing piano audio?
  2. Are there any recommended settings or preprocessing steps to mitigate these artifacts?
  3. Could this be related to the model used or the compression settings?

I appreciate any guidance or insights you can provide on this matter. Thank you for your time and assistance.

Best regards, Tao

fff-ttt, Sep 14 '24 16:09

Hello, did you find any solution to your problem?

Thanks.

tarlanahad, Feb 26 '25 13:02