Rhythmic Artifacts in Piano Audio After DAC Compression and Reconstruction
Hello,
I hope this message finds you well. This is an amazing project! However, I've encountered an issue while working with it, and I'd appreciate your insights.
Issue Description
When processing a piano performance recording with DAC, I've noticed consistent rhythmic artifacts in the compressed and reconstructed audio. These artifacts are present regardless of the sampling rate used (16k, 24k, or 44k).
The artifacts are particularly noticeable at the onset of each piano sound in the processed audio. They are most prominent in the 16k version but can be heard in all versions to some extent. They are also visible in the spectrogram.
Steps to Reproduce
- Input a high-quality piano performance audio file.
- Process the audio using DAC with various sampling rates (16k, 24k, 44k).
- Listen to the output, paying particular attention to the beginning of piano sounds.
You can download the original piano audio file and the DAC-processed files from Google Drive: https://drive.google.com/file/d/1FyzoRfjviTFLmsX_7x9_a_MlXMdSfm-L/view?usp=drive_link or Baidu Netdisk: https://pan.baidu.com/s/1kC2wnsl_dl9mY0zKLJz5Jw?pwd=iycc (extraction code: iycc)
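To complement listening, the difference between the original and the reconstruction can be localized in time. The following is a minimal sketch, not part of DAC: the helper names `stft_mag` and `log_spectral_distance`, the window settings, and the synthetic test signals are all assumptions made for illustration. It computes a per-frame log-spectral distance and reports the frame where the two signals differ most.

```python
import numpy as np

N_FFT = 512  # analysis window length in samples (arbitrary choice)
HOP = 128    # hop between frames in samples (arbitrary choice)


def stft_mag(x, n_fft=N_FFT, hop=HOP):
    """Magnitude spectrogram with a Hann window, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def log_spectral_distance(ref, est, n_fft=N_FFT, hop=HOP, eps=1e-8):
    """Per-frame RMS difference (in dB) between two log-magnitude spectra."""
    n = min(len(ref), len(est))  # compare only the overlapping part
    ref_db = 20 * np.log10(stft_mag(ref[:n], n_fft, hop) + eps)
    est_db = 20 * np.log10(stft_mag(est[:n], n_fft, hop) + eps)
    return np.sqrt(np.mean((ref_db - est_db) ** 2, axis=1))


# Synthetic stand-ins for the real files (hypothetical data):
# a decaying 440 Hz tone, and a copy with a noise burst injected at 0.5 s.
sr = 16000
t = np.arange(sr) / sr
ref = np.exp(-3 * t) * np.sin(2 * np.pi * 440 * t)
est = ref.copy()
rng = np.random.default_rng(0)
est[8000:8256] += 0.2 * rng.standard_normal(256)

lsd = log_spectral_distance(ref, est)
worst = int(np.argmax(lsd))
print(f"largest difference in frame {worst}, starting at ~{worst * HOP / sr:.3f} s")
```

With real data, the original and the reconstruction would be loaded as mono arrays at the same sample rate and passed in place of `ref` and `est`. Peaks in the resulting curve that line up with note onsets would be consistent with the behavior described above.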
Code Used
Here's the code I used for processing:
import dac
import torch
from audiotools import AudioSignal


def process_audio(input_file, output_file, target_sr=44100, target_channels=1, use_cuda=False):
    # Download and load the pretrained DAC weights.
    # Note: model_type is fixed to "16khz" here and should match target_sr.
    model_path = dac.utils.download(model_type="16khz")
    model = dac.DAC.load(model_path)
    device = 'cuda' if use_cuda and torch.cuda.is_available() else 'cpu'
    model.to(device)

    # Load audio signal file
    signal = AudioSignal(input_file)
    # audio_data has shape (batch, channels, samples), so the channel count
    # is read from num_channels rather than from shape[0] (the batch size).
    print(f"Original audio - Sample rate: {signal.sample_rate}, Channels: {signal.num_channels}")

    # Resample and convert channels only if the input differs from the target
    if signal.sample_rate != target_sr or signal.num_channels != target_channels:
        signal = signal.resample(target_sr).to_mono() if target_channels == 1 else signal.resample(target_sr).to_stereo()
        print(f"Processed audio - Sample rate: {signal.sample_rate}, Channels: {signal.num_channels}")

    # Encode to DAC codes, decode back to audio, and save the result
    signal = signal.to(device)
    x = model.compress(signal)
    y = model.decompress(x)
    y.write(output_file)
    print(f"Processed audio saved to {output_file}")


input_file = '../WavTokenizer/demo_mp4/yundi.wav'
output_file = './infer_out/yundi_dac_16k.wav'
target_sr = 16000
target_channels = 1
use_cuda = False
process_audio(input_file, output_file, target_sr, target_channels, use_cuda)
Additional Information
- I've uploaded audio samples demonstrating the issue. The artifacts are most noticeable in the 16k version.
- The original audio is a high-quality recording of a piano performance.
- The issue persists across different sampling rates (16k, 24k, 44k).
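Because the artifacts are described as rhythmic, their repetition period can be estimated from a per-frame error envelope (for example, a frame-wise spectral difference between the original and the reconstruction). The sketch below is an illustration under stated assumptions: `dominant_period` is a hypothetical helper, and the envelope is synthetic, with a bump every 0.5 s standing in for real measurements. It takes the strongest autocorrelation peak as the period.

```python
import numpy as np


def dominant_period(envelope, frame_rate, min_lag=2):
    """Return the lag (in seconds) of the strongest autocorrelation peak."""
    env = np.asarray(envelope, dtype=float)
    env = env - env.mean()  # remove the DC offset before correlating
    # Full autocorrelation; keep only the non-negative lags.
    ac = np.correlate(env, env, mode="full")[len(env) - 1 :]
    max_lag = len(env) // 2  # longer lags have too few overlapping frames
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag]))
    return lag / frame_rate


# Hypothetical error envelope: 4 s at 100 frames per second,
# with an error bump every 50 frames (0.5 s).
frame_rate = 100
env = np.zeros(400)
env[::50] = 1.0

period = dominant_period(env, frame_rate)
print(f"estimated artifact period: {period:.2f} s")
```

Comparing the estimated period with the tempo of the performance and with any fixed processing interval may help narrow down whether the artifacts follow the music or the processing.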
Questions
- Is this a known issue with DAC when processing piano audio?
- Are there any recommended settings or preprocessing steps to mitigate these artifacts?
- Could this be related to the model used or the compression settings?
I appreciate any guidance or insights you can provide on this matter. Thank you for your time and assistance.
Best regards, Tao
Hello, did you find any solution to your problem?
Thanks.