audio-diffusion
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
Do you have any future plans for training flow-based diffusion transformer models like [Make an Audio 3](https://github.com/Text-to-Audio/Make-An-Audio-3) or simple VALL-E type models?
I would like to use this project to generate audio data similar to the training samples, in order to expand the dataset. The data is divided into 5 s segments....
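For readers with a similar setup, here is a minimal sketch of how a recording might be cut into fixed 5 s segments before training. It is not taken from this repository: the function name `split_into_segments`, the 4 kHz sample rate, and the choice to drop a short trailing segment are all assumptions made for illustration.

```python
from typing import List, Sequence


def split_into_segments(samples: Sequence[float], sample_rate: int,
                        segment_seconds: float = 5.0,
                        drop_last: bool = True) -> List[Sequence[float]]:
    """Split a 1-D sequence of audio samples into fixed-length segments.

    Hypothetical helper for illustration; not part of audio-diffusion.
    """
    segment_len = int(round(sample_rate * segment_seconds))
    if segment_len <= 0:
        raise ValueError("segment length must be positive")
    # Slice the recording at multiples of the segment length.
    segments = [samples[i:i + segment_len]
                for i in range(0, len(samples), segment_len)]
    # Optionally discard a trailing segment shorter than the target length.
    if drop_last and segments and len(segments[-1]) < segment_len:
        segments.pop()
    return segments


# Assumed example: 12 s of silence at an assumed 4 kHz sample rate.
audio = [0.0] * (12 * 4000)
chunks = split_into_segments(audio, sample_rate=4000)
print(len(chunks), len(chunks[0]))
```

Dropping the short final segment keeps every training sample the same length; passing `drop_last=False` keeps it instead, in which case it would need padding before being turned into a spectrogram.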
I am trying to train an autoencoder for latent diffusion. I run `python scripts/train_vae.py --dataset_name data/physio22/mel_res_64 --batch_size 1 --gradient_accumulation_steps 1 --hop_length 1024 --max_epochs 5` and get the following error...