Moisés Horta Valenzuela issues

Results: 10 issues of Moisés Horta Valenzuela

Upscaling task

Hello, Thanks for this great work. I'm wondering if you could provide instructions on how to perform the Upscaling task? Thanks!

Feature Request: Reinitializing Prior model when running with nn~

Hello, This is less of an issue and more of a feature request. I've found that controlling the Prior-guided generation in real time tends to produce either really good results...

Transfer Learning using pre-trained checkpoints

Hi, I'm giving WaveGAN another go this year. Mainly, I'm wondering if it's possible to save compute time and perform transfer learning from the provided pre-trained checkpoints? I've tried directing...

FR: Unlimited length audio+text conditioning with generate_with_chroma()

Hi, I've got a script going which takes an input audio file, splits it into 30-second chunks, passes each one consecutively to the generate_with_chroma() function, and then concatenates the results. Even...
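
The chunk-and-concatenate workflow described in this issue could look roughly like the sketch below. It is only an illustration under stated assumptions: `split_into_chunks`, `process_in_chunks`, and the `generate_fn` callable are hypothetical names, and `generate_fn` stands in for whatever wrapper the script puts around generate_with_chroma(); the real call and its arguments are not shown here. Audio is modeled as a plain list of samples to keep the example self-contained.

```python
from typing import Callable, List, Sequence


def split_into_chunks(
    samples: Sequence[float], sample_rate: int, chunk_seconds: float = 30.0
) -> List[List[float]]:
    """Split a mono signal into consecutive chunks of at most chunk_seconds."""
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("chunk length must be positive")
    # The last chunk may be shorter than chunk_len.
    return [list(samples[i:i + chunk_len]) for i in range(0, len(samples), chunk_len)]


def process_in_chunks(
    samples: Sequence[float],
    sample_rate: int,
    generate_fn: Callable[[List[float]], Sequence[float]],
    chunk_seconds: float = 30.0,
) -> List[float]:
    """Run generate_fn on each chunk in order and concatenate the outputs.

    generate_fn is a hypothetical stand-in for a wrapper around
    generate_with_chroma(); it takes one chunk and returns generated samples.
    """
    output: List[float] = []
    for chunk in split_into_chunks(samples, sample_rate, chunk_seconds):
        output.extend(generate_fn(chunk))
    return output
```

Note that simple concatenation like this carries no context across chunk boundaries, so each chunk is generated independently of the previous one.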

Learned latent dimension changes from training phase 1 to 2

Hi! I've been successfully training a new RAVEv2 model. What I noticed is that when training a model, the learned latent dimension changes radically from training phase 1 to 2....

Training time when training from scratch & recommended dataset size

Hello, Thanks for open-sourcing this work, it's very valuable for understanding how denoising diffusion models behave in the audio domain. I've started a new training...

Feature request: Add AudioLDM_48Khz to DreamSound implementation

Hello, Thanks for this great repo, been having a lot of fun with it. I'm wondering if it's possible to implement the newer ``audioldm_48khz`` checkpoint for finetuning? It seems it...

Fine-tuning from 44.1Khz

Hello, Thanks for this work! I noticed that the pre-trained 44.1Khz weights aren't doing the best job at reconstructing some music outside of the dataset scope. I'm wondering if you're...

Feature request: Switch off Unet for DiT

Hello, I've been reading a lot of the SOTA papers on audio and video generation using Rectified Flows, and it seems most are using Transformers instead of Unets. Are there...

Feature Request: Text (and other modality) conditioning + CFG

Hello, Thanks so much for open-sourcing the code. I have been training an unconditioned RF model on audio latents, with really good quality results. Here are some audio examples: https://drive.google.com/file/d/169NMzxl0k5X8oqiadNs3e7sjlxz8V5Pk/view?usp=sharing...