MoayedHajiAli
Hello, I have successfully run the simulator on my Windows machine, and the default program generated the trace file successfully. However, when running hello-simulator or the first project, I do...
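In the classifier-free guidance paper, the formulation is as follows: $\tilde{\epsilon}_\theta(z_\lambda, c) = (1 + w)\,\epsilon_\theta(z_\lambda, c) - w\,\epsilon_\theta(z_\lambda)$. However, it is implemented in DALL-E 2 as `null_logits + (logits - null_logits) * cond_scale`. According to the formula it...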
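Expanding the DALL-E 2 expression gives `null_logits + cond_scale * (logits - null_logits) = cond_scale * logits + (1 - cond_scale) * null_logits`, which matches the paper's $(1 + w)\,\epsilon_\theta(z_\lambda, c) - w\,\epsilon_\theta(z_\lambda)$ once `cond_scale = 1 + w`. The short check below is a minimal sketch comparing the two forms on scalar stand-ins; the function names and values are illustrative only and are not taken from either codebase.

```python
def dalle2_guidance(logits, null_logits, cond_scale):
    # Expression quoted from the DALL-E 2 implementation
    return null_logits + (logits - null_logits) * cond_scale


def cfg_paper_guidance(cond, uncond, w):
    # Classifier-free guidance paper: (1 + w) * eps(z, c) - w * eps(z)
    return (1 + w) * cond - w * uncond


# Hypothetical scalar stand-ins for the conditional / unconditional outputs
cond, uncond, w = 0.75, 0.25, 2.0

a = dalle2_guidance(cond, uncond, cond_scale=1 + w)
b = cfg_paper_guidance(cond, uncond, w)
print(a, b)  # → 1.75 1.75
```

Under this reparameterization, `cond_scale = 1` corresponds to `w = 0` (no guidance), so the two conventions differ only in how the guidance weight is expressed.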
Hello, are you planning to release the training code for AudioLDM2? The current repo only has training code for AudioLDM1. Thank you!
I noticed that you have implemented a cosine scheduler in your codebase. However, when training with it, the model performs poorly. I am wondering if you had a similar experience when testing...
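For reference, if "cosine scheduler" here means the diffusion noise schedule (an assumption; it could also refer to a learning-rate schedule), the standard cosine schedule from Improved DDPM (Nichol & Dhariwal, 2021) uses $f(t) = \cos^2\!\big(\tfrac{t/T + s}{1 + s}\cdot\tfrac{\pi}{2}\big)$, $\bar\alpha_t = f(t)/f(0)$, and $\beta_t = 1 - \bar\alpha_t/\bar\alpha_{t-1}$, clipped at 0.999. The sketch below computes those betas in plain Python; the function name and defaults are illustrative and not taken from the repository in question.

```python
import math


def cosine_beta_schedule(num_timesteps, s=0.008, max_beta=0.999):
    """Cosine noise schedule as described in Improved DDPM (assumed meaning of "cosine scheduler")."""

    def f(t):
        # Unnormalized alpha_bar at normalized time t in [0, 1]; the f(0)
        # normalization cancels in the ratio below
        return math.cos((t + s) / (1 + s) * math.pi / 2) ** 2

    betas = []
    for i in range(num_timesteps):
        t_prev = i / num_timesteps
        t_next = (i + 1) / num_timesteps
        # beta_t = 1 - alpha_bar(t) / alpha_bar(t - 1), clipped to avoid beta = 1 near t = T
        betas.append(min(1 - f(t_next) / f(t_prev), max_beta))
    return betas


betas = cosine_beta_schedule(1000)
print(len(betas), betas[0], betas[-1])
```

Compared with a linear schedule, this keeps $\bar\alpha_t$ higher for longer, so information is destroyed more gradually over the early and middle timesteps.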
Hello, I noticed that in the Make-An-Audio 2 paper, you have not reported the reconstruction loss of your trained 1D VAE in comparison with the 2D one. I am...
Hello, I am training the HTSAT-BART captioning model on AudioCaps only, as a baseline. The metrics almost match those reported in the paper (SPIDEr: 44.1, CIDEr: 71.1) on the validation...
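SPIDEr is defined as the average of CIDEr-D and SPICE, so the two reported numbers together imply a SPICE value. The snippet below is a small illustration of that relationship; the helper names are hypothetical, and because the reported scores are rounded, the implied SPICE is only approximate.

```python
def spider(cider, spice):
    # SPIDEr is the arithmetic mean of CIDEr(-D) and SPICE
    return (cider + spice) / 2


def implied_spice(spider_score, cider):
    # Rearranged definition: SPICE = 2 * SPIDEr - CIDEr
    return 2 * spider_score - cider


spice = implied_spice(44.1, 71.1)
print(round(spice, 1))  # → 17.1
```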
Hello, I noticed in your code that n_candidate_per_text is set to 3 by default. I am wondering if that was used during the evaluation as it was...
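With `n_candidate_per_text = 3`, each prompt is presumably sampled several times and only the highest-scoring candidate is kept, which can raise metrics relative to single-sample evaluation. The sketch below illustrates best-of-N selection in general terms; `best_of_n`, the `generate` and `score` callables, and the toy stand-ins are hypothetical placeholders, not the repository's actual code, which may rank candidates differently (for example with a text-audio similarity model).

```python
from typing import Callable, List, Sequence, Tuple


def best_of_n(
    text: str,
    generate: Callable[[str], Sequence[float]],
    score: Callable[[str, Sequence[float]], float],
    n_candidate_per_text: int = 3,
) -> Tuple[Sequence[float], float]:
    # Hypothetical helper: sample several candidates for one prompt
    candidates: List[Sequence[float]] = [generate(text) for _ in range(n_candidate_per_text)]
    # Score each candidate against the prompt
    scores = [score(text, c) for c in candidates]
    # Keep only the best-scoring candidate; the others are discarded
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], scores[best]


# Toy stand-ins: each call to toy_generate returns the next integer as a one-sample "clip"
_counter = iter(range(100))


def toy_generate(text: str) -> List[float]:
    return [float(next(_counter))]


def toy_score(text: str, audio: Sequence[float]) -> float:
    return audio[0]


best, best_score = best_of_n("a dog barks", toy_generate, toy_score)
print(best, best_score)  # → [2.0] 2.0
```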
Hello, could you please add AutoReCap, which provides over 47M audio-text pairs, to the list of sound effects datasets? You can find the details below: Paper: https://arxiv.org/abs/2406.19388 Project Page: https://snap-research.github.io/GenAU/dataset.html...
Hello, I noticed that the released checkpoint exhibits severe overfitting. The videos generated with the released checkpoint often closely replicate samples from the training dataset. Here is an example...
Currently, Wan2 T2V inference fails for batch sizes larger than 1 due to (1) an incompatible shape between the time conditioning and the modulation tensor and (2) a bug in the text encoder...