MoayedHajiAli issues

Results: 10 issues of MoayedHajiAli

No output in the cmd

Hello, I have successfully run the simulator on my Windows machine, and the default program generated the trace file successfully. However, when running the hello-simulator or first project, I do...

Classifier-Free Guidance Formulation

In the classifier-free guidance paper, the formulation is as follows: ![image](https://github.com/lucidrains/DALLE2-pytorch/assets/52598644/85761e78-c71a-474e-86c7-fe56d83ba28e) However, it is implemented in DALLE2-pytorch as `null_logits + (logits - null_logits) * cond_scale`. According to the formula it...
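For comparison, here is a minimal sketch relating the two forms. It assumes the formula in the image is the commonly cited one from the classifier-free guidance paper, `(1 + w) * cond - w * uncond`; the image itself is not reproduced in this listing, so that is an assumption. The function names and sample values are hypothetical, and only the expression `null_logits + (logits - null_logits) * cond_scale` comes from the issue text. Under that assumption, the two expressions agree when `cond_scale = 1 + w`.

```python
# Sketch only: function names and sample values are hypothetical.
# Assumption: the paper's formula is (1 + w) * cond - w * uncond.

def guidance_paper(cond, uncond, w):
    """Guidance in the assumed paper form: (1 + w) * cond - w * uncond."""
    return [(1.0 + w) * c - w * u for c, u in zip(cond, uncond)]


def guidance_code(logits, null_logits, cond_scale):
    """Form quoted in the issue: null_logits + (logits - null_logits) * cond_scale."""
    return [n + (l - n) * cond_scale for l, n in zip(logits, null_logits)]


cond = [0.5, -1.25, 2.0]     # conditional predictions (made-up values)
uncond = [0.25, 0.75, -1.0]  # unconditional predictions (made-up values)
w = 2.0                      # guidance weight in the paper's notation

paper = guidance_paper(cond, uncond, w)
code = guidance_code(cond, uncond, cond_scale=1.0 + w)

# Largest element-wise gap between the two formulations.
max_diff = max(abs(p - c) for p, c in zip(paper, code))
print(max_diff)
```

Expanding the quoted expression gives `cond_scale * logits - (cond_scale - 1) * null_logits`, which matches the assumed paper form after substituting `cond_scale = 1 + w`. With `cond_scale = w` instead, the two would differ by one unit of guidance weight.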

Plans to release AudioLDM2 training code?

Hello, are you planning to release the training code for AudioLDM2? The current repo has training code only for AudioLDM1. Thank you!

Cosine scheduler gives much worse results

I noticed that you have implemented a cosine scheduler in your codebase. However, when training with it, the model performs poorly. I am wondering if you had a similar experience when testing...

Make-An-Audio 2 1D VAE

Hello, I noticed that in the Make-An-Audio 2 paper, you have not reported the reconstruction loss of your trained 1D VAE in comparison with the 2D one. I am...

Unstable Metric

Hello, I am training the HTSAT-BART captioning model on AudioCaps only, as a baseline. The metrics almost match those reported in the paper (SPIDEr: 44.1, CIDEr: 71.1) on the validation...

Evaluation Protocol

Hello, I noticed in your code that `n_candidate_per_text` is set to 3 by default. I am wondering if that was used during the evaluation as it was...

Adding AutoReCap

Hello, could you please add AutoReCap, which provides over 47M audio-text pairs, to the list of sound effects datasets? You may find the details below: Paper: https://arxiv.org/abs/2406.19388 Project Page: https://snap-research.github.io/GenAU/dataset.html...

Overfitting

Hello, I noticed that the released checkpoint exhibits a huge degree of overfitting. Videos generated with the released checkpoint often come from the training dataset. Here is an example...

Fix Wan T2V inference with batch size larger than 1

Currently, Wan2 T2V inference fails with a batch size larger than 1 due to: 1. An incompatible shape between the time conditioning and the modulation tensor 2. A bug in the text encoder...