Seungwoo, Jeong

Results 3 issues of Seungwoo, Jeong

In the BLIP-2 paper, "We propose Q-Former as the trainable module to bridge the gap between a frozen image encoder and a frozen LLM. It extracts a fixed number of...

The most surprising part of AudioLDM2 was the results of converting images to audio. Will this be part of a future release?

How much VRAM is needed for inference? And can you recommend a specific minimum GPU for generating videos? Thanks for open-sourcing this!