Seungwoo, Jeong
Results: 3 issues of Seungwoo, Jeong
In the BLIP-2 paper, "We propose Q-Former as the trainable module to bridge the gap between a frozen image encoder and a frozen LLM. It extracts a fixed number of...
The most surprising part of AudioLDM2 was the image-to-audio results. Will this be included in a future release?
How much VRAM is needed for inference? Also, can you recommend a minimum GPU for generating videos? Thanks for open-sourcing this!