Jorge C. Gomes
An example of fine-tuning FLAVA or any VLP multimodal model using the Trainer (for example, for classification)
The issue was automatically marked as closed, but there aren't yet any resources on how to fine-tune FLAVA. Neither of the links posted above by @NielsRogge has instructions on fine-tuning....
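For what it's worth, here is a minimal, hypothetical sketch of what this could look like with the Trainer, assuming the `facebook/flava-full` checkpoint, pooling the first token of the multimodal encoder, and a made-up linear head. `FlavaForClassification` and `DummyDataset` are illustrative names, not part of transformers, and this is not an official recipe:

```python
import torch
import torch.nn as nn
from transformers import FlavaModel, Trainer, TrainingArguments


class FlavaForClassification(nn.Module):
    """Hypothetical wrapper: FLAVA backbone + a linear classification head."""

    def __init__(self, num_labels: int):
        super().__init__()
        self.flava = FlavaModel.from_pretrained("facebook/flava-full")
        # FLAVA's multimodal encoder uses hidden size 768.
        self.classifier = nn.Linear(768, num_labels)

    def forward(self, input_ids=None, attention_mask=None, pixel_values=None, labels=None):
        outputs = self.flava(
            input_ids=input_ids,
            attention_mask=attention_mask,
            pixel_values=pixel_values,
        )
        # Pool the first ([CLS]-like) token of the multimodal encoder output.
        pooled = outputs.multimodal_embeddings[:, 0]
        logits = self.classifier(pooled)
        loss = nn.functional.cross_entropy(logits, labels) if labels is not None else None
        # Returning a dict with "loss" is enough for the Trainer.
        return {"loss": loss, "logits": logits}


class DummyDataset(torch.utils.data.Dataset):
    """Stand-in data; in practice, map FlavaProcessor over (image, text) pairs."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return {
            "input_ids": torch.randint(0, 1000, (32,)),
            "attention_mask": torch.ones(32, dtype=torch.long),
            "pixel_values": torch.randn(3, 224, 224),
            "labels": torch.tensor(idx % 2),
        }


args = TrainingArguments(
    output_dir="flava-cls",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    report_to="none",
)
trainer = Trainer(model=FlavaForClassification(num_labels=2), args=args, train_dataset=DummyDataset())
trainer.train()
```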
It is a bit puzzling to me why T5 would be a crucial component of Imagen. It probably isn't; it's probably just the size and embedding dimension of the text encoder that matter....
> * Can we modify the space of captions and still get good results? It's hard to generate captions for images, but it's easy to (for example) generate tags for...
I had to compile it manually (using the provided script) because, for some reason, it wasn't being compiled on demand and I got an error about missing libraries. In any case,...
The negative prompt is simply the prompt that is used for the "unconditional" generation in Classifier-Free Guidance. In this implementation it is hardcoded to be an empty string (or rather,...
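For concreteness, a minimal sketch of where the negative prompt enters the denoising loop, assuming a diffusers-style UNet call (`cfg_noise_prediction` is an illustrative helper, not part of the library):

```python
import torch


def cfg_noise_prediction(unet, latents, t, cond_emb, uncond_emb, guidance_scale):
    """Combine conditional and 'unconditional' noise predictions.

    uncond_emb is the text embedding of the negative prompt
    ("" when none is given).
    """
    # Run both branches in a single batched forward pass, as diffusers does.
    latent_in = torch.cat([latents, latents])
    emb_in = torch.cat([uncond_emb, cond_emb])
    noise_uncond, noise_cond = unet(latent_in, t, encoder_hidden_states=emb_in).sample.chunk(2)
    # Push the prediction away from the negative prompt, toward the prompt.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```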
The clip-vit-large-patch14 (https://huggingface.co/openai/clip-vit-large-patch14) model used by SD can only handle sequences of up to 77 tokens. It works like that in the original PyTorch implementation as well. Anything longer than that gets...
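A quick way to see the limit with the transformers tokenizer for that checkpoint (the 200-word prompt is just an illustration):

```python
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
print(tokenizer.model_max_length)  # 77

long_prompt = " ".join(["word"] * 200)
ids = tokenizer(
    long_prompt,
    truncation=True,
    max_length=tokenizer.model_max_length,
    return_tensors="pt",
).input_ids
print(ids.shape)  # torch.Size([1, 77]) -- everything past 77 tokens is dropped
```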
Just leaving a brief report of my findings with PAG and Diffusers (I had already integrated it into my pipelines before this PR; a usage sketch follows below):
- It generally works very, very well...
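For reference, a usage sketch assuming the integration ends up exposed through `AutoPipelineForText2Image` with `enable_pag`, as in the diffusers PAG docs; names like `pag_scale` and `pag_applied_layers` are taken from there and may differ in the final API:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    enable_pag=True,
    pag_applied_layers=["mid"],  # which attention blocks get perturbed
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a photo of an astronaut riding a horse",
    guidance_scale=7.0,  # regular CFG scale
    pag_scale=3.0,       # strength of perturbed-attention guidance
).images[0]
```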
Looks very interesting. Could this theoretically be used in the opposite way, to generate smaller images? When trying to generate small images with models that have been trained with high-res...
@Abhinay1997 FYI, some findings based on my own experiments with Tune-A-Video (the prior preservation loss is sketched below):
1. Using prior preservation loss (as implemented in https://github.com/bryandlee/Tune-A-Video/blob/main/train.py) helps a lot with the relevance of the output videos...
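A minimal sketch of the prior preservation idea, assuming a DreamBooth-style setup where instance and prior-class samples are stacked in one batch; `prior_preservation_loss` and its arguments are illustrative, not the exact code from the linked train.py:

```python
import torch
import torch.nn.functional as F


def prior_preservation_loss(model_pred, target, prior_loss_weight=1.0):
    """model_pred/target are stacked [instance; prior] along the batch dim."""
    # Split the batch back into instance and prior-class halves.
    pred_instance, pred_prior = model_pred.chunk(2, dim=0)
    target_instance, target_prior = target.chunk(2, dim=0)

    # Standard denoising MSE on the instance data, plus a weighted MSE on
    # samples generated by the original model, which anchors the class prior.
    instance_loss = F.mse_loss(pred_instance, target_instance)
    prior_loss = F.mse_loss(pred_prior, target_prior)
    return instance_loss + prior_loss_weight * prior_loss
```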
@sayakpaul Yes, definitely. I'll keep an eye on the PR that @Abhinay1997 will open 👍