xiaotingxuan
When we load CLIP models, e.g. `model_1, preprocess = clip.load("RN50", device=device, jit=False)` and `model_2, preprocess = clip.load("ViT-B/16", device=device, jit=False)`, the image encoders in model_1 and model_2 are obviously different (ResNet vs. ViT),...
Hi, I am a greenhorn in diffusion models. According to the DALL-E 2 paper, the prior model is used to predict CLIP image embeddings from CLIP text embeddings. I think they design this...
Hi, I am a greenhorn in diffusion models. I find something strange when I use the diffusion prior model to generate image embeddings. First, I set `prior_cond_scale = 2.` and sample...
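For context on what a cond scale does: assuming `prior_cond_scale` behaves like the standard classifier-free guidance scale (this is an assumption about the library, not confirmed by the post), a scale of 1 reproduces the purely conditional prediction and larger values extrapolate away from the unconditional one. A minimal sketch:

```python
# Sketch of classifier-free guidance scaling (assumption: prior_cond_scale acts
# like the usual CFG scale; names here are illustrative, not the library's API).
def guided_prediction(pred_uncond, pred_cond, cond_scale):
    # cond_scale = 1.0 returns pred_cond unchanged; cond_scale > 1.0 pushes
    # the result further along the (cond - uncond) direction.
    return [u + cond_scale * (c - u) for u, c in zip(pred_uncond, pred_cond)]

# With scale 2, each conditional offset from the unconditional baseline doubles.
print(guided_prediction([0.0, 0.0], [1.0, 2.0], 2.0))  # → [2.0, 4.0]
```

This is why sampled embeddings can look "strange" at larger scales: the output is an extrapolation, not a convex mix, so it can leave the region the model was trained on.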
Hello, thanks for sharing the data. Could you please tell me the method used to generate the video captions in the WebVid dataset? Please provide some insights into whether the captions...
Hello, thanks for sharing your code, it is really helpful. I notice there is a hyperparameter top-p; the code is [here](https://github.com/Shark-NLP/DiffuSeq/blob/8bfafcbb26df218073b8117234afb9de9dfcbec9/diffuseq/gaussian_diffusion.py#L381-390). When we run decoding, this hyperparameter is set...