jinxixiang
Same question here. I am wondering whether the TPS transform can be applied at higher resolutions simply by increasing the number of keypoints and TPS transformations.
I had the same issue. My server has no internet connection, which makes this painful. I found a workaround that may help you. **Step 1**: download...
Thank you for your help! The training loss and accuracy of masked prediction are attached.

Notes:
- i2t_train_acc, t2i_train_acc: contrastive top-1 acc
- mim_image_train_acc: monomodal image acc
- mim_train_acc: vl-ffn...

And the plot of **MIM + MLM loss (same as BEIT3)**:
We set the batch size to 1024. How does the contrastive loss on the VL-FFN help, given that we only use the V-FFN and L-FFN to compute cosine similarity for retrieval?
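For anyone following along, here is a minimal NumPy sketch of the kind of symmetric image-to-text / text-to-image contrastive loss over cosine similarities that this question refers to. It is only an illustration, not the VLMO or BEIT3 code: the function names, the `temperature=0.07` default, and the assumption that matching image-text pairs share the same batch index are all mine.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so that dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def log_softmax(logits):
    # Numerically stable log-softmax along the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss plus top-1 accuracies (hypothetical helper).

    image_emb, text_emb: arrays of shape (batch, dim); row i of each is
    assumed to be a matching image-text pair.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature      # (batch, batch) scaled cosine similarities
    idx = np.arange(logits.shape[0])        # positives sit on the diagonal

    i2t_loss = -log_softmax(logits)[idx, idx].mean()    # image -> text direction
    t2i_loss = -log_softmax(logits.T)[idx, idx].mean()  # text -> image direction

    i2t_acc = float((logits.argmax(axis=1) == idx).mean())
    t2i_acc = float((logits.argmax(axis=0) == idx).mean())
    return 0.5 * (i2t_loss + t2i_loss), i2t_acc, t2i_acc
```

With a batch size of 1024, each direction treats the other 1023 items in the batch as negatives, which is why the batch size matters for this loss.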
OK, thank you for your advice. I followed the implementation of the contrastive loss from VLMO. But maybe vl_i2t and vl_t2i are not the main reason the model fails to converge? Also, I...
Thank you for your reply. _torchscale_ is a helpful toolkit for large model training, and we are happy to try it out later. But I suppose that the issue is...
I don't know whether the spatial palette mode in T2I adapter can fulfill your requirements. https://github.com/TencentARC/T2I-Adapter
> Yes, we urgently need a control for color. img2img is not very good for color control because img2img not only influences the color of the output but also...
@loboere Sure, we plan to release a WebUI demo later.