Kang-Jun(KJ) Liu
Hi, @danthe3rd, thank you for the response. My device is an A6000, and I'm also interested in the backward pass. I took some time to go through Flash-Attention (and Triton) and your proposal....
Hi, @alexcbb @xwen99 I have just conducted a small-scale experiment with ViT-S on COCO for 100 epochs. The rest of the settings can be found below. The prototype visualization makes sense...
Hi, I made only the modifications listed above. I consider projectors to depend more on the pre-training method than on the backbone architecture, so I kept the batch norm layer.
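To make the batch-norm point concrete, here is a minimal sketch of the kind of projector head being discussed. The layer layout and dimensions (384 for ViT-S embeddings, 2048 hidden, 256 output) are my own illustrative assumptions, not values taken from the actual code:

```python
import torch.nn as nn

# Hypothetical projector head sketch; keeps BatchNorm between the
# linear layers, per the comment above. All dimensions are assumed.
class Projector(nn.Module):
    def __init__(self, in_dim=384, hidden_dim=2048, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),  # the batch norm layer that is kept
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)
```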