Matagi1996
To grow the receptive field, the window partition shifts in every second block inside a stage; this way, information can be exchanged gradually between the window-partitioned chunks of tokens with each...
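A minimal sketch of the idea, assuming a Swin-style cyclic shift by half the window size (the function name `window_ids` and the grid/window sizes are made up for illustration; the masking of wrapped-around tokens is not modeled here). It shows two neighbouring tokens that sit in different windows under the regular partition but share one after the shift:

```python
def window_ids(height, width, window, shift=0):
    """Map each (row, col) token to the id of the window it falls in.

    Assumes height and width are divisible by `window`. A non-zero `shift`
    cyclically moves the grid before partitioning (hypothetical helper).
    """
    windows_per_row = width // window
    ids = {}
    for r in range(height):
        for c in range(width):
            # Position of the token after the cyclic shift.
            rr = (r - shift) % height
            cc = (c - shift) % width
            ids[(r, c)] = (rr // window) * windows_per_row + cc // window
    return ids


regular = window_ids(8, 8, window=4)           # blocks with the plain partition
shifted = window_ids(8, 8, window=4, shift=2)  # every second block

a, b = (3, 3), (4, 4)  # diagonal neighbours across a window border
print("same window, regular partition:", regular[a] == regular[b])
print("same window, shifted partition:", shifted[a] == shifted[b])
```

Alternating the two partitions lets tokens attend across the old window borders, which is why the effective receptive field grows over the depth of a stage.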
You could log your actual disk read/write speed and see whether your DataLoaders are I/O-bound. I had this issue as well when loading big images from a local drive while...
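A rough way to check the raw read speed, as a sketch using only the standard library (the helper name `read_throughput_mb_s` and the 8 MiB demo file are assumptions for illustration). Keep in mind that the OS page cache can make a freshly written or recently read file look much faster than the disk really is, so point it at real dataset files that have not been touched recently:

```python
import os
import tempfile
import time


def read_throughput_mb_s(path, chunk_size=1024 * 1024):
    """Read `path` sequentially and return the observed speed in MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    return total_bytes / (1024 * 1024) / max(elapsed, 1e-9)


# Demo on a throwaway file; replace demo_path with one of your image files.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 * 1024 * 1024))
    demo_path = tmp.name
try:
    speed = read_throughput_mb_s(demo_path)
    print(f"read speed: {speed:.1f} MB/s")
finally:
    os.remove(demo_path)
```

If the per-batch data volume divided by this speed is close to your step time, the loaders are likely waiting on the disk rather than on decoding or augmentation.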
Hey, thank you very much for the tip about LayerNorms being the reason the results are not good in fp16. I wrote a little script using forward hooks to convert between...
I had the same problem of wanting to use Liger Kernels with torch.compile. I followed the flash attn repo/torch docs to register the Liger implementations of RMS_Norm / Swiglu (I needed those 2)...
The automatic mask predictor samples a grid of points and calls the decoder again and again. I have actually tried this with the ONNX model (not implemented here), but...
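To make the cost concrete, here is a small sketch of the point-grid sampling, assuming cell-centered points in normalized [0, 1] coordinates (the helper `build_point_grid` below is illustrative and may differ from what the actual predictor does). Every point becomes one prompt, so the number of decoder calls grows with the square of the points per side:

```python
def build_point_grid(points_per_side):
    """Return points_per_side**2 evenly spaced (x, y) points in [0, 1]."""
    # Place each point at the center of its grid cell.
    coords = [(i + 0.5) / points_per_side for i in range(points_per_side)]
    return [(x, y) for y in coords for x in coords]


grid = build_point_grid(4)
print(len(grid), "point prompts, i.e. that many decoder calls without batching")
print(grid[:4])
```

Batching several points per decoder call reduces the number of calls, but the total amount of decoding work stays proportional to the grid size.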
The Hiera MAE decoder merges every block output with a 2D conv to the same (HW) and C of the decoder dimension while staying inside the mask units, and just sums them...
Braindead AI regulation regarding data, safety, (you name it) is (most likely) responsible. You could vote out your governments, move to another country, or just use the models and Tencent...
Thank you for the reply. Even when wrapping it in the Editor value, it will show the image but still have the "Upload Image" in the background and the cropping...
Does it even make sense to use different layer outputs of the backbone for fine-tuning? My intuition for ViTs was that the feature maps are refined layer by layer instead...
I tested it on some sample tasks of my own with Qwen3VL-8B (as in the notebook). The implementation seems to work sometimes, but is really brittle in practice. The system prompt, a...