Hongwei Niu
Hello, I found that, like CLIP, it operates at a very coarse granularity. For example, an image cropped from a GT box in the COCO dataset is recognized...
Could you share some implementation details for the 'K-Means Clustering of Frozen Features' result?
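The question does not say how the clustering was run. For reference, here is plain Lloyd's k-means over already-extracted (frozen) feature vectors, in pure Python. The function name `kmeans`, the seeded random initialisation, and the iteration cap are illustrative assumptions, not the paper's settings. Real runs on large feature sets would normally use an optimised library, and L2-normalising the features first is a common choice that the question does not confirm.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features: Sequence[Sequence[float]], k: int, iters: int = 50,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means on precomputed (frozen) features; returns (labels, centroids).

    Hypothetical helper for illustration: centroids start at k distinct samples
    drawn with a seeded RNG so that runs are reproducible.
    """
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of samples")
    rng = random.Random(seed)
    centroids = [list(features[i]) for i in rng.sample(range(len(features)), k)]
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each feature joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(f, centroids[c]))
                      for f in features]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, new_labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids
```

On two well-separated groups such as `[[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]` with `k=2`, the first three and last three points end up in different clusters.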
Hello, I've been working on an object detection project and ran into a specific issue I'd like to discuss. My approach uses RoIAlign to extract regional features from...
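For readers less familiar with RoIAlign, the sketch below shows its core computation on a single-channel feature map. The box is split into a fixed grid of bins. Each bin is sampled at a few evenly spaced points with bilinear interpolation, and those samples are averaged. The sketch simplifies things: one channel, one box, coordinates already in feature-map units with `feat[r][c]` located at position (r, c), and simple border clamping. It omits the `spatial_scale` and half-pixel alignment handling of real implementations such as `torchvision.ops.roi_align`, and it is not the code used in the project.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def bilinear(feat: Grid, y: float, x: float) -> float:
    """Bilinearly interpolate feat at continuous (y, x), clamped to the grid."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    return ((1 - ly) * (1 - lx) * feat[y0][x0] + (1 - ly) * lx * feat[y0][x1]
            + ly * (1 - lx) * feat[y1][x0] + ly * lx * feat[y1][x1])


def roi_align(feat: Grid, box: Tuple[float, float, float, float],
              out_size: Tuple[int, int], sampling_ratio: int = 2) -> Grid:
    """Pool box = (x1, y1, x2, y2) into an out_size grid by averaging
    sampling_ratio x sampling_ratio bilinear samples per bin."""
    bx1, by1, bx2, by2 = box
    ph, pw = out_size
    bin_h, bin_w = (by2 - by1) / ph, (bx2 - bx1) / pw
    out = []
    for i in range(ph):
        row = []
        for j in range(pw):
            acc = 0.0
            for sy in range(sampling_ratio):
                for sx in range(sampling_ratio):
                    # Evenly spaced sample points inside bin (i, j).
                    y = by1 + (i + (sy + 0.5) / sampling_ratio) * bin_h
                    x = bx1 + (j + (sx + 0.5) / sampling_ratio) * bin_w
                    acc += bilinear(feat, y, x)
            row.append(acc / sampling_ratio ** 2)
        out.append(row)
    return out
```

Because every box is pooled to the same `out_size` without rounding the box edges to integers, boxes of different sizes yield fixed-size regional features that stay aligned with the underlying map.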
Hello, I'd like to know what `y` on line 146 of torch_vertex.py represents and what it is used for:
```
if self.r > 1:
    y = F.avg_pool2d(x, kernel_size=self.r, stride=self.r)  # [B, out_dim, H/r, W/r]
    y = y.reshape(B, C, -1, 1).contiguous()  # [B, out_dim, H/r*W/r, 1]
```
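The two lines in the question can be mimicked for a single (batch, channel) slice in pure Python. `F.avg_pool2d` with `kernel_size=stride=r` averages each non-overlapping r × r block, and the reshape flattens the pooled grid row-major into a list of H/r·W/r node features. The helper name is hypothetical. The reading of its purpose is also an assumption based on the shapes shown: it should be checked against the call that consumes `y` in torch_vertex.py. In ViG-style graph code, `x` is flattened the same way, and a pooled `y` would plausibly serve as a smaller set of candidate neighbours for KNN graph construction.

```python
from typing import List


def avg_pool_then_flatten(channel: List[List[float]], r: int) -> List[float]:
    """Mimic avg_pool2d(kernel_size=r, stride=r) + reshape(..., -1, 1) on one slice.

    Hypothetical illustration: each non-overlapping r x r block is averaged and
    the pooled grid is flattened row-major into a list of node features.
    """
    h, w = len(channel), len(channel[0])
    out_h, out_w = h // r, w // r  # rows/cols that don't fill a whole block are dropped
    nodes = []
    for i in range(out_h):
        for j in range(out_w):
            block = [channel[i * r + di][j * r + dj]
                     for di in range(r) for dj in range(r)]
            nodes.append(sum(block) / (r * r))
    return nodes
```

Under that reading, with H = W = 56 and r = 2, each of the 3136 flattened `x` nodes would be compared with 784 pooled candidates instead of 3136. This is the kind of cost reduction such a pooling step is typically meant to provide.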
Hi, I'm a beginner and have a question: what do Pair, L, and T stand for in the code, and what do they mean?
```
# Pair x L...
```
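Hello, author. How do you handle synonyms? Methods such as ODISE and FC-CLIP apply a max ensemble at prediction time, but this operation is very time-consuming and hurts FPS. For example, `num_templates` stores the number of synonyms for each class, and the code takes the max over all synonym templates of each class, so the resulting `final_pred_logits` has shape [B, N, num_classes]:
```
import torch

final_pred_logits = []
cur_idx = 0
for num_t in num_templates:
    # max over the num_t synonym columns belonging to this class
    final_pred_logits.append(pred_logits[:, :, cur_idx: cur_idx + num_t].max(-1).values)
    cur_idx += num_t
final_pred_logits = torch.stack(final_pred_logits, dim=-1)  # [B, N, num_classes]
```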
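The per-class Python loop in the question can be replaced by a single segment-wise max. Each class's synonym logits occupy a contiguous block of columns, so the block start offsets can be computed once. NumPy's `np.maximum.reduceat` then reduces every block in one call. This is a sketch in NumPy rather than the PyTorch of the original, with a hypothetical function name. In PyTorch, `Tensor.scatter_reduce` with `reduce="amax"` can play a similar role. How much FPS this recovers is an assumption to measure, because the cost may be dominated elsewhere, for example by scoring against every synonym embedding.

```python
import numpy as np


def synonym_max_ensemble(pred_logits: np.ndarray, num_templates) -> np.ndarray:
    """Loop-free per-class max over synonym templates (hypothetical helper).

    pred_logits: [B, N, T] with T == sum(num_templates); the columns of class c
    are contiguous, as in the loop above. Returns [B, N, num_classes].
    """
    counts = np.asarray(num_templates)
    if counts.min() < 1 or counts.sum() != pred_logits.shape[2]:
        raise ValueError("each class needs >= 1 template and counts must sum to T")
    # Start column of each class's block of synonym logits (strictly increasing).
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
    # reduceat takes the max over [starts[i], starts[i + 1]) along the template axis.
    return np.maximum.reduceat(pred_logits, starts, axis=2)
```

Because `starts` depends only on `num_templates`, it could be cached once per vocabulary instead of being recomputed for every image.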
CLIP recognizes this image as a hot dog with a probability close to 1, but the correct label is person. Is there a solution?
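One reason such scores look overconfident is how CLIP computes them. The probability is a softmax over only the candidate prompts, and the cosine similarities are first multiplied by a large logit scale (CLIP caps its learned scale at 100). A near-1 probability therefore means "much closer than the other candidates", not "certainly correct". The similarity values below are made up for illustration, and the function name is hypothetical.

```python
import math
from typing import List


def clip_style_probs(cosine_sims: List[float], logit_scale: float = 100.0) -> List[float]:
    """Softmax over scaled image-text cosine similarities (CLIP-style zero-shot)."""
    logits = [logit_scale * s for s in cosine_sims]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With illustrative similarities of 0.30 for "hot dog" and 0.25 for "person", `clip_style_probs([0.30, 0.25])[0]` is 1 / (1 + e^-5) ≈ 0.9933. A 0.05 gap in cosine similarity is thus enough to push the softmax close to 1.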
Hi, I have a problem:
```
configs/faceshq_vqgan.yaml -t True --gpus 0,
Running on GPUs 0,
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss...
```
Why is only one category displayed in the input pred label file?
(1) When computing category codes, src is the support feature. Why do src, category_code, and tsp need to go through QSAttn? This is not mentioned in the paper. (2) Why does QSAttn only...