TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV 2025
Hello, I would like to know whether the inference times reported in Figure 4 were measured without KV cache, while the "TPS" results in Table 3 reflect prefill time...
Hello, great work. I encountered a problem in the core code: `File "/tmp/pycharm_project_858/m_llava/model/multimodal_projector/builder.py", line 112, in forward: key = self.ln_k_1(self.k_proj_1(x_multi)).permute(1, 0, 2)` raising `RuntimeError: mat1 and mat2 shapes cannot be...`
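A quick way to narrow down a mismatch like the one above is to compare the last dimension of the input tensor with the projection layer's expected `in_features`. The snippet below is a generic diagnostic sketch; only `x_multi` and `k_proj_1` come from the traceback, and all dimensions are illustrative, not the repository's actual configuration.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration only; the real values come from the
# vision backbone and the projector config in builder.py.
hidden_dim = 1024
k_proj_1 = nn.Linear(hidden_dim, hidden_dim)
ln_k_1 = nn.LayerNorm(hidden_dim)

# x_multi is the multi-level visual feature fed to the projector.
x_multi = torch.randn(2, 576, hidden_dim)  # (batch, tokens, feature_dim) -- example shape

# "mat1 and mat2 shapes cannot be multiplied" usually means x_multi's last
# dimension does not match k_proj_1.in_features.
assert x_multi.shape[-1] == k_proj_1.in_features, (
    f"feature dim {x_multi.shape[-1]} != k_proj_1.in_features {k_proj_1.in_features}"
)

key = ln_k_1(k_proj_1(x_multi)).permute(1, 0, 2)
print(key.shape)
```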
Since TokenPacker is a general visual projector, I'd like to ask whether you have conducted any experiments with other visual backbones. I extracted features from SigLIP layers [17, 18, 26, 27]...
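For reference, multi-layer features can be pulled from a SigLIP vision tower through Hugging Face `transformers` roughly as below; the checkpoint name is an assumption, and the layer indices simply echo the ones mentioned in the question, not the repository's configuration.

```python
import torch
from PIL import Image
from transformers import SiglipImageProcessor, SiglipVisionModel

# Assumed checkpoint; the repo may use a different SigLIP variant.
ckpt = "google/siglip-so400m-patch14-384"
processor = SiglipImageProcessor.from_pretrained(ckpt)
model = SiglipVisionModel.from_pretrained(ckpt).eval()

image = Image.open("example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    out = model(pixel_values=pixel_values, output_hidden_states=True)

# hidden_states[0] is the embedding output; layer k is hidden_states[k].
layers = [17, 18, 26, 27]  # layer indices mentioned in the question
multi_level = torch.cat([out.hidden_states[i] for i in layers], dim=-1)
print(multi_level.shape)  # (batch, num_patches, 4 * hidden_dim)
```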
I appreciate your work. Would you release the checkpoints of TokenPacker-7b-36token and TokenPacker-7b-64token?
Could you provide an example of using TokenPacker on its own?
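As a rough illustration only (not the repository's implementation), a TokenPacker-style projector compresses visual tokens by letting a smaller set of downsampled queries cross-attend to the full-resolution features. Below is a minimal sketch assuming a single-level input, average-pooled coarse queries, and made-up dimensions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTokenCompressor(nn.Module):
    """Illustrative stand-in for a TokenPacker-style projector: coarse queries
    (average-pooled tokens) attend over the original high-resolution tokens."""

    def __init__(self, vis_dim=1024, llm_dim=4096, scale=2, num_heads=8):
        super().__init__()
        self.scale = scale
        self.q_proj = nn.Linear(vis_dim, vis_dim)
        self.kv_proj = nn.Linear(vis_dim, vis_dim)
        self.attn = nn.MultiheadAttention(vis_dim, num_heads, batch_first=True)
        self.out_proj = nn.Linear(vis_dim, llm_dim)

    def forward(self, x):  # x: (batch, H*W, vis_dim), token grid assumed square
        b, n, c = x.shape
        hw = int(n ** 0.5)
        # Coarse queries: spatially average-pool the token grid by `scale`.
        grid = x.transpose(1, 2).reshape(b, c, hw, hw)
        q = F.avg_pool2d(grid, self.scale).flatten(2).transpose(1, 2)
        # Queries attend back to the full-resolution tokens.
        kv = self.kv_proj(x)
        packed, _ = self.attn(self.q_proj(q), kv, kv)
        return self.out_proj(packed)  # (batch, fewer tokens, llm_dim)

compressor = TinyTokenCompressor()
vis_tokens = torch.randn(1, 24 * 24, 1024)  # e.g. 576 ViT patch tokens
print(compressor(vis_tokens).shape)          # torch.Size([1, 144, 4096])
```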
Hi, thank you for your great work on TokenPacker! I’m trying to reproduce the **TokenPacker-HD (7B, scale factor 2, patch number 9)** experiments, but I’m not getting results close to...
Hello author, I have reproduced your paper, but when evaluating your 7B + 144-token model on the VQAv2 dataset, I obtained a result of 77.12, which differs significantly from the 77.9 reported...