Yuxuan Yan
I ran quanvolution.py as is, but the models with quanvolution reached only very low accuracy, as shown in the following figure. Can you tell me why?
### Description of the feature request: Is there any documentation explaining how Gemini preprocesses input images or videos before generating tokens? For instance, how does it crop images of arbitrary...
How can randomness be mitigated when testing on Video-MME? Are there any specific hyperparameter settings for generating responses?
I noticed that NVILA has three versions: Base, Lite, and Video. What are the differences between them, and how does NVILA-15B perform in video tasks, such as the test results...
I am performing parallel inference with a batch size of 8 on a machine with 4 * A6000 GPUs. However, after running inference for a while, it gets stuck and...