Jaewoo Ahn
We tried to reproduce the baselines for the NLVR2 task, but our results were off by a noticeable margin. ### Hardware Specifications Graphics card: Quadro RTX 6000 CUDA version...
Thanks for setting up this repository! I would like to add our paper, **FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games**, to this list....
Hi, thanks for your great work! I’d like to suggest adding the following paper to the list: Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text...