SeeClick

The model, data and code for the visual GUI Agent SeeClick

Results 10 SeeClick issues

Thanks for the great work! @njucckevin I tried reproducing SeeClick's performance on AITW and Mind2Web but encountered a problem. After fine-tuning Qwen-VL with the 1M data mentioned in your paper,...

Hi, thank you very much for your interesting work! I have a question: how does the LLM perform a series of web page operations after determining the location of the element...

So the paper only demonstrates various click actions across different domains? What about sliders and swipes? These need a from (x, y) and a to (x,...

When I load the model in 4bit:

```python
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "cckevinn/SeeClick",
    device_map="auto",
    trust_remote_code=True,
    load_in_4bit=True,
    do_sample=True,
    temperature=1e-3,
).eval()
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
```

I get the following error...

Thanks for your work. What computing resources does SeeClick need for SFT on downstream tasks (e.g. Mind2Web)? I tried to run SFT on SeeClick with a single A100, even though...

Thank you for your wonderful work! I encountered some problems while fine-tuning on the Mind2Web data using the SeeClick checkpoint you released. The problem occurs when I try to test the fine-tuned...

Hi, I am trying to compare models using ScreenSpot. What were the prompts you used for QwenVL, Fuyu, and CogAgent?

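Hello authors, thank you very much for your work and dataset. I would like to ask how you constructed the inference prompt/instruction when evaluating QwenVL on the sequential action tasks. When I test QwenVL directly with the code in aitw_test.py, the output format differs from the format defined by the action space, so the model's performance cannot be evaluated properly. Did you use a prompt that constrains the output format during evaluation? The figure below shows QwenVL's input and output during local inference: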

Hi, thank you very much for your work! I wonder whether SeeClick is trained on ScreenSpot? I know SeeClick is first pre-trained on "GUI Grounding Pre-training Data" and also fine-tuned...

I need to call the SeeClick model deployed on a server through an API, but an error occurred when deploying the model with vLLM. Maybe there is a way...