SeeClick issues

How to continue fine-tuning the SeeClick with Lora on Mind2WEb/AITW data?

1

Thanks for the great work! @njucckevin I tried reproducing SeeClick's performance on AITW and Mind2Web but encountered a problem. After fine-tuning Qwen-VL with the 1M data mentioned in your paper,...

ZJULiHongxin

multi-step operations

1

Hi, thank you very much for your interesting work! I have a question: how does the LLM perform a series of web page operations after judging the location of the element...

LiuJinzhe-Keepgoing

How to handle sliders and swipes?

1

So, the paper only shows various click actions across different domains? What about sliders and swipes? These need a from (x, y) and a to (x,...

DanielProkhorov
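To make the slider/swipe question above concrete, here is a minimal sketch of one way a two-point action could be represented next to a click. It is not SeeClick's actual action format: the `Gesture` class, the `click` and `swipe` helpers, the normalized [0, 1] coordinate convention, and the `tol` threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: points are (x, y) fractions of screen width/height, each in [0, 1].
Point = Tuple[float, float]


@dataclass(frozen=True)
class Gesture:
    """Hypothetical two-point action: a click is a gesture whose start and end coincide."""
    start: Point
    end: Point

    def __post_init__(self) -> None:
        # Reject coordinates that fall outside the normalized screen.
        for x, y in (self.start, self.end):
            if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
                raise ValueError(f"coordinate out of range: ({x}, {y})")

    def is_click(self, tol: float = 0.04) -> bool:
        # tol is an arbitrary illustrative threshold on the travelled distance.
        dx = self.end[0] - self.start[0]
        dy = self.end[1] - self.start[1]
        return (dx * dx + dy * dy) ** 0.5 <= tol

    def to_pixels(self, width: int, height: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
        # Convert both normalized points to integer pixel coordinates.
        def scale(p: Point) -> Tuple[int, int]:
            return (round(p[0] * width), round(p[1] * height))
        return scale(self.start), scale(self.end)


def click(x: float, y: float) -> Gesture:
    """A click expressed as a degenerate gesture (start == end)."""
    return Gesture((x, y), (x, y))


def swipe(start: Point, end: Point) -> Gesture:
    """A swipe or slider drag from `start` to `end`."""
    return Gesture(start, end)
```

Under this sketch, a model that can only emit a single (x, y) point would need to emit two points for a swipe; whether SeeClick supports that is exactly what the issue asks.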

How to inference in 4bit?

When I load the model in 4bit:

```python
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "cckevinn/SeeClick",
    device_map="auto",
    trust_remote_code=True,
    load_in_4bit=True,
    do_sample=True,
    temperature=1e-3,
).eval()
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
```

I get the following error...

DanielProkhorov

sft on downstream tasks

8

Thanks for your work. What kind of computing resources does SeeClick need when SFT is performed on downstream tasks (e.g. Mind2Web)? I tried to run SFT on SeeClick with 1×A100, even though...

ghost

Issue about saving checkpoint

1

Thank you for your wonderful work! I encountered some problems while fine-tuning on Mind2Web data using the SeeClick checkpoint you released. The problem occurs when I try to test the fine-tuned...

CheungKaKei

Prompts for other models

3

Hi, I am trying to compare models using ScreenSpot. What were the prompts you used for QwenVL, Fuyu, and CogAgent?

likaixin2000

Evaluation results of Qwen-VL on the AITW dataset

12

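Hello authors, thank you very much for your work and dataset. I would like to ask how you constructed the inference prompt/instruction when evaluating Qwen-VL on the sequential action task. When I test Qwen-VL directly with the aitw_test.py code, the output format differs from the format defined for the action space, so the model's performance cannot be evaluated properly. Did you use a prompt that constrains the output format during evaluation? The figure below shows Qwen-VL's input and output during local inference: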

jiennyteng

Is SeeClick trained on ScreenSpot?

1

Hi, thank you very much for your work! I wonder whether SeeClick is trained on ScreenSpot. I know SeeClick is first pre-trained on "GUI Grounding Pre-training Data" and also fine-tuned...

13958806684

How to deploy the SeeClick model using vLLM

1

I need to call the SeeClick model deployed on a server using an API interface, but an error occurred when deploying the model with vLLM. Maybe there is a way...

1NormalGuy
