Question about "gpt-4o" for manager/worker, grounding model running locally
Hi @xyzhang626 , thank you for the support!
We use `claude-3-7-sonnet-20250219` as our manager and worker, and use `bytedance-research/UI-TARS-72B-DPO` from HuggingFace as our grounding model. The hyperparams from our SOTA run (34.5% and 27%) are below.

Hyperparams for `claude-3-7-sonnet-20250219`:
- Thinking mode enabled
- `budget_tokens = 4096` for thinking, `max_tokens = 8192` overall
- Anthropic API forces `temperature = 1` when using thinking mode
- All prompts can be found in `gui_agents/s2/memory/procedural_memory.py`

Hyperparams for `bytedance-research/UI-TARS-72B-DPO`:
- `temperature = 0`
- `max_tokens = 128` since UI-TARS just generates (x, y)
- HF model config is set following the UI-TARS HuggingFace deployment documentation here
- Prompt format for UI-TARS: `f"Query:{description_of_element}\nOutput only the coordinate of one point in your response.\n"`
Originally posted by @kylesimular in #43
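For concreteness, the hyperparameters quoted above can be sketched as plain request payloads. This is a hedged sketch, not the repo's actual client code: the parameter names follow the Anthropic Messages API and an OpenAI-style chat request, and `description_of_element` is a hypothetical placeholder value.

```python
# Sketch of the two request payloads implied by the hyperparams above.
# These are plain dicts, not live API calls; adapt them to your client.

# Manager/worker call (Anthropic Messages API, thinking mode enabled).
manager_request = {
    "model": "claude-3-7-sonnet-20250219",
    "max_tokens": 8192,  # overall cap
    "temperature": 1,    # forced by the Anthropic API when thinking is enabled
    "thinking": {"type": "enabled", "budget_tokens": 4096},
}

# Grounding call (UI-TARS served behind an OpenAI-compatible endpoint).
description_of_element = "the blue 'Submit' button"  # hypothetical example
grounding_request = {
    "model": "bytedance-research/UI-TARS-72B-DPO",
    "temperature": 0,
    "max_tokens": 128,  # UI-TARS only emits one (x, y) point
    "messages": [{
        "role": "user",
        "content": f"Query:{description_of_element}\n"
                   "Output only the coordinate of one point in your response.\n",
    }],
}
```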
Hello, I would like to ask about the issue mentioned in the title. I am currently trying to run it, and I would like to confirm whether I should execute it as follows:
Modify line 157 in /home/bychen/Agent-S/osworld_setup/s2/run.py to:
```python
grounding_agent = OSWorldACI(
    platform="linux",
    engine_params_for_generation=engine_params,
    engine_params_for_grounding={
        "model": "UI-TARS-7B-DPO",
        "engine_type": "vllm",
        "endpoint_url": args.endpoint_url,
    },
)
```
and then run the code with:

```shell
python3 run.py --path_to_vm /Agent-S/osworld_setup/s2/vmware_vm_data/Ubuntu0/Ubuntu0.vmx --endpoint_url "http://127.0.0.1:8000/v1" --model "gpt-4o"
```
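As a quick sanity check that the `--endpoint_url` is wired up the way the snippet expects, you can post the UI-TARS prompt format directly to the vLLM server's OpenAI-compatible chat endpoint. A stdlib-only sketch, assuming the model name and endpoint from this thread and a hypothetical element description:

```python
import json
import urllib.request

ENDPOINT_URL = "http://127.0.0.1:8000/v1"  # same value passed via --endpoint_url
MODEL = "UI-TARS-7B-DPO"                   # served model name (assumption)

def build_grounding_request(description_of_element: str) -> urllib.request.Request:
    """Build a chat-completions request using the UI-TARS prompt format above."""
    payload = {
        "model": MODEL,
        "temperature": 0,
        "max_tokens": 128,  # UI-TARS only returns one (x, y) coordinate
        "messages": [{
            "role": "user",
            "content": f"Query:{description_of_element}\n"
                       "Output only the coordinate of one point in your response.\n",
        }],
    }
    return urllib.request.Request(
        f"{ENDPOINT_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_grounding_request("the Firefox icon in the taskbar")  # hypothetical
# To actually send it (requires the vLLM server to be running):
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["choices"][0]["message"]["content"])
```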
thanks
Hi @richard28039 ,
That looks correct to me. Maybe the model should be `bytedance-research/UI-TARS-7B-DPO`? Not sure if that makes a difference. Let me know if there are any issues!
Hi @alckasoc
I see. I didn't encounter any problems during my tests either, so I think I can have the worker and manager use one local model, and the grounding model use another local model.
thanks