Question about "gpt-4o" for manager/worker, grounding model running locally
Hi @xyzhang626 , thank you for the support!
We use `claude-3-7-sonnet-20250219` as our manager and worker, and use `bytedance-research/UI-TARS-72B-DPO` from HuggingFace as our grounding model. The hyperparams from our SOTA run (34.5% and 27%) are below.

Hyperparams for `claude-3-7-sonnet-20250219`:
- Thinking mode enabled
- `budget_tokens = 4096` for thinking, `max_tokens = 8192` overall
- Anthropic API forces `temperature = 1` when using thinking mode
- All prompts can be found in `gui_agents/s2/memory/procedural_memory.py`

Hyperparams for `bytedance-research/UI-TARS-72B-DPO`:
- `temperature = 0`
- `max_tokens = 128` since UI-TARS just generates (x, y)
- HF model config is set following the UI-TARS HuggingFace deployment documentation here
- Prompt format for UI-TARS: `f"Query:{description_of_element}\nOutput only the coordinate of one point in your response.\n"`
Originally posted by @kylesimular in #43
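For concreteness, the hyperparameters quoted above can be sketched as plain request payloads. This is a hedged sketch, not the repo's actual client code: the parameter names follow the Anthropic Messages API and an OpenAI-style chat request, and `description_of_element` is a hypothetical placeholder value.

```python
# Sketch of the two request payloads implied by the hyperparams above.
# These are plain dicts, not live API calls; adapt them to your client.

# Manager/worker call (Anthropic Messages API, thinking mode enabled).
manager_request = {
    "model": "claude-3-7-sonnet-20250219",
    "max_tokens": 8192,  # overall cap
    "temperature": 1,    # forced by the Anthropic API when thinking is enabled
    "thinking": {"type": "enabled", "budget_tokens": 4096},
}

# Grounding call (UI-TARS served behind an OpenAI-compatible endpoint).
description_of_element = "the blue 'Submit' button"  # hypothetical example
grounding_request = {
    "model": "bytedance-research/UI-TARS-72B-DPO",
    "temperature": 0,
    "max_tokens": 128,  # UI-TARS only emits one (x, y) point
    "messages": [{
        "role": "user",
        "content": f"Query:{description_of_element}\n"
                   "Output only the coordinate of one point in your response.\n",
    }],
}
```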
Hello, I would like to ask about the issue mentioned in the title. I am currently trying to run it, and I would like to confirm whether I should execute it as follows:
Modify line 157 in /home/bychen/Agent-S/osworld_setup/s2/run.py to:
```python
grounding_agent = OSWorldACI(
    platform="linux",
    engine_params_for_generation=engine_params,
    engine_params_for_grounding={
        "model": "UI-TARS-7B-DPO",
        "engine_type": "vllm",
        "endpoint_url": args.endpoint_url,
    },
)
```
and then run the code with:

```shell
python3 run.py --path_to_vm /Agent-S/osworld_setup/s2/vmware_vm_data/Ubuntu0/Ubuntu0.vmx --endpoint_url "http://127.0.0.1:8000/v1" --model "gpt-4o"
```
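As a quick sanity check that the `--endpoint_url` is wired up the way the snippet expects, you can post the UI-TARS prompt format directly to the vLLM server's OpenAI-compatible chat endpoint. A stdlib-only sketch, assuming the model name and endpoint from this thread and a hypothetical element description:

```python
import json
import urllib.request

ENDPOINT_URL = "http://127.0.0.1:8000/v1"  # same value passed via --endpoint_url
MODEL = "UI-TARS-7B-DPO"                   # served model name (assumption)

def build_grounding_request(description_of_element: str) -> urllib.request.Request:
    """Build a chat-completions request using the UI-TARS prompt format above."""
    payload = {
        "model": MODEL,
        "temperature": 0,
        "max_tokens": 128,  # UI-TARS only returns one (x, y) coordinate
        "messages": [{
            "role": "user",
            "content": f"Query:{description_of_element}\n"
                       "Output only the coordinate of one point in your response.\n",
        }],
    }
    return urllib.request.Request(
        f"{ENDPOINT_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_grounding_request("the Firefox icon in the taskbar")  # hypothetical
# To actually send it (requires the vLLM server to be running):
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["choices"][0]["message"]["content"])
```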
thanks
Hi @richard28039 ,
That looks correct to me. Maybe the model should be `bytedance-research/UI-TARS-7B-DPO`? Not sure if that makes a difference. Let me know if there are any issues!
Hi @alckasoc
I see. I didn't encounter any problems during my tests either, so I think I can have the worker and manager use one local model, and the grounding model use another local model.
thanks