OmniParser

Improving QEMU Resolution Could Improve Performance

Status: Open · yliu9276 opened this issue 9 months ago · 2 comments

Some experience to share (or maybe I'm just imagining it): after increasing the resolution of the QEMU VM and then setting the display scaling to 200%, my test environment seemed to work better. I'm not sure whether this is because higher-quality screenshots improve UI parsing and execution.
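A rough way to reason about why 200% scaling might help: at a higher scale factor each icon covers more pixels in the screenshot, which could give the parser more detail to work with. The sketch below is illustrative only. The 24 px icon size, the resolutions, and the `parser_input_width_px` parameter are hypothetical and not taken from OmniParser's code or preprocessing.

```python
from typing import Optional


def icon_side_px(
    icon_logical_px: float,
    display_scale: float,
    screen_width_px: int,
    parser_input_width_px: Optional[int] = None,
) -> float:
    """Approximate side length of an icon, in screenshot pixels.

    icon_logical_px: icon size at 100% scaling (hypothetical, e.g. 24).
    display_scale: guest OS scaling factor (1.0 = 100%, 2.0 = 200%).
    parser_input_width_px: hypothetical fixed width the screenshot is
        downscaled to before parsing; None means no resizing is assumed.
    """
    # Physical pixels the icon occupies in the VM framebuffer.
    side = icon_logical_px * display_scale
    # If the screenshot is shrunk to a narrower input, the icon shrinks with it.
    if parser_input_width_px is not None and parser_input_width_px < screen_width_px:
        side *= parser_input_width_px / screen_width_px
    return side


# 1280x800 at 100% vs. 2560x1600 at 200%: same layout, twice the pixels per icon.
low = icon_side_px(24, 1.0, 1280)
high = icon_side_px(24, 2.0, 2560)
print(low, high)  # 24.0 48.0

# If the screenshot were resized to a 1280 px wide input, the size gain disappears.
print(icon_side_px(24, 2.0, 2560, parser_input_width_px=1280))  # 24.0
```

Under these assumptions the benefit depends on whether the parser sees the full-resolution screenshot; if it downscales to a fixed width, any improvement would have to come from sharper rendering rather than from larger icons.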

yliu9276 · Apr 16 '25 18:04

However, when I want to click a button that has no text, only an icon (for example, Gmail's attach-file button while composing an email), the model can't click it. I suspect this is because the model lacks knowledge of what the icon means. I'm not sure whether building a better AI agent would solve this. Any suggestions would be greatly appreciated.

yliu9276 · Apr 16 '25 18:04


Would you mind sharing your whole setup for running it? Nobody seems to be able to run this broken toy because of one problem or another.

paciox · Jul 28 '25 02:07