OmniParser: Increasing QEMU Resolution Could Improve Performance

Some experience to share (or maybe I am imagining it): after increasing the resolution of the QEMU VM and then scaling the display up to 200%, my test environment seemed to work better. I'm not sure whether this is because higher-quality screenshots improve the UI parsing and execution.
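A rough way to see why this might help, as a sketch with made-up numbers rather than anything measured or taken from OmniParser itself: at 200% scaling each UI element covers twice as many pixels per side in the screenshot, so a parser has about four times as many pixels to work with per icon. The 20 px icon size and the 3840x2160 resolution below are assumptions chosen only for illustration.

```python
def physical_size(logical_px: int, scale_percent: int) -> int:
    """Side length, in screenshot pixels, of an element at a given display scale."""
    return logical_px * scale_percent // 100


def logical_desktop(resolution: tuple[int, int], scale_percent: int) -> tuple[int, int]:
    """Usable desktop size (in logical pixels) once scaling is applied."""
    width, height = resolution
    return (width * 100 // scale_percent, height * 100 // scale_percent)


ICON_LOGICAL_PX = 20  # hypothetical icon size, not a measured value

for scale in (100, 200):
    side = physical_size(ICON_LOGICAL_PX, scale)
    print(f"{scale}% scale: icon covers {side}x{side} px ({side * side} pixels)")

# A higher VM resolution is what keeps the desktop roomy after scaling:
# an assumed 3840x2160 guest at 200% still offers a 1920x1080 logical desktop.
print(logical_desktop((3840, 2160), 200))
```

This only counts pixels; whether the extra detail is what actually improved parsing in my setup is still a guess.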

Apr 16 '25 18:04 yliu9276

However, if I want to click a button that has no text, only an icon, such as Gmail's attach-file button when composing an email, the model is not able to click it. I suspect this is because the model lacks knowledge of the icon. I'm not sure whether building a better AI agent could resolve this issue. Please share any thoughts you have; I would really appreciate it.

Apr 16 '25 18:04 yliu9276

Would you mind sharing your whole setup for running it? Nobody seems able to run this broken toy because of one problem or another.

Jul 28 '25 02:07 paciox