Daisuke134
@KBB99 Thank you! I was trying to implement SoM, but the accuracy of the labels for screenshots was quite low, so I was trying to look into other ways of solving...
@KBB99 Looks great! I was looking into your code, and here are some thoughts I had. ・I think you are using summary_prompt to make GPT-4 respond with the label for...
@KBB99 Hi. I was trying to test out your code, but had some issues installing lang_sam. I am using Python 3.12 and tried to install torch and torchvision but was...
@joshbickett @KBB99 Thank you. I could run the code using 3.9.18. However, when I ran operate with "Go to youtube.com and play some holiday music", there was an error saying...
> run `pip install torchvision` separate before `lang-segment-anything` Do you mean running "pip install torchvision" before "pip install -U git+https://github.com/luca-medeiros/lang-segment-anything.git"? Also, I am still getting the error with parsing JSON. I...
@michaelhhogue Thank you so much. I will check out the scrolling action 🙇‍♂️
How about applying a dynamic grid approach to enhance click accuracy? For example, we could adjust the grid density based on the proximity to the cursor. The areas closer to...
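To make the dynamic grid idea a bit more concrete, here is a rough sketch of one way it might work: coarse grid lines across the whole screenshot, plus finer lines inside a window around the cursor. The function names, step sizes, and radius are all hypothetical and not taken from the project's code.

```python
def axis_grid_lines(length, cursor, coarse_step=200, fine_step=50, radius=200):
    """Return sorted grid-line positions (in pixels) along one axis.

    Hypothetical helper: coarse lines cover the whole axis, and finer
    lines are added within `radius` pixels of the cursor position.
    """
    # Coarse lines over the full axis.
    lines = set(range(0, length + 1, coarse_step))

    # Window around the cursor, clamped to the screen.
    lo = max(0, cursor - radius)
    hi = min(length, cursor + radius)

    # Round `lo` up to a multiple of fine_step so fine lines stay aligned.
    start = -(-lo // fine_step) * fine_step
    lines.update(range(start, hi + 1, fine_step))
    return sorted(lines)


def dynamic_grid(width, height, cursor_x, cursor_y, **kwargs):
    """Return (vertical_line_xs, horizontal_line_ys) for a screenshot."""
    xs = axis_grid_lines(width, cursor_x, **kwargs)
    ys = axis_grid_lines(height, cursor_y, **kwargs)
    return xs, ys


if __name__ == "__main__":
    xs, ys = dynamic_grid(1000, 600, cursor_x=500, cursor_y=100)
    print(xs)
    print(ys)
```

The returned positions could then be drawn onto the screenshot before it is sent to the model, so that the region near the cursor gets more labeled cells than the rest of the screen.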
I am trying to implement SoM, since it seems to have the best accuracy.
I have been testing out SoM and it seems pretty good. Here is the screenshot. I will try adding this today, test it, and open a PR.
I am implementing SoM now, and it seems like the best way is to add a separate mode, such as som-mode, with its own prompt.