Daisuke134 comments

Results: 13 comments of Daisuke134

Object detection

@KBB99 Thank you! I was trying to implement SoM, but the accuracy of the labels for screenshots was quite low, so I was trying to look into other ways of solving...

Object detection

@KBB99 Looks great! I was looking into your code, and here are some thoughts I had. ・I think you are using summary_prompt to make GPT-4 respond with the label for...

Object detection

@KBB99 Hi. I was trying to test out your code, but had some issues installing lang_sam. I am using Python 3.12 and tried to install torch and torchvision but was...

Object detection

@joshbickett @KBB99 Thank you. I could run the code using 3.9.18. However, when I ran operate with "Go to youtube.com and play some holiday music", there was an error saying...

> run `pip install torchvision` separate before `lang-segment-anything`

Do you mean running "pip install torchvision" before "pip install -U git+https://github.com/luca-medeiros/lang-segment-anything.git"? Also, I am still getting the error with parsing JSON. I...

Add scrolling support and replace CLICK action

@michaelhhogue Thank you so much. I will check out the scrolling action🙇‍♂️

Add a grid of coordinates

How about applying a dynamic grid approach to enhance click accuracy? For example, we could adjust the grid density based on the proximity to the cursor. The areas closer to...
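To make the idea concrete, here is a minimal sketch of one way a cursor-dependent grid could be generated. It assumes the grid should be denser near the cursor and coarser elsewhere. The function name `dynamic_grid` and the parameters `coarse_step`, `fine_step`, and `radius` are hypothetical and chosen only for illustration; nothing here is taken from the project's code.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def dynamic_grid(
    width: int,
    height: int,
    cursor: Point,
    coarse_step: int = 200,  # assumed spacing far from the cursor
    fine_step: int = 50,     # assumed spacing near the cursor
    radius: int = 200,       # assumed half-size of the dense window
) -> List[Point]:
    """Return grid points: coarse over the whole screen, fine near the cursor."""
    cx, cy = cursor
    points = set()

    # Coarse grid covering the whole screen.
    for y in range(0, height, coarse_step):
        for x in range(0, width, coarse_step):
            points.add((x, y))

    # Fine grid inside a square window around the cursor, clipped to the screen.
    x0, x1 = max(0, cx - radius), min(width - 1, cx + radius)
    y0, y1 = max(0, cy - radius), min(height - 1, cy + radius)
    for y in range(y0, y1 + 1, fine_step):
        for x in range(x0, x1 + 1, fine_step):
            points.add((x, y))

    # Using a set removes points shared by both grids; sort for stable output.
    return sorted(points)
```

Calling `dynamic_grid(800, 600, (400, 300))` would yield a sparse set of points across the screen plus a denser cluster in the window around the cursor. Those points could then be drawn onto the screenshot as labeled coordinates. A square window is used only because it is simple; a smoother falloff in density would be another option.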

Integrate Set-of-Mark Visual Prompting for GPT-4V

I am trying to implement SoM, since it seems to have the best accuracy.

Integrate Set-of-Mark Visual Prompting for GPT-4V

I have been testing out SoM and it seems pretty good. Here is the screenshot. I will try adding this today, test it, and make a PR. ![image](https://github.com/OthersideAI/self-operating-computer/assets/140220114/855b63ca-82d9-4231-9849-a0359bf5895c)

Integrate Set-of-Mark Visual Prompting for GPT-4V

I am implementing SoM now, and it seems like the best way is to add another mode, such as som-mode, with a new prompt written for that mode.