
Simplify image description in `visual.py` and `segment.py`

Open abrichr opened this issue 1 year ago • 0 comments

Feature request

In `VisualReplayStrategy` and `SegmentReplayStrategy`, the segment description is currently formulated as:

image -> masks -> masked_images -> masked_image_descriptions = prompt("describe these images") -> active_segment_description (for mouse events only) -> prompt("given <masked_image_descriptions>,<active_segment_description>, ...: generate the next action") -> modified_active_segment_description -> modified_segment_coordinates

A simpler version worth trying:

image -> masks -> masked_images -> masked_image_descriptions = prompt("describe these images") -> active_segment_description (for mouse events only) -> prompt("given <masked_image_descriptions>,<active_segment_description>, and their coordinates,...: generate the next action")

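i.e. have the model return the target coordinates directly, given the segment descriptions and their coordinates, instead of returning a modified segment description that must then be mapped back to coordinates.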
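A minimal sketch of what the simpler flow could look like, assuming each segment already has a description and a center point. `Segment`, `build_prompt`, and `parse_action` are hypothetical names used only for illustration, not existing OpenAdapt functions, and the JSON reply format is an assumption as well. The call to the model itself is left out.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """A described screen segment (hypothetical structure, for illustration)."""

    description: str
    center: tuple[float, float]  # (x, y) in pixels


def build_prompt(segments: list[Segment], active_index: Optional[int]) -> str:
    """Build one prompt that exposes segment descriptions and coordinates."""
    lines = [
        f"{i}: {seg.description} @ ({seg.center[0]:.0f}, {seg.center[1]:.0f})"
        for i, seg in enumerate(segments)
    ]
    # The active segment only exists for mouse events, so it may be absent.
    active = "none" if active_index is None else str(active_index)
    return (
        "Segments (index: description @ center):\n"
        + "\n".join(lines)
        + f"\nActive segment: {active}\n"
        + 'Generate the next action as JSON: {"name": ..., "x": ..., "y": ...}'
    )


def parse_action(completion: str) -> dict:
    """Parse the model's JSON reply and coerce the coordinates to floats."""
    action = json.loads(completion)
    action["x"] = float(action["x"])
    action["y"] = float(action["y"])
    return action
```

With this shape, the second prompt and the description-to-coordinates lookup collapse into a single model call whose reply already contains `x` and `y`.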

Motivation

Simplify the pipeline and benefit from future improvements in model performance.

abrichr • Jun 06 '24 21:06