Groma

What user_query can I use?

Open xiaoyazhu opened this issue 1 year ago • 5 comments

If I want to locate a specific target, such as "a person wearing a yellow hat", what user_query can I use at inference?

python -m groma.eval.run_groma \
    --model-name {path_to_groma_7b_finetune} \
    --image-file {path_to_img} \
    --query {user_query} \
    --quant_type 'none' # support ['none', 'fp16', '8bit', '4bit'] for inference

xiaoyazhu avatar Aug 07 '24 03:08 xiaoyazhu

You can simply use "Locate <p> a person wearing a yellow hat </p> in the image." or something similar. Just remember to enclose the referring expression in <p> and </p>.
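As a minimal sketch of the advice above, the query can be built and shell-quoted in Python before it is passed to the command from the question. The helper names `build_grounding_query` and `build_command` are hypothetical and not part of Groma; only the `<p> ... </p>` wrapping and the command-line flags come from this thread.

```python
import shlex


def build_grounding_query(expression: str) -> str:
    """Wrap a referring expression in <p> ... </p>, as suggested in this thread."""
    # Collapse repeated whitespace so the tags wrap a clean phrase.
    expression = " ".join(expression.split())
    if not expression:
        raise ValueError("referring expression must not be empty")
    return f"Locate <p> {expression} </p> in the image."


def build_command(model_path: str, image_path: str, query: str,
                  quant_type: str = "none") -> str:
    """Return a shell-safe command line for the inference script shown above."""
    args = [
        "python", "-m", "groma.eval.run_groma",
        "--model-name", model_path,
        "--image-file", image_path,
        "--query", query,
        "--quant_type", quant_type,
    ]
    # shlex.join quotes each argument, so "<" and ">" in the query are not
    # interpreted by the shell as redirections.
    return shlex.join(args)


if __name__ == "__main__":
    query = build_grounding_query("a person wearing a yellow hat")
    # The two paths below are placeholders, not real files.
    print(build_command("path/to/groma_7b_finetune", "path/to/img.jpg", query))
```

Quoting matters here because an unquoted `<p>` on the command line would be read by the shell as an input redirection rather than as part of the query.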

machuofan avatar Aug 07 '24 04:08 machuofan

But when I tested several images, I found that the model outputs a localization result whether or not the object is actually in the test image. Is there a setting I can adjust, or is this caused by model hallucination?

xiaoyazhu avatar Aug 07 '24 06:08 xiaoyazhu

Yes, this hallucination is probably caused by the training data: for grounding training, we only had positive QA pairs, i.e., the object mentioned in the question is guaranteed to occur in the image. To remedy it, you can curate some negative QA pairs and finetune the model.
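One possible way to curate such negative pairs is sketched below. It assumes you have per-image lists of annotated object labels and a label vocabulary. The record layout, the function name `make_negative_pairs`, and the refusal text in `NEGATIVE_ANSWER` are all hypothetical; the records would still need to be converted to whatever format Groma's finetuning data actually uses.

```python
import random

# Hypothetical refusal target; choose wording that suits your finetuning data.
NEGATIVE_ANSWER = "There is no such object in the image."


def make_negative_pairs(image_labels, vocabulary, per_image=1, seed=0):
    """Build grounding questions about objects that are NOT annotated in each image.

    image_labels: dict mapping an image id to the labels annotated in that image.
    vocabulary:   iterable of all candidate labels.
    """
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    negatives = []
    for image_id, present in sorted(image_labels.items()):
        # Candidate negatives are labels with no annotation in this image.
        absent = sorted(set(vocabulary) - set(present))
        for label in rng.sample(absent, min(per_image, len(absent))):
            negatives.append({
                "image": image_id,
                "question": f"Locate <p> {label} </p> in the image.",
                "answer": NEGATIVE_ANSWER,
            })
    return negatives


if __name__ == "__main__":
    labels = {"img1.jpg": ["person", "yellow hat"], "img2.jpg": ["dog"]}
    vocab = ["person", "yellow hat", "dog", "bicycle"]
    for pair in make_negative_pairs(labels, vocab, per_image=1):
        print(pair)
```

Note that a label missing from the annotations is not proof that the object is absent from the image, so sampled negatives are worth spot-checking when the annotations are incomplete.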

machuofan avatar Aug 20 '24 12:08 machuofan

If I need to fine-tune the model to locate target elements in an image, are there particular hyperparameters I should focus on?

Eman-Abdelrahman avatar Aug 28 '24 19:08 Eman-Abdelrahman

If I want to provide the object's label and bounding box to Groma to help generate more accurate image descriptions, how should I structure the prompt?

LLH-Harward avatar Sep 15 '24 03:09 LLH-Harward