
Grounded-SAM-2 versus Florence-2 + SAM 2

Open Juliaj opened this issue 2 months ago • 0 comments

RAI currently depends on Grounded-SAM-2, which combines two models:

  • Grounding DINO, downloaded weights (661.85 MB)
  • SAM2, sam2_hiera_large.pt, downloaded weights (856.35 MB)

A close equivalent to Grounded-SAM-2 is Florence-2 + SAM 2. Florence-2 comes in two variants:

  • Florence-2-base: https://huggingface.co/microsoft/Florence-2-base (smaller model, lower accuracy)
  • Florence-2-large: https://huggingface.co/microsoft/Florence-2-large (approx. 1.54 GB, size not confirmed)
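
To put the download sizes side by side, here is a rough back-of-the-envelope sketch. It uses the two Grounded-SAM-2 figures listed above and the unconfirmed ~1.54 GB figure for Florence-2-large. The conversion 1 GB = 1000 MB is an assumption, and the Florence-2 total assumes the same sam2_hiera_large.pt checkpoint would be reused.

```python
from decimal import Decimal

# Sizes taken from the issue text (MB).
GROUNDING_DINO_MB = Decimal("661.85")
SAM2_LARGE_MB = Decimal("856.35")

# Assumption: ~1.54 GB is unconfirmed in the issue, and 1 GB = 1000 MB here.
FLORENCE2_LARGE_MB = Decimal("1.54") * 1000

# Current stack: Grounding DINO + SAM 2.
grounded_sam2_total = GROUNDING_DINO_MB + SAM2_LARGE_MB

# Candidate stack: Florence-2-large + the same SAM 2 checkpoint (assumed).
florence2_sam2_total = FLORENCE2_LARGE_MB + SAM2_LARGE_MB

difference = florence2_sam2_total - grounded_sam2_total

print(f"Grounded-SAM-2 total:       {grounded_sam2_total} MB")
print(f"Florence-2-large + SAM 2:   {florence2_sam2_total} MB")
print(f"Extra download (estimated): {difference} MB")
```

If the 1.54 GB figure is right, the Florence-2-large option would be the larger download of the two. Florence-2-base is not included because the issue gives no size for it.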

One advantage of Florence-2 is that it's a unified model, while Grounded-SAM-2 requires running two separate models in sequence (Grounding DINO for detection, then SAM 2 for segmentation). I'm curious whether we considered Florence-2 in the past and ruled it out.
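
For reference, the "detection, then segmentation" sequence can be sketched as a small composition of two stages. This is only an illustration of the data flow, not RAI's actual code: `Detection`, `Detector`, `Segmenter`, and `detect_then_segment` are hypothetical names, and the stage callables stand in for whatever wraps Grounding DINO (or Florence-2) and SAM 2.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; a hypothetical convention for this sketch.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    label: str
    box: Box


# Hypothetical stage signatures, not the API of any real library.
Detector = Callable[[Any, str], List[Detection]]   # (image, text prompt) -> detections
Segmenter = Callable[[Any, List[Box]], List[Any]]  # (image, boxes) -> one mask per box


def detect_then_segment(
    image: Any, prompt: str, detector: Detector, segmenter: Segmenter
) -> List[Tuple[Detection, Any]]:
    """Run the detector first, then feed its boxes to the segmenter."""
    detections = detector(image, prompt)
    if not detections:
        # Nothing matched the prompt, so the segmentation stage is skipped.
        return []
    masks = segmenter(image, [d.box for d in detections])
    # Pair each detection with the mask produced for its box.
    return list(zip(detections, masks))
```

With an interface like this, evaluating a different detector would mean swapping only the `detector` argument, which is one way to compare the two options on the same images.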

Juliaj, Nov 30 '25 00:11