
Document image format expectations for Bumblebee.Vision.ImageClassification

Open · kipcole9 opened this issue 3 years ago · 3 comments

I would like to contribute some documentation that clarifies the expected image format for Bumblebee.Vision.image_classification. The type t:Bumblebee.Vision.image says:

@type image() :: Nx.Container.t()

A term representing an image. Either Nx.Tensor in HWC order or a struct implementing Nx.Container and resolving to such tensor.

However it does not clarify:

  • Whether the image should first be resized to the size used to train the model (224 x 224 for the ResNet models?)
  • Whether the image data should be {:u, 8} or some other type (some models suggest the data should be in the range [0.0..1.0])
  • Whether the image can have an alpha layer (reading the code suggests yes, but perhaps that is model dependent)
  • Whether the image should be preprocessed (this Stack Overflow article suggests it should be)

If I can get some guidance I'll write a doc PR.

kipcole9, Dec 12 '22 13:12

Hey Kip! The image doesn't need to be normalized in any particular way, because it first goes through a featurizer. In other words, it is not the direct model input, just a plain image as pixels. In fact, the type is Nx.Container.t() because it may also be a struct that implements Nx.Container, which we already do for StbImage (ref).

A featurizer usually casts to float, resizes, and scales into [0.0, 1.0]. Whether an alpha layer is used is usually up to the model configuration. So I think the only generally applicable expectation is that the image values are in 0..255.
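To make the featurizer step concrete, here is a minimal, language-neutral sketch in Python of the cast/resize/scale pipeline described above. It is not Bumblebee's featurizer code: the helper names are hypothetical, the image is assumed to be an HWC nested list of integer pixel values in 0..255, and nearest-neighbour resizing is used purely for simplicity (a real featurizer may use a different interpolation method).

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an HWC image given as nested lists."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def rescale(image):
    """Cast 0..255 integer pixel values to floats in [0.0, 1.0]."""
    return [[[channel / 255.0 for channel in pixel] for pixel in row] for row in image]


def featurize(image, size=224):
    """Hypothetical featurizer: resize to size x size, then scale to [0.0, 1.0]."""
    return rescale(resize_nearest(image, size, size))


# A 2x2 RGB image: black, white / red, blue.
image = [[[0, 0, 0], [255, 255, 255]], [[255, 0, 0], [0, 0, 255]]]
features = featurize(image, size=4)
print(len(features), len(features[0]), len(features[0][0]))
```

The point of the sketch is that the caller only supplies raw 0..255 pixels in HWC order; casting, resizing, and scaling happen inside the featurizer.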

A PR improving the docs is welcome!

jonatanklosko, Dec 13 '22 01:12

@jonatanklosko, TIL what a featurizer is! I suppose the assumption is also that the channel order is RGB (not BGR). I'll work on a doc PR this weekend. For validation, then, the input image is assumed to have the following characteristics:

  • HWC order
  • RGB color (RGB channel order, and not CMYK or some other color space)
  • Alpha channel support is model specific
  • {:u, 8} or {:f, 32} or {:f, 64} data type
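A validation check along these lines could be sketched as follows. This is a hypothetical Python helper, not part of Bumblebee or Nx: it describes the input by its shape tuple and an Nx-style type tuple such as ("u", 8), and it accepts unsigned, signed, and float kinds, since the follow-up reply loosens the type requirement. Channel order (RGB vs BGR) cannot be detected from the data, so it is not checked.

```python
def check_image(shape, dtype, values=None):
    """Return a list of problems with an image described by its shape and (kind, bits) type.

    An empty list means the image looks acceptable. `values` is an optional
    flat iterable of pixel values used for a 0..255 range check.
    """
    problems = []
    # HWC order: a rank-3 shape whose last axis is the channel axis.
    if len(shape) != 3:
        problems.append(f"expected HWC rank-3 shape, got {shape}")
    elif shape[2] not in (3, 4):
        # 3 channels = RGB; 4 = RGBA (whether alpha is used is model specific).
        problems.append(f"expected 3 (RGB) or 4 (RGBA) channels, got {shape[2]}")
    kind, _bits = dtype
    if kind not in ("u", "s", "f"):
        problems.append(f"unsupported type {dtype}")
    if values is not None and any(v < 0 or v > 255 for v in values):
        problems.append("pixel values outside 0..255")
    return problems


print(check_image((224, 224, 3), ("u", 8)))
print(check_image((3, 224, 224), ("u", 8)))
```

A CHW tensor such as (3, 224, 224) is flagged because its last axis is not a plausible channel count, which catches the most common ordering mistake.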

Thanks for the continuing education and the great library.

kipcole9, Dec 13 '22 03:12

The type requirement is not as strict: pretty much any :u or :s type would do. Other than that, sounds good!

jonatanklosko, Dec 13 '22 11:12