
Different normalization of an input image on mobile (Android) and desktop (Python)

ily-R opened this issue 3 years ago · 1 comment

I have an embedder model that takes an RGB image as input and outputs two vectors of shapes (batch_size, 128) and (batch_size, 512).

The model was trained with the following preprocessing of the image:

  1. read the image into an RGB tensor of values in range [0, 255]
  2. resize the image to (256, 256)
  3. normalize the values to range [0, 1] by dividing by 255
  4. apply mean/std normalization: img = (img - mean) / std using mean = [0.45, 0.40, 0.35] and std = [0.28, 0.26, 0.25]

I converted the model to tflite and quantized it to uint8 using a representative_dataset that follows the same preprocessing:

    converter = tf.lite.TFLiteConverter.from_saved_model(savedPath)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data_gen
    converter.inference_input_type = tf.uint8
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    tflite_model = converter.convert()
    open(file, "wb").write(tflite_model)

where

import cv2
import numpy as np
import tensorflow as tf
from pathlib import Path

def representative_data_gen():
    for input_value in Path("PathToCalibrationDataset").iterdir():
        img = cv2.imread(str(input_value))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; the model expects RGB
        img = cv2.resize(img, (256, 256))
        img = img / 255.
        img -= np.array([0.45, 0.40, 0.35])
        img /= np.array([0.28, 0.26, 0.25])
        img = img.astype(np.float32)
        img = img[tf.newaxis, :]  # add batch dimension
        yield [img]

Note: I didn't add any metadata. I don't know if that changes anything on the mobile side?

Python

Now I run inference with an image that follows the same preprocessing described above:

    input = preprocess_image(image_path)
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    is_quantized = input_details[0]['dtype'] == np.uint8
    if is_quantized:
        # Map the normalized float input into the uint8 domain the model expects.
        scale, zero_point = input_details[0]['quantization']
        input = (input / scale + zero_point).astype(np.uint8)
    interpreter.set_tensor(input_details[0]['index'], input)
    interpreter.invoke()
    output1 = interpreter.get_tensor(output_details[0]['index'])
    output2 = interpreter.get_tensor(output_details[1]['index'])

I get the expected results compared to the original TF model, with only a very small difference due to quantization.
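For reference, the uint8 mapping used above is q = round(x / scale + zero_point), and the interpreter internally inverts it. A quick numeric check of that round trip (my own sketch, using the scale/zero-point values quoted below; it adds rounding and clipping, which the snippet above omits):

```python
import numpy as np

scale, zero_point = 0.016160596, 100  # as reported by input_details[0]['quantization']

def quantize(x):
    """Float -> uint8, as required by a uint8 input tensor."""
    return np.clip(np.round(x / scale + zero_point), 0, 255).astype(np.uint8)

def dequantize(q):
    """uint8 -> float, the inverse mapping the interpreter applies internally."""
    return (q.astype(np.float32) - zero_point) * scale

# Round-tripping a normalized value loses at most scale/2 per element,
# which is the "very small difference" observed above.
x = np.array([-1.5, 0.0, 1.5], dtype=np.float32)
q = quantize(x)
```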

Android

I am wondering if this order of operations is correct? I am getting wrong output results!

By the way, I got the values 100f and 0.016160596f from input_details[0]['quantization'] in the Python code and hard-coded them in the Kotlin code for testing.

val image = with(TensorImage()) {
    load(ContextCompat.getDrawable(context, R.drawable.crop)?.toBitmap()!!)
    val imageProcessor: ImageProcessor = ImageProcessor.Builder()
        .add(
            ResizeOp(
                MODEL_INPUT_IMAGE_SIZE,
                MODEL_INPUT_IMAGE_SIZE,
                ResizeOp.ResizeMethod.NEAREST_NEIGHBOR
            )
        )
        .add(DequantizeOp(100f, 0.016160596f))
        .add(
            NormalizeOp(
                floatArrayOf(
                    0.45f,
                    0.40f,
                    0.35f,
                ),
                floatArrayOf(
                    0.28f,
                    0.26f,
                    0.25f,
                ),
            )
        )
        .add(QuantizeOp(100f, 0.016160596f))
        .add(CastOp(DataType.UINT8))
        .build()
    imageProcessor.process(this)
}
val output1: FloatBuffer = FloatBuffer.allocate(OUTPUT1_SIZE)
val output2: FloatBuffer = FloatBuffer.allocate(OUTPUT2_SIZE)
val outputMap: MutableMap<Int, Any> = hashMapOf(
        0 to output1, 1 to output2
    )
val tfLite = Interpreter(model, Interpreter.Options())
val inputArray = arrayOf<Any>(image.buffer)
tfLite.runForMultipleInputsOutputs(inputArray, outputMap)

The closest output to the Python one is what I get when I remove all the ops!
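That near-match is at least plausible arithmetically (a back-of-the-envelope check of my own, not part of the original report): composing the float normalization with the model's input quantization maps a raw pixel p to a value that is roughly linear in p with slope near 1 and a small offset, so feeding raw uint8 pixels lands close to the correct quantized input.

```python
import numpy as np

scale, zero_point = 0.016160596, 100.0
mean = np.array([0.45, 0.40, 0.35])
std = np.array([0.28, 0.26, 0.25])

p = np.arange(0, 256)[:, None]          # raw pixel values, one column per channel
normalized = (p / 255.0 - mean) / std   # training-time preprocessing
q = normalized / scale + zero_point     # value the uint8 input tensor expects

# q = slope * p + offset per channel; slope is 1 / (255 * std * scale)
slopes = 1.0 / (255.0 * std * scale)    # roughly 0.87, 0.93, 0.97 here
offsets = zero_point - mean / (std * scale)
```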

val image = with(TensorImage()) {
    load(ContextCompat.getDrawable(context, R.drawable.crop)?.toBitmap()!!)
    val imageProcessor: ImageProcessor = ImageProcessor.Builder().build()
    imageProcessor.process(this)
}
val output1: FloatBuffer = FloatBuffer.allocate(OUTPUT1_SIZE)
val output2: FloatBuffer = FloatBuffer.allocate(OUTPUT2_SIZE)
val outputMap: MutableMap<Int, Any> = hashMapOf(
        0 to output1, 1 to output2
    )
val tfLite = Interpreter(model, Interpreter.Options())
val inputArray = arrayOf<Any>(image.buffer)
tfLite.runForMultipleInputsOutputs(inputArray, outputMap)

I am wondering what is going on under the hood. How can I use the mean/std values on Android to get exactly the same output? I know the interpreter has access to scale and zero_point, but it certainly has no access to my mean/std values, so how can I fix this?

Thank you

ily-R · Jan 27 '23

DequantizeOp / QuantizeOp are not needed in your case. On Android, you can combine your two normalization steps into one, going from

3. normalize the values to range [0, 1] by dividing by 255
4. apply mean/std normalization: img = (img - mean) / std using mean = [0.45, 0.40, 0.35] and std = [0.28, 0.26, 0.25]

to

apply mean/std normalization: img = (img - mean) / std using mean = [0.45*255, 0.40*255, 0.35*255] and std = [0.28*255, 0.26*255, 0.25*255]

And then feed the updated mean and std arrays to NormalizeOp.

lu-wang-g · Aug 08 '23