Different normalisation of an input image on mobile (Android) and computer (Python)
I have an embedder model that takes an RGB image as input and outputs two vectors of shapes (batch_size,128) and (batch_size,512).
The model was trained with the following preprocessing of the image:
- read the image into an RGB tensor of values in range [0, 255]
- resize the image to (256, 256)
- normalize the values to range [0, 1] by dividing by 255
- apply mean/std normalisation: img = (img - mean)/std using mean = [0.45, 0.40, 0.35] and std = [0.28, 0.26, 0.25]
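Put together, the steps above can be sketched in NumPy like this (a minimal sketch on an already-resized image; `preprocess` is a hypothetical helper name, not part of the original code):

```python
import numpy as np

MEAN = np.array([0.45, 0.40, 0.35], dtype=np.float32)
STD = np.array([0.28, 0.26, 0.25], dtype=np.float32)

def preprocess(rgb_uint8: np.ndarray) -> np.ndarray:
    """Apply the scale and mean/std steps to a (256, 256, 3) RGB uint8 image."""
    img = rgb_uint8.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    img = (img - MEAN) / STD                    # mean/std normalisation
    return img[np.newaxis, ...]                 # add batch dimension

# Example: a mid-grey image; each channel maps to (128/255 - mean)/std.
x = preprocess(np.full((256, 256, 3), 128, dtype=np.uint8))
```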
I converted the model to tflite and quantised it to uint8 using a representative_dataset that follows the same preprocessing:
converter = tf.lite.TFLiteConverter.from_saved_model(savedPath)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.inference_input_type = tf.uint8
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()
open(file, "wb").write(tflite_model)
where
from pathlib import Path

import cv2
import numpy as np
import tensorflow as tf

def representative_data_gen():
    for input_value in Path("PathToCalibrationDataset").iterdir():
        img = cv2.imread(str(input_value))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (256, 256))
        img = img / 255.
        img -= np.array([0.45, 0.40, 0.35])
        img /= np.array([0.28, 0.26, 0.25])
        img = img.astype(np.float32)
        img = img[tf.newaxis, :]
        yield [img]
Note: I didn't add any metadata. I don't know if that changes anything on the mobile side?
Python
Now I run inference on an image that follows the same preprocessing described above:
input = preprocess_image(image_path)
interpreter = tf.lite.Interpreter(model_path=model_path)
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
is_uint8 = input_details[0]['dtype'] == np.uint8
if is_uint8:
    scale, zero_point = input_details[0]['quantization']
    input = (input / scale + zero_point).astype(np.uint8)
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]['index'], input)
interpreter.invoke()
output1 = interpreter.get_tensor(output_details[0]['index'])
output2 = interpreter.get_tensor(output_details[1]['index'])
I get the expected results compared to the original TF model, with only very small differences due to quantization.
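For reference, the uint8 quantization used here maps a float x to x / scale + zero_point, and the interpreter assumes the inverse mapping (q - zero_point) * scale. A small sketch with the scale and zero point reported below (adding the round/clip that the snippet above omits, which makes the mapping slightly more robust):

```python
import numpy as np

# Values reported by input_details[0]['quantization'] for this model.
scale, zero_point = 0.016160596, 100

def quantize(x: np.ndarray) -> np.ndarray:
    # float -> uint8, the same formula used before set_tensor above
    return np.clip(np.round(x / scale + zero_point), 0, 255).astype(np.uint8)

def dequantize(q: np.ndarray) -> np.ndarray:
    # uint8 -> float, the inverse mapping the interpreter assumes
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.6, 0.0, 2.0], dtype=np.float32)
q = quantize(x)  # each value lands within one quantization step of x
```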
Android
I am wondering if this order of operations is correct? I am getting wrong output results!
By the way, I got the values 100f and 0.016160596f from the Python call input_details[0]['quantization'] and hard-coded them in the Kotlin code to test.
val image = with(TensorImage()) {
    load(ContextCompat.getDrawable(context, R.drawable.crop)?.toBitmap()!!)
    val imageProcessor: ImageProcessor = ImageProcessor.Builder()
        .add(
            ResizeOp(
                MODEL_INPUT_IMAGE_SIZE,
                MODEL_INPUT_IMAGE_SIZE,
                ResizeOp.ResizeMethod.NEAREST_NEIGHBOR
            )
        )
        .add(DequantizeOp(100f, 0.016160596f))
        .add(
            NormalizeOp(
                floatArrayOf(0.45f, 0.40f, 0.35f),
                floatArrayOf(0.28f, 0.26f, 0.25f)
            )
        )
        .add(QuantizeOp(100f, 0.016160596f))
        .add(CastOp(DataType.UINT8))
        .build()
    imageProcessor.process(this)
}
val output1: FloatBuffer = FloatBuffer.allocate(OUTPUT1_SIZE)
val output2: FloatBuffer = FloatBuffer.allocate(OUTPUT2_SIZE)
val outputMap: MutableMap<Int, Any> = hashMapOf(
0 to output1, 1 to output2
)
val tfLite = Interpreter(model, Interpreter.Options())
val inputArray = arrayOf<Any>(image.buffer)
tfLite.runForMultipleInputsOutputs(inputArray, outputMap)
The closest output to Python's that I got is when I remove all the ops:
val image = with(TensorImage()) {
    load(ContextCompat.getDrawable(context, R.drawable.crop)?.toBitmap()!!)
    val imageProcessor: ImageProcessor = ImageProcessor.Builder().build()
    imageProcessor.process(this)
}
val output1: FloatBuffer = FloatBuffer.allocate(OUTPUT1_SIZE)
val output2: FloatBuffer = FloatBuffer.allocate(OUTPUT2_SIZE)
val outputMap: MutableMap<Int, Any> = hashMapOf(
0 to output1, 1 to output2
)
val tfLite = Interpreter(model, Interpreter.Options())
val inputArray = arrayOf<Any>(image.buffer)
tfLite.runForMultipleInputsOutputs(inputArray, outputMap)
I am wondering what is going on under the hood. How can I apply the mean/std values on Android to get exactly the same output? I know the interpreter has access to scale and zero_point, but it certainly has no access to my mean/std values, so how can I fix this?
Thank you
DequantizeOp / QuantizeOp are not needed in your case. On Android, you can fold your two normalization steps into one. Instead of

- normalize the values to range [0, 1] by dividing by 255
- apply mean/std normalisation: img = (img - mean)/std using mean = [0.45, 0.40, 0.35] and std = [0.28, 0.26, 0.25]

do a single step:

- apply mean/std normalisation: img = (img - mean)/std using mean = [0.45*255, 0.40*255, 0.35*255] and std = [0.28*255, 0.26*255, 0.25*255]

And then feed the updated mean and std arrays to NormalizeOp.
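This folding is exact, since (img/255 - mean)/std == (img - 255*mean)/(255*std) algebraically. A quick numeric check of the equivalence, using only the mean/std values given above:

```python
import numpy as np

mean = np.array([0.45, 0.40, 0.35])
std = np.array([0.28, 0.26, 0.25])

# Folded constants for a single NormalizeOp fed raw [0, 255] pixels.
mean255 = mean * 255  # [114.75, 102.0, 89.25]
std255 = std * 255    # [71.4, 66.3, 63.75]

img = np.random.default_rng(0).integers(0, 256, size=(4, 4, 3)).astype(np.float64)
a = (img / 255.0 - mean) / std  # two-step normalisation (Python side)
b = (img - mean255) / std255    # single folded step (Android NormalizeOp)
```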