TFServing 2.10.0 crashes when trying to do inference with a GPflow model
Bug
When serving a GPflow SavedModel in TFServing 2.9+, the model server crashes:
```
2022-09-19 09:50:19.941042: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/usr/bin/tf_serving_entrypoint.sh: line 3: 7 Aborted (core dumped) tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"
```
This is related to the slice() operation inside the Kernel class. I have also filed this bug report in the tensorflow/serving repository: https://github.com/tensorflow/serving/issues/2061
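For context, here is a simplified sketch of the slicing that GPflow's Kernel performs before evaluating the kernel (based on my reading of GPflow 2.5; not the verbatim source). With the default `active_dims=None`, GPflow normalises the value to a slice object, so every call takes the `x[..., slice]` branch, which is lowered to a StridedSlice op in the exported graph:

```python
import tensorflow as tf

# Simplified sketch of what Kernel.slice does (GPflow 2.5; not verbatim).
def kernel_slice(X, X2, active_dims):
    if isinstance(active_dims, slice):
        # Branch taken with the default active_dims: the x[..., slice]
        # indexing becomes a StridedSlice op in the graph.
        X = X[..., active_dims]
        if X2 is not None:
            X2 = X2[..., active_dims]
    elif active_dims is not None:
        # Explicit index lists go through tf.gather instead.
        X = tf.gather(X, active_dims, axis=-1)
        if X2 is not None:
            X2 = tf.gather(X2, active_dims, axis=-1)
    return X, X2
```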
To reproduce
Minimal, reproducible example
This does not work:
```python
import tensorflow as tf
import gpflow

k = gpflow.kernels.SquaredExponential(variance=30., lengthscales=[1., 2., 3., 4., 5.])

module = tf.Module()
module.k = k
module.predict = tf.function(
    lambda x: k(x, x),
    input_signature=[tf.TensorSpec(name='x', shape=(None, 5), dtype=tf.float64)]
)

tf.saved_model.save(module, '<saved_model_location>', signatures={'predict': module.predict})
```
While this one works fine:
```python
import tensorflow as tf
import gpflow

k = gpflow.kernels.SquaredExponential(variance=30., lengthscales=[1., 2., 3., 4., 5.])

module = tf.Module()
module.k = k
module.predict = tf.function(
    lambda x: k(x, x, presliced=True),
    input_signature=[tf.TensorSpec(name='x', shape=(None, 5), dtype=tf.float64)]
)

tf.saved_model.save(module, '<saved_model_location>', signatures={'predict': module.predict})
```
The only difference is `presliced=True`, which skips the `x[..., slice]` operation that crashes the model server.
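To confirm that the slicing is indeed the op that differs between the two exports, the op types recorded in each signature can be compared; a minimal sketch, reusing the `<saved_model_location>` placeholder from above:

```python
import tensorflow as tf

# Load an export and list the op types in its 'predict' signature.
# The failing export should contain a StridedSlice op (from x[..., slice])
# that the presliced=True export lacks.
loaded = tf.saved_model.load('<saved_model_location>')
predict = loaded.signatures['predict']
op_types = sorted({op.type for op in predict.graph.get_operations()})
print('StridedSlice' in op_types)
print(op_types)
```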
To test, I run the TensorFlow Serving instance with the following command:
```bash
docker run --rm --name mytfserving -t -p 9500:8500 -p 9501:8501 -v <my_saved_model_location>:/models tensorflow/serving:2.10.0 --model_config_file=/models/models.config
```
And with the following models.config:
```
model_config_list {
  config {
    name: 'mymodel'
    base_path: '/models/mymodel'
    model_platform: 'tensorflow'
    model_version_policy {
      all {}
    }
  }
}
```
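For reference, the `presliced=True` export can then be queried like this (a minimal sketch using TFServing's REST predict API with the port mapping and model name above; the failing export crashes while loading, before any request arrives):

```python
import json
import urllib.request

# One 5-dimensional input row, matching the (None, 5) input signature.
payload = json.dumps({"instances": [[1.0, 2.0, 3.0, 4.0, 5.0]]}).encode()
request = urllib.request.Request(
    "http://localhost:9501/v1/models/mymodel:predict",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.load(response))
```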
Expected behavior
I would like to be able to serve the model with TFServing. Possibly rewriting the slice operation could work around it; see the sketch below.
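As an illustration of what such a rewrite could look like, here is a hypothetical (untested against TFServing) kernel subclass that materialises the slice as concrete indices and uses tf.gather instead of `x[..., slice]`, assuming `slice()` returns the pair `(X, X2)` as in GPflow 2.5:

```python
import tensorflow as tf
import gpflow

class GatherSlicedSE(gpflow.kernels.SquaredExponential):
    # Hypothetical workaround: override slice() to replace the
    # x[..., slice] indexing (a StridedSlice op) with tf.gather.
    def slice(self, X, X2=None):
        dims = self.active_dims
        if isinstance(dims, slice):
            # Requires a statically known last dimension (5 in the
            # example above) to turn the slice into explicit indices.
            idx = list(range(X.shape[-1]))[dims]
            X = tf.gather(X, idx, axis=-1)
            if X2 is not None:
                X2 = tf.gather(X2, idx, axis=-1)
            return X, X2
        return super().slice(X, X2)
```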
System information
- GPflow version: 2.5.2
- GPflow installed from: pypi
- TensorFlow version: 2.9.2
- Python version: 3.8
- Operating system: Windows
Additional context
It seems like a regression introduced in TF 2.9, where support for this slice() operation may have been dropped (for TFServing). I've already created an issue on the tensorflow/serving GitHub page to check whether this is the case. If so, just ignore this issue...
I'm sorry, but I have no experience with TFServing whatsoever, so I don't know how I'd debug this. However, GPflow is an Open Source project, and if you send me a PR with a fix I'd happily review it.