TFServing 2.10.0 crashes when trying to do inference with a GPflow model
Bug
When serving a GPflow SavedModel in TFServing 2.9+, the model server crashes:
```
2022-09-19 09:50:19.941042: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/usr/bin/tf_serving_entrypoint.sh: line 3: 7 Aborted (core dumped) tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"
```
This is related to the slice() operation inside the Kernel class. I have also filed this bug report in the tensorflow/serving repository: https://github.com/tensorflow/serving/issues/2061
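For context, here is a simplified sketch of the slicing that GPflow's Kernel performs before evaluating the kernel (based on my reading of GPflow 2.5; not the verbatim source). With the default `active_dims=None`, GPflow normalises the value to a slice object, so every call takes the `x[..., slice]` branch, which is lowered to a StridedSlice op in the exported graph:

```python
import tensorflow as tf

# Simplified sketch of what Kernel.slice does (GPflow 2.5; not verbatim).
def kernel_slice(X, X2, active_dims):
    if isinstance(active_dims, slice):
        # Branch taken with the default active_dims: the x[..., slice]
        # indexing becomes a StridedSlice op in the graph.
        X = X[..., active_dims]
        if X2 is not None:
            X2 = X2[..., active_dims]
    elif active_dims is not None:
        # Explicit index lists go through tf.gather instead.
        X = tf.gather(X, active_dims, axis=-1)
        if X2 is not None:
            X2 = tf.gather(X2, active_dims, axis=-1)
    return X, X2
```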
To reproduce
Minimal, reproducible example
This does not work:
```python
import tensorflow as tf
import gpflow

k = gpflow.kernels.SquaredExponential(variance=30., lengthscales=[1., 2., 3., 4., 5.])

module = tf.Module()
module.k = k
module.predict = tf.function(
    lambda x: k(x, x),
    input_signature=[tf.TensorSpec(name='x', shape=(None, 5), dtype=tf.float64)]
)

tf.saved_model.save(module, '<saved_model_location>', signatures={'predict': module.predict})
```
While this one works fine:
```python
import tensorflow as tf
import gpflow

k = gpflow.kernels.SquaredExponential(variance=30., lengthscales=[1., 2., 3., 4., 5.])

module = tf.Module()
module.k = k
module.predict = tf.function(
    lambda x: k(x, x, presliced=True),
    input_signature=[tf.TensorSpec(name='x', shape=(None, 5), dtype=tf.float64)]
)

tf.saved_model.save(module, '<saved_model_location>', signatures={'predict': module.predict})
```
The only difference is `presliced=True`, which skips the `x[..., slice]` operation that crashes the model server.
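To confirm that the slicing is indeed the op that differs between the two exports, the op types recorded in each signature can be compared; a minimal sketch, reusing the `<saved_model_location>` placeholder from above:

```python
import tensorflow as tf

# Load an export and list the op types in its 'predict' signature.
# The failing export should contain a StridedSlice op (from x[..., slice])
# that the presliced=True export lacks.
loaded = tf.saved_model.load('<saved_model_location>')
predict = loaded.signatures['predict']
op_types = sorted({op.type for op in predict.graph.get_operations()})
print('StridedSlice' in op_types)
print(op_types)
```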
To test, I run the TensorFlow Serving instance with the following command:
```bash
docker run --rm --name mytfserving -t -p 9500:8500 -p 9501:8501 -v <my_saved_model_location>:/models tensorflow/serving:2.10.0 --model_config_file=/models/models.config
```
And with the following models.config:
```
model_config_list {
  config {
    name: 'mymodel'
    base_path: '/models/mymodel'
    model_platform: 'tensorflow'
    model_version_policy {
      all {}
    }
  }
}
```
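For reference, the `presliced=True` export can then be queried like this (a minimal sketch using TFServing's REST predict API with the port mapping and model name above; the failing export crashes while loading, before any request arrives):

```python
import json
import urllib.request

# One 5-dimensional input row, matching the (None, 5) input signature.
payload = json.dumps({"instances": [[1.0, 2.0, 3.0, 4.0, 5.0]]}).encode()
request = urllib.request.Request(
    "http://localhost:9501/v1/models/mymodel:predict",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.load(response))
```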
Expected behavior
I would like to be able to serve the model with TFServing. Possibly rewriting the slice operation could work around it; see the sketch below.
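As an illustration of what such a rewrite could look like, here is a hypothetical (untested against TFServing) kernel subclass that materialises the slice as concrete indices and uses tf.gather instead of `x[..., slice]`, assuming `slice()` returns the pair `(X, X2)` as in GPflow 2.5:

```python
import tensorflow as tf
import gpflow

class GatherSlicedSE(gpflow.kernels.SquaredExponential):
    # Hypothetical workaround: override slice() to replace the
    # x[..., slice] indexing (a StridedSlice op) with tf.gather.
    def slice(self, X, X2=None):
        dims = self.active_dims
        if isinstance(dims, slice):
            # Requires a statically known last dimension (5 in the
            # example above) to turn the slice into explicit indices.
            idx = list(range(X.shape[-1]))[dims]
            X = tf.gather(X, idx, axis=-1)
            if X2 is not None:
                X2 = tf.gather(X2, idx, axis=-1)
            return X, X2
        return super().slice(X, X2)
```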
System information
- GPflow version: 2.5.2
- GPflow installed from: pypi
- TensorFlow version: 2.9.2
- Python version: 3.8
- Operating system: Windows
Additional context
It seems like a regression introduced in TF 2.9, where support for this slice() operation may have been dropped (for TFServing). I've already created an issue on the tensorflow/serving GitHub page to check whether this is the case. If so, just ignore this issue...
I'm sorry, but I have no experience with TFServing whatsoever, so I don't know how I'd debug this. However, GPflow is an Open Source project, and if you send me a PR with a fix I'd happily review it.