
remote Docker or invalid path hangs while following quickstart

alfredodeza opened this issue 3 years ago

Bug Report

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): OSX (M1) using Docker for Desktop client, and OSX (Intel) on the remote side
  • TensorFlow Serving installed from (source or binary): Using docker image
  • TensorFlow Serving version: Can't tell, used the latest tag for the container

Describe the problem

While following the quickstart guide, I used an invalid path for the volume, which caused the container to hang and prevented me from cancelling the process.

This problem happens any time an incorrect path is used.

I forgot that I'm using a remote Docker instance, which means the local path must exist on the remote server when binding volumes. In my case that path wasn't on the remote server, and the server hung (I couldn't Ctrl-C out of it):

$ docker run -t --rm -p 8501:8501 -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" -e MODEL_NAME=half_plus_two tensorflow/serving
2022-05-18 14:29:55.887915: I tensorflow_serving/model_servers/server.cc:89] Building single TensorFlow model file config:  model_name: half_plus_two model_base_path: /models/half_plus_two
2022-05-18 14:29:55.888309: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2022-05-18 14:29:55.888383: I tensorflow_serving/model_servers/server_core.cc:591]  (Re-)adding model: half_plus_two
2022-05-18 14:29:55.891480: W tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:268] No versions of servable half_plus_two found under base path /models/half_plus_two. Did you forget to name your leaf directory as a number (eg. '/1/')?
2022-05-18 14:29:56.901813: W tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:268] No versions of servable half_plus_two found under base path /models/half_plus_two. Did you forget to name your leaf directory as a number (eg. '/1/')?
2022-05-18 14:29:57.904190: W tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:268] No versions of servable half_plus_two found under base path /models/half_plus_two. Did you forget to name your leaf directory as a number (eg. '/1/')?
2022-05-18 14:29:58.908832: W tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:268] No versions of servable half_plus_two found under base path /models/half_plus_two. Did you forget to name your leaf directory as a number (eg. '/1/')?

Since it is impossible to cancel the operation, I had to find the PID and issue a kill -9 to terminate the process.
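
As a workaround, the hung container can be stopped through the Docker CLI instead of hunting for the PID, and the bind-mount source can be checked up front. A minimal sketch, assuming the tensorflow/serving container is the only one running:

$ docker ps --filter "ancestor=tensorflow/serving" --format "{{.ID}}"   # find the container
$ docker stop <container-id>    # or `docker kill <container-id>` if stop also hangs

# With a remote Docker daemon, the bind-mount source must exist on the
# remote host, so check it there before starting the server:
$ ls "$TESTDATA/saved_model_half_plus_two_cpu"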

Exact Steps to Reproduce

Although I ran into this with a remote Docker instance, it is reproducible by passing any invalid path for the volume. For example, omitting the absolute path:

$ docker run -t --rm -p 8501:8501 -v "saved_model_half_plus_two_cpu:/models/half_plus_two" -e MODEL_NAME=half_plus_two tensorflow/serving
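
For comparison, the working form from the quickstart mounts an absolute path that actually exists on the Docker host. A sketch, assuming the quickstart's checkout of the serving repo (adjust TESTDATA to your layout):

$ TESTDATA="$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata"
$ docker run -t --rm -p 8501:8501 \
    -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
    -e MODEL_NAME=half_plus_two \
    tensorflow/serving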

alfredodeza avatar May 18 '22 14:05 alfredodeza

Hi @alfredodeza

Can you please try the docker command in the following format, and if you are running it from Git Bash, also try adding a / before the $.

docker run -t --rm -p 8501:8501 \
  -v /$TESTDATA/my_model:/models/my_model \
  -e MODEL_NAME=my_model \
  tensorflow/serving &
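
Once the container is running in the background, one way to confirm the model actually loaded is the model-status REST endpoint. A quick check, assuming the default REST port 8501 and the model name used above:

$ curl http://localhost:8501/v1/models/my_model
# Reports the version state (e.g. AVAILABLE), or an error if no servable
# was found under the mounted base path.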

pindinagesh avatar May 24 '22 01:05 pindinagesh

Thanks for the reply @pindinagesh. With the ampersand the command goes into the background, but the problem persists. The issue here is that the server doesn't tolerate an invalid path, and it is impossible to exit out of it.

alfredodeza avatar May 24 '22 16:05 alfredodeza

@alfredodeza, @yalcinaa,

Setting the servable_versions_always_present param to true makes TF Serving fail when a wrong model or model path is provided. Once the model server fails, you can start it again with the correct model.

Let us know if this helps. Thank you!

singhniraj08 avatar Oct 20 '23 05:10 singhniraj08

@singhniraj08 No such parameter exists in the tensorflow_model_server command, so I can't change it.

yalcinaa avatar Oct 20 '23 08:10 yalcinaa

@yalcinaa, you have to add the servable_versions_always_present: true param to the model_config_list config and pass it to the model server. Thanks
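
For reference, a minimal sketch of such a config file and how it is passed to the server. The model name and paths are placeholders from the quickstart, and the exact placement of servable_versions_always_present is taken from the comment above rather than verified here:

# models.config (text proto); per the comment above, add
# servable_versions_always_present: true to it as well (placement not verified).
model_config_list {
  config {
    name: 'half_plus_two'
    base_path: '/models/half_plus_two'
    model_platform: 'tensorflow'
  }
}

$ docker run -t --rm -p 8501:8501 \
    -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
    -v "$(pwd)/models.config:/models/models.config" \
    tensorflow/serving --model_config_file=/models/models.config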

singhniraj08 avatar Oct 20 '23 10:10 singhniraj08

This issue has been marked stale because it has had no activity for the past 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar Oct 28 '23 01:10 github-actions[bot]

@singhniraj08 the problem with that approach is that it assumes the developer will understand what that parameter does and that it is relevant to an incorrect path. This issue should address the fact that it is possible to pass an incorrect path, and the server should produce an error that explains which path was searched and how or why the expected model couldn't be loaded - in this case, because it couldn't be found.

Asking to pass a parameter doesn't seem like the right approach here.

alfredodeza avatar Oct 31 '23 14:10 alfredodeza

@alfredodeza, this sounds like a feature we need to implement. Let me bring this up with the team internally. In the meantime, please feel free to create a PR if you are interested in implementing it. Thank you!

singhniraj08 avatar Nov 01 '23 03:11 singhniraj08

This issue has been marked stale because it has had no activity for the past 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar Nov 09 '23 01:11 github-actions[bot]

This issue was closed due to lack of activity after being marked stale for the past 7 days.

github-actions[bot] avatar Nov 16 '23 01:11 github-actions[bot]
