
TensorFlow Serving doesn't support a recent model from TensorFlow Hub

Opened by narkive on Jun 16, 2022

Steps:

  1. Download the st5-large model from TensorFlow Hub
  2. Serve it from a tensorflow/serving Docker container (latest-gpu or nightly-gpu tag)
  3. Send an inference request to the model
  4. Observe the following error log:
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...
2022-06-15 20:39:43.726834: W external/org_tensorflow/tensorflow/core/common_runtime/colocation_graph.cc:1146] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
SentencepieceTokenizeOp: CPU
SentencepieceOp: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  StatefulPartitionedCall/StatefulPartitionedCall/model/st5/StatefulPartitionedCall/StatefulPartitionedCall/map/while/body/_801/map/while/SentenceTokenizerInitializer/SentencepieceOp (SentencepieceOp) /job:localhost/replica:0/task:0/device:GPU:0
  StatefulPartitionedCall/StatefulPartitionedCall/model/st5/StatefulPartitionedCall/StatefulPartitionedCall/map/while/body/_801/map/while/SentenceTokenizer/SentenceTokenizer/SentencepieceTokenizeOp (SentencepieceTokenizeOp) /job:localhost/replica:0/task:0/device:GPU:0

2022-06-15 20:39:43.726993: W external/org_tensorflow/tensorflow/core/common_runtime/colocation_graph.cc:1146] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
SentencepieceTokenizeOp: CPU
SentencepieceOp: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  StatefulPartitionedCall/StatefulPartitionedCall/model/st5/StatefulPartitionedCall/StatefulPartitionedCall/map_1/while/body/_836/map_1/while/SentenceTokenizerInitializer/SentencepieceOp (SentencepieceOp) /job:localhost/replica:0/task:0/device:GPU:0
  StatefulPartitionedCall/StatefulPartitionedCall/model/st5/StatefulPartitionedCall/StatefulPartitionedCall/map_1/while/body/_836/map_1/while/SentenceTokenizer/SentenceTokenizer/SentencepieceTokenizeOp (SentencepieceTokenizeOp) /job:localhost/replica:0/task:0/device:GPU:0


2022-06-15 20:39:45.732795: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at xla_ops.cc:301 : Unimplemented: Could not find compiler for platform CUDA: Not found: could not find registered compiler for platform CUDA -- check target linkage (hint: try linking in tensorflow/compiler/jit:xla_gpu_jit)
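
For reference, the request in step 3 was sent roughly like the sketch below, assuming the model is served under the name st5_large on the default REST port 8501 and accepts a single string input; the exact model name, port, and payload were not included in the report. The response body carries the same XLA "Could not find compiler" error as the log above.

import requests

# Hypothetical model name and port; adjust to match the actual serving config.
URL = "http://localhost:8501/v1/models/st5_large:predict"

# "inputs" is the columnar-format key of the TF Serving REST predict API;
# the expected value shape for st5-large is assumed here to be a list of strings.
payload = {"inputs": ["a short test sentence"]}

resp = requests.post(URL, json=payload)
print(resp.status_code)
print(resp.text)  # contains the "Could not find compiler for platform CUDA" message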

I don't understand whether SentencepieceOp is supposed to run on the CPU and I'm "just" missing the CUDA (XLA) compiler inside the tf-serving container (do I need to link it in, i.e. recompile, or is this similar to the LD_PRELOAD fix?), or whether the operation is supposed to run on the GPU and this is a compatibility/version issue of some sort.
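
One way to check the first possibility outside of Serving is to load the SavedModel in plain TensorFlow and log device placement; the sketch below is a rough diagnostic, where the model path, signature name, and input keyword are assumptions rather than values from the report.

import tensorflow as tf
import tensorflow_text  # noqa: F401 -- registers SentencepieceOp / SentencepieceTokenizeOp

# Print where each op is actually placed; the SentencePiece kernels are CPU-only,
# so they should land on CPU even when a GPU is visible.
tf.debugging.set_log_device_placement(True)

model = tf.saved_model.load("./st5-large")            # path is an assumption
infer = model.signatures["serving_default"]           # signature name is an assumption
print(infer(inputs=tf.constant(["a quick placement test"])))  # input keyword is an assumption

If the tokenizer ops show up on CPU here and inference succeeds, that would point at the missing XLA CUDA compiler in the serving binary rather than at the op placement itself.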

From what I understand, previous Google models such as USE use this operation, and Serving was updated over time to support them, but this error appears to be different even though it's related.
