
TF Serving for Mask R-CNN too slow

Open vscv opened this issue 3 years ago • 2 comments

I also have the same low-performance issue. I suspect it mainly comes from two parts:

  1. Converting the image into a JSON payload and POSTing it takes time.
  2. TF Serving itself adds latency (several POSTs were made in advance as a warm-up).

In my test, the remote side (MBP + Wi-Fi) takes 16 ~ 20 seconds to print res.json(), while the local side takes 5 ~ 7 seconds. I also watched GPU usage, and it was only busy (~70%) for less than a second during the entire POST.

# 1024x1024x3 image to JSON and POST
import sys, json
import numpy as np
import PIL.Image, requests

image_np = np.array(PIL.Image.open(sys.argv[1]))  # load image as ndarray
payload = {"inputs": [image_np.tolist()]}
res = requests.post("http://2444.333.222.111:8501/v1/models/maskrcnn:predict", data=json.dumps(payload))
print(res.json())
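
To see where the time goes, one could split the measurement into serialization and transport. A minimal sketch, reusing image_np and the URL from the snippet above (the split points are my own addition, not part of the original test):

# Rough timing breakdown: JSON serialization vs. POST round trip
import time

t0 = time.perf_counter()
body = json.dumps({"inputs": [image_np.tolist()]})
t1 = time.perf_counter()
res = requests.post("http://2444.333.222.111:8501/v1/models/maskrcnn:predict", data=body)
t2 = time.perf_counter()
print(f"serialize: {t1 - t0:.1f}s  POST + inference: {t2 - t1:.1f}s")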

vscv avatar Apr 06 '22 05:04 vscv

I'm not sure this slowness really comes from TF Serving itself. From the test results you describe, the real bottleneck sounds like preparation and transport of the payload, not the model computation, which suggests the payload is large relative to the model's compute cost.

I would consider some pre-processing to reduce the payload size and/or using a faster language (e.g. C/C++) for payload preparation.
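
To illustrate how much the encoding matters (my own sketch, not from the thread): serializing a 1024x1024x3 tensor as nested JSON lists is far larger than the raw image bytes, and the dtype alone changes the size considerably.

# How JSON encoding and dtype affect payload size for a 1024x1024x3 image
import json
import numpy as np

img = np.random.randint(0, 256, (1024, 1024, 3), dtype=np.uint8)
as_uint8 = json.dumps({"inputs": [img.tolist()]})
as_float = json.dumps({"inputs": [img.astype(np.float32).tolist()]})
print(f"uint8 lists:   {len(as_uint8) / 2**20:.1f} MiB")
print(f"float32 lists: {len(as_float) / 2**20:.1f} MiB")
print(f"raw bytes:     {img.nbytes / 2**20:.1f} MiB")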

godot73 avatar Apr 15 '22 21:04 godot73

Thanks, @godot73

After a comparison test with the same model and input image, an API built with Flask can respond in under one second when POSTing the 1024x1024 image remotely as "file=open_img_file". So maybe the time difference really comes from data=json.dumps(payload). But TF Serving only allows JSON delivery, which seems to be a dead end.
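
For reference, a minimal sketch of what such a Flask endpoint might look like (the /predict route and the run_inference helper are hypothetical; the point is that the raw file bytes travel over the wire instead of millions of JSON-encoded numbers):

# Hypothetical Flask counterpart; the client would POST the raw file,
# e.g. requests.post(url, files={"file": open("img.jpg", "rb")})
import io
import numpy as np
import PIL.Image
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    image = PIL.Image.open(io.BytesIO(request.files["file"].read()))
    image_np = np.array(image)
    results = run_inference(image_np)  # hypothetical: run the loaded Mask R-CNN
    return jsonify(results)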

vscv avatar Apr 18 '22 01:04 vscv

Hi, @vscv

Apologies for the delay. TF Serving supports both REST and gRPC. If you're looking for low latency and better throughput, you'll want to configure batching with the parameters below; you can tune their values, and the official TensorFlow Serving Batching Guide covers them in detail. gRPC is also more network-efficient, with smaller payloads, and can provide much faster inference than REST; you can refer to these articles on TF Serving with gRPC [1], [2]. A gRPC client sketch follows the batching example below.

Example batching parameters file:

max_batch_size { value: 128 }
batch_timeout_micros { value: 0 }
max_enqueued_batches { value: 1000000 }
num_batch_threads { value: 8 }
(and pass --enable_batching=true when starting the server, as shown below)
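
The batching parameters go in a file passed at startup. A sketch of the launch (the model path and file name are placeholders; the flags themselves are from the TF Serving docs):

tensorflow_model_server --port=8500 --rest_api_port=8501 \
    --model_name=maskrcnn --model_base_path=/models/maskrcnn \
    --enable_batching=true --batching_parameters_file=/path/to/batching.config

And a minimal Python gRPC client sketch. The input key "inputs" and the "serving_default" signature name depend on your SavedModel's signature, so treat them as assumptions; port 8500 is TF Serving's default gRPC port:

# Minimal gRPC Predict client for TF Serving (pip install tensorflow-serving-api)
import sys
import grpc
import numpy as np
import PIL.Image
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Raise gRPC's 4 MB default message cap: a 1024x1024x3 tensor is larger.
channel = grpc.insecure_channel("2444.333.222.111:8500", options=[
    ("grpc.max_send_message_length", 64 * 1024 * 1024),
    ("grpc.max_receive_message_length", 64 * 1024 * 1024),
])
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

image_np = np.array(PIL.Image.open(sys.argv[1]))
request = predict_pb2.PredictRequest()
request.model_spec.name = "maskrcnn"
request.model_spec.signature_name = "serving_default"  # assumption: default signature
request.inputs["inputs"].CopyFrom(
    tf.make_tensor_proto(image_np[np.newaxis, ...]))  # "inputs" key is an assumption

result = stub.Predict(request, 30.0)  # 30-second timeout
print(list(result.outputs.keys()))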

Could you please try the above workaround and confirm whether it resolves the issue for you? Please feel free to close the issue if it is resolved.

If the issue still persists, please let us know. To expedite the troubleshooting process, please provide a code snippet that reproduces the issue reported here.

Thank you!

gaikwadrahul8 avatar Dec 21 '22 20:12 gaikwadrahul8