serving file_system_poll_wait_seconds default (1s) is very expensive for remote (gs://) model paths

We just realized that we were spending quite a bit of money merely polling for version updates for our models, which are hosted in GCS.

The server was issuing a Class A operation (list objects) once per second because of the default set at https://github.com/tensorflow/serving/blob/6b1a02b5fc63def9b4cfd75bd9dbce9bed4c10bb/tensorflow_serving/model_servers/server.h#L68.

We've solved our cost issue just by raising the poll interval to 1 minute (`--file_system_poll_wait_seconds=60`), but perhaps there could be a warning message, or a different default, for remote storage model paths that incur a cost when the setting is left at its default.
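For a rough sense of scale, here is a back-of-the-envelope sketch of how the poll interval translates into list calls per month. It assumes one list-objects call per poll per model per server replica, as described above, and a 30-day month. The price constant is an assumed illustrative figure, not something taken from our bill; check the current GCS pricing page for your storage class before relying on it.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month

# ASSUMPTION: illustrative price in USD per 10,000 Class A operations.
# This is not from the issue; substitute the rate for your storage class.
ASSUMED_USD_PER_10K_CLASS_A = 0.05


def monthly_list_calls(poll_wait_seconds: int, models: int = 1, replicas: int = 1) -> int:
    """Estimate list-objects calls per month.

    Assumes exactly one list call per poll, per model, per server replica.
    """
    if poll_wait_seconds <= 0:
        raise ValueError("poll_wait_seconds must be positive for this estimate")
    return (SECONDS_PER_MONTH // poll_wait_seconds) * models * replicas


def monthly_cost_usd(poll_wait_seconds: int, models: int = 1, replicas: int = 1,
                     usd_per_10k: float = ASSUMED_USD_PER_10K_CLASS_A) -> float:
    """Estimate the monthly polling cost under the assumed per-operation price."""
    calls = monthly_list_calls(poll_wait_seconds, models, replicas)
    return calls / 10_000 * usd_per_10k


if __name__ == "__main__":
    for interval in (1, 60):
        calls = monthly_list_calls(interval)
        cost = monthly_cost_usd(interval)
        print(f"poll every {interval:>2}s: {calls:>9,} list calls/month, ~${cost:.2f}")
```

Under those assumptions, the 1 s default works out to about 2.6 million list calls per model per replica per month, versus 43,200 at a 60 s interval. Multiply by the number of models and replicas to approximate a real deployment.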

Sep 24 '21 19:09 pselden

This can indeed be quite a problem. Would it be possible to have an API to refresh a particular served model? That way, one would not need to poll for newer model versions, and the refresh could be part of a deployment pipeline.

Oct 19 '21 08:10 kalosisz

@pselden,

You can issue HandleReloadConfigRequest RPC calls to the server, supplying a new Model Server config programmatically, to load the updated model.
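A minimal sketch of such a call from Python, assuming the `grpcio` and `tensorflow-serving-api` packages are installed and the server's gRPC port is reachable. The helper names, the model name, the bucket path, and the `localhost:8500` target are hypothetical placeholders. Pinning a specific version in the config is one possible approach, not necessarily the recommended one, and whether it fully replaces polling depends on how the server is configured.

```python
def build_config_text(name: str, base_path: str, version: int) -> str:
    """Return a ModelServerConfig in protobuf text format that pins one version."""
    return (
        "model_config_list {\n"
        "  config {\n"
        f'    name: "{name}"\n'
        f'    base_path: "{base_path}"\n'
        '    model_platform: "tensorflow"\n'
        "    model_version_policy {\n"
        f"      specific {{ versions: {version} }}\n"
        "    }\n"
        "  }\n"
        "}\n"
    )


def reload_config(target: str, config_text: str, timeout_s: float = 30.0) -> int:
    """Send the config via HandleReloadConfigRequest; return the status error code.

    Third-party imports are done lazily so that build_config_text can be used
    (for example in a dry run) without grpcio or tensorflow-serving-api installed.
    """
    import grpc
    from google.protobuf import text_format
    from tensorflow_serving.apis import model_management_pb2, model_service_pb2_grpc

    request = model_management_pb2.ReloadConfigRequest()
    # Parse the text-format config into the request's ModelServerConfig field.
    text_format.Parse(config_text, request.config)

    with grpc.insecure_channel(target) as channel:
        stub = model_service_pb2_grpc.ModelServiceStub(channel)
        response = stub.HandleReloadConfigRequest(request, timeout=timeout_s)
    return response.status.error_code  # 0 means OK


if __name__ == "__main__":
    # Hypothetical values; replace with your own model name, path, and version.
    text = build_config_text("my_model", "gs://my-bucket/models/my_model", 42)
    print(text)
    # To actually send it (hypothetical target address):
    # reload_config("localhost:8500", text)
```

Note that the request replaces the server's whole model config, so the text should list every model the server is expected to keep serving, not only the one being updated.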

Thank you!

Nov 22 '22 10:11 singhniraj08

Closing this due to inactivity. Please take a look at the answers provided above, and feel free to reopen and post your comments (if you still have queries on this). Thank you!

Jan 27 '23 15:01 singhniraj08