
Unexpectedly high CPU utilization of Docker container, even without models.

Open ericmclachlan opened this issue 5 years ago • 13 comments

System Information

  • Host OS: Microsoft Windows 10 Pro (Version 10.0.19041 Build 19041)
  • TensorFlow Serving installed from Docker: tensorflow/serving:latest-devel and Ubuntu:18.04
  • Docker Desktop: 2.3.0.4 (46911) (using Linux containers)

The Problem

TensorFlow Serving's Docker container uses significantly more CPU than expected or desirable.

Evaluation

docker stats shows the following:

CONTAINER ID        NAME                                            CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
0e5fca607055        linguistic-intelligence_tensorflow-servings_1   14.75%              24.26MiB / 12.43GiB   0.19%               1.19kB / 0B         0B / 0B             29

Notice the container is reported as using 14.75% CPU. This is after running for 15 minutes.

And here's the issue: this is with zero models loaded. The sustained CPU utilization is around 6-17%. (With models loaded, the sustained CPU utilization is around 25%.)

Expected Behaviour:

Of course, since TF Serving has been given zero work to do, CPU utilization is expected to be minimal.

Exact Steps to Reproduce

Running the Docker Container:

docker-compose up tensorflow-servings

docker-compose.yaml:

  tensorflow-servings:
    build:
      context: .
      dockerfile: tensorflow-servings.dockerfile
    image: tensorflow-servings
    ports:
      - 8500:8500
      - 8501:8501
    volumes:
      - ./ml-models/:/models/:ro
      - ./ml-servings-config/:/config/:ro
    deploy:
      resources:
        limits:
          memory: 4Gb
    environment:
      - MODEL_CONFIG_FILE_POLL_WAIT_SECONDS=0

tensorflow-servings.dockerfile:

ARG TF_SERVING_VERSION=latest
ARG TF_SERVING_BUILD_IMAGE=tensorflow/serving:${TF_SERVING_VERSION}-devel

FROM ${TF_SERVING_BUILD_IMAGE} as build_image
FROM ubuntu:18.04

ARG TF_SERVING_VERSION_GIT_BRANCH=master
ARG TF_SERVING_VERSION_GIT_COMMIT=head

LABEL tensorflow_serving_github_branchtag=${TF_SERVING_VERSION_GIT_BRANCH}
LABEL tensorflow_serving_github_commit=${TF_SERVING_VERSION_GIT_COMMIT}

RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Install TF Serving pkg
COPY --from=build_image /usr/local/bin/tensorflow_model_server /usr/bin/tensorflow_model_server

# PORTS:
# gRPC
EXPOSE 8500
# REST
EXPOSE 8501

# Set where models should be stored in the container
ENV MODEL_BASE_PATH=/models/

# Embed the models
COPY ./ml-models /models
COPY ./ml-servings-config /config

# Create a script that runs the model server so we can use environment variables
# while also passing in arguments from the docker command line
RUN echo '#!/bin/bash \n\n\
tensorflow_model_server \
--allow_version_labels_for_unavailable_models=true \
--batching_parameters_file=/config/batching_parameters.txt \
--enable_batching=true \
--model_base_path=${MODEL_BASE_PATH} \
--model_config_file=/config/all_models.config \
--model_config_file_poll_wait_seconds=60 \
--monitoring_config_file=/config/monitoring_config.txt \
--port=8500 \
--rest_api_port=8501 \
--rest_api_timeout_in_ms=30000 \
"$@"' > /usr/bin/tf_serving_entrypoint.sh \
&& chmod +x /usr/bin/tf_serving_entrypoint.sh

ENTRYPOINT ["/usr/bin/tf_serving_entrypoint.sh"]

all_models.config:

model_config_list {

}

Logs: When starting the container:

$ docker-compose up tensorflow-servings
WARNING: Some services (natural-language-server, tensorflow-servings) use the 'deploy' key, which will be ignored. Compose does not support 'deploy' configuration - use `docker stack deploy` to deploy to a swarm.
Creating linguistic-intelligence_tensorflow-servings_1 ... done
Attaching to linguistic-intelligence_tensorflow-servings_1
tensorflow-servings_1      | 2020-09-07 09:27:05.197858: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:27:05.204117: I tensorflow_serving/model_servers/server.cc:367] Running gRPC ModelServer at 0.0.0.0:8500 ...
tensorflow-servings_1      | 2020-09-07 09:27:05.207537: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | [warn] getaddrinfo: address family for nodename not supported
tensorflow-servings_1      | [evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
tensorflow-servings_1      | 2020-09-07 09:27:05.215164: I tensorflow_serving/model_servers/server.cc:387] Exporting HTTP/REST API at:localhost:8501 ...
tensorflow-servings_1      | 2020-09-07 09:28:05.205505: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:29:05.206871: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:30:05.206192: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:31:05.206607: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:32:05.207374: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:33:05.207560: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:34:05.207728: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:35:05.206796: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:36:05.213456: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:37:05.207654: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:38:05.208601: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:39:05.208775: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:40:05.209190: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:41:05.209411: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:42:05.210525: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:43:05.210102: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:44:05.210034: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.

batching_parameters.txt:

max_batch_size { value: 64 }
batch_timeout_micros { value: 0 }
num_batch_threads { value: 4 }
pad_variable_length_inputs: true

Thanks for your attention.

ericmclachlan avatar Sep 07 '20 10:09 ericmclachlan

It seems there is a process that keeps trying to "Adding/updating models", which is causing the ~20% CPU usage.

guanxinq avatar Sep 28 '20 02:09 guanxinq

Try setting --file_system_poll_wait_seconds=0

The default is set to 1 second - I think that's why there is an "Adding/updating models" message every second - TF Serving is checking /models for changes.
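
Since your entrypoint script forwards "$@" to tensorflow_model_server, one way to apply the flag without rebuilding the image would be to append it from the compose file (an untested sketch against the docker-compose.yaml above):

  tensorflow-servings:
    command:
      - --file_system_poll_wait_seconds=0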

dchall88 avatar Sep 28 '20 14:09 dchall88

If what I suggested is the cause of your problem, I don't know that I would describe it as a bug or a workaround - tf-serving is behaving as described: by default it polls the filesystem where you have indicated your models are located, every second, to see if a new version of any of the models has been added. If you don't want it to poll the filesystem every second, you have to change the default to a setting that works for your use case.

It also wouldn't affect how much a machine is costing; you would still be charged for the machine whether the CPU was being used or not. One caveat: if your models are located not on local storage but on S3 or Google Cloud Storage, the repeated polling of the cloud storage will incur costs.

dchall88 avatar Sep 29 '20 10:09 dchall88

@dchall88: Yeah, I see that you're right about usage not affecting pricing. I've deleted the comment since it contained misinformation.

To reply to your suggestion,

Try setting --file_system_poll_wait_seconds=0

The default is set to 1 second - I think that's why there is an "Adding/updating models" message every second - TF Serving is checking /models for changes.

I believe the "Adding/updating models" message is being reported every minute rather than every second. I think it's the config polling interval (--model_config_file_poll_wait_seconds=60 in my entrypoint) that is triggering the output you mentioned.

Regarding your suggestion to set --file_system_poll_wait_seconds=0: I implemented the change, but I still see non-trivial activity on the container: around 14.5% CPU, even after 30 minutes.

Following a hunch, I disabled batching, and now the CPU usage is 0.05% - a much more reasonable dormant state. So something related to batching appears to be causing the unexpected CPU usage.
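
For clarity, "disabled batching" just means I dropped the two batching flags from the entrypoint script, so the invocation looks roughly like this (reconstructed, not a verbatim copy of the script):

tensorflow_model_server \
    --allow_version_labels_for_unavailable_models=true \
    --model_base_path=${MODEL_BASE_PATH} \
    --model_config_file=/config/all_models.config \
    --model_config_file_poll_wait_seconds=60 \
    --file_system_poll_wait_seconds=0 \
    --monitoring_config_file=/config/monitoring_config.txt \
    --port=8500 \
    --rest_api_port=8501 \
    --rest_api_timeout_in_ms=30000 \
    "$@"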

I've now added my batching_parameters.txt file to the main issue description.

ericmclachlan avatar Sep 29 '20 12:09 ericmclachlan

Eric is right. The unexpectedly high CPU utilization is caused by the batching configuration. The four batch threads keep busy waiting for requests and thus consume resources; CPU usage is proportional to num_batch_threads.
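
One quick way to confirm the proportionality (hypothetical values, just for the experiment) is to lower num_batch_threads in batching_parameters.txt and compare docker stats before and after:

max_batch_size { value: 64 }
batch_timeout_micros { value: 0 }
# With 1 thread instead of 4, idle CPU usage should drop to roughly a quarter.
num_batch_threads { value: 1 }
pad_variable_length_inputs: true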

guanxinq avatar Oct 03 '20 00:10 guanxinq

In most cases, tensorflow_model_server will run with models specified and loaded. Eric, is running tensorflow_model_server without models a meaningful use case for you? Also, is the high CPU utilization a big concern? Essentially, the issue could be mitigated if models were loaded and batched queries were sent.

guanxinq avatar Oct 06 '20 16:10 guanxinq

Hi @guanxinq. The only reason I'm not using models here is to demonstrate that the issue is easily reproducible. I observed high CPU usage with models, which led me to create a minimal reproducible example. It just so happened that I didn't need a model to demonstrate the behavior.

It seems you are suggesting that having models would reduce CPU usage? That is contrary to my observation.

ericmclachlan avatar Oct 07 '20 05:10 ericmclachlan

Sorry for the confusion. Adding models would not help reduce the utilization. I meant sending batched traffic to the models, so that the 4 threads you configured would actually be put to use.

guanxinq avatar Oct 07 '20 16:10 guanxinq

Hi @guanxinq. My understanding of the issue is that each thread used for managing the batching of requests will use CPU cycles. If the number of threads is increased, so too is the CPU usage. If I implemented the batch scheduling myself, I'm pretty sure I could implement it so that there is negligible CPU usage until requests are received. This is an apparent inefficiency. Whether that inefficiency is important enough to fix is not for me to decide.
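
To illustrate what I mean, here is a minimal sketch of my own (plain C++, not TF Serving code) of a worker that blocks until a request actually arrives, and therefore uses negligible CPU while idle:

// My own illustration (not TF Serving code): a worker that blocks on a
// condition variable consumes essentially no CPU until a request is enqueued.
#include <condition_variable>
#include <deque>
#include <mutex>

std::mutex mu;
std::condition_variable cv;
std::deque<int> requests;

void Worker() {
  std::unique_lock<std::mutex> lock(mu);
  for (;;) {
    // Sleeps in the kernel until Enqueue() notifies; no periodic wake-ups.
    cv.wait(lock, [] { return !requests.empty(); });
    requests.pop_front();  // ... form a batch and run inference ...
  }
}

void Enqueue(int request) {
  {
    std::lock_guard<std::mutex> lock(mu);
    requests.push_back(request);
  }
  cv.notify_one();
}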

I don't understand your suggestion to send requests. Of course, sending requests is my normal use case. Sending requests would raise CPU usage and make this problem harder to detect, but it doesn't change the inefficiency in any way - at least, not according to my understanding.

If TensorFlow were my product, I would fix the issue, because this inefficiency likely affects every machine that uses batching. Resolving it would, likewise, improve the performance of every machine that uses batching. I'd imagine that would make a lot of customers happy.

Apologies if I didn't understand your suggestion correctly.

ericmclachlan avatar Oct 08 '20 06:10 ericmclachlan

Hi @ericmclachlan, sorry for the inconvenience. We have an internal task tracking this resource-inefficiency issue and will keep you posted when it gets fixed. Meanwhile, feel free to contribute to the repo directly if you have any ideas about the batch scheduler.

guanxinq avatar Oct 13 '20 05:10 guanxinq

Thank you, @guanxinq. I appreciate your time and attention.

Wishing you the best of luck.

ericmclachlan avatar Oct 13 '20 10:10 ericmclachlan

I tried to find the cause so this could be solved; here is what I found.

perf shows that the serving binary spends its CPU in calls into libstdc++:
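
(Roughly the commands I ran inside the container - I'm reconstructing them from memory, so treat the exact invocations as approximate:)

# Sample the running server with call graphs for 30 seconds, then inspect.
perf record -g -p "$(pidof tensorflow_model_server)" -- sleep 30
perf report

# Attach gdb to resolve the anonymous frames to function names.
gdb -p "$(pidof tensorflow_model_server)" -ex 'thread 3' -ex 'bt' -ex 'detach' -ex 'quit'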

-   96.42%     0.00%  tensorflow_mode  libstdc++.so.6.0.28      [.] 0x00007efc03096d84
   - 0x7efc03096d84
      - 93.85% 0x562e4362cbf8
         - 84.45% 0x562e42038fb1
            - 84.15% 0x562e48342359
               - 84.15% 0x562e48342d33
                  - 57.28% 0x562e48342873
                     - 57.28% 0x562e48343fa9
                        - 47.28% 0x7efbfaec289d
                           - 46.57% system_call_fastpath
                              - 45.26% sys_futex
                                 - 43.58% do_futex
                                    - 40.88% futex_wait
                                       - 34.10% futex_wait_queue_me
                                          - 28.93% schedule
                                             - 28.30% __schedule
                                                - 11.27% apic_timer_interrupt
                                                   + smp_apic_timer_interrupt
                                                + 8.94% finish_task_switch
                                                + 5.44% deactivate_task
                                                  0.55% _raw_qspin_lock
                                               0.63% deactivate_task
                                          + 4.41% hrtimer_start_range_ns
                                       + 2.18% futex_wait_setup
                                       + 1.73% apic_timer_interrupt
                                         1.05% hash_futex
                                         0.72% hrtimer_init
                                      1.94% _raw_qspin_lock
                                    + 0.76% apic_timer_interrupt
                                   0.97% futex_wait
                                1.31% copy_user_enhanced_fast_string
                          1.52% 0x7efbfaec28af
                          1.43% 0x7efbfaec28b6
                          1.39% 0x7efbfafde4cb
                          1.05% 0x562e41cad6a0
                        + 0.84% 0x562e48342ffd
                          0.80% 0x562e48343000
                          0.59% 0x7efbfaec28a6
                  + 11.50% 0x562e48342766
                  + 4.76% 0x562e483427e8
                  + 2.72% 0x562e4834289c

and gdb shows the function names:

[Switching to thread 3 (Thread 0x7efbfa59d700 (LWP 29))]
#0  0x00007efbfaec289d in syscall () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007efbfaec289d in syscall () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000562e48342ff3 in nsync::nsync_mu_semaphore_p_with_deadline(nsync::nsync_semaphore_s_*, timespec) ()
#2  0x0000562e48343fa9 in nsync::nsync_sem_wait_with_cancel_(nsync::waiter*, timespec, nsync::nsync_note_s_*) ()
#3  0x0000562e48342873 in nsync::nsync_cv_wait_with_deadline_generic(nsync::nsync_cv_s_*, void*, void (*)(void*), void (*)(void*), timespec, nsync::nsync_note_s_*) ()
#4  0x0000562e48342d33 in nsync::nsync_cv_wait_with_deadline(nsync::nsync_cv_s_*, nsync::nsync_mu_s_*, timespec, nsync::nsync_note_s_*) ()
#5  0x0000562e48342359 in tensorflow::internal::wait_until_system_clock(tensorflow::internal::CVData*, tensorflow::internal::MuData*, std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) ()
#6  0x0000562e42038fb1 in tensorflow::serving::SharedBatchScheduler<tensorflow::serving::BatchingSessionTask>::ThreadLogic() ()
#7  0x0000562e4362cbf8 in tensorflow::serving::PeriodicFunction::RunLoop(long long) ()
#8  0x00007efc03096d84 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007efbfafd0609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007efbfaec9293 in clone () from /lib/x86_64-linux-gnu/libc.so.6

The call stack seems to be:

  • tensorflow/core/kernels/batching_util/shared_batch_scheduler.h
    • SharedBatchScheduler<TaskType>::ThreadLogic
      • WaitForMilliseconds(&l, &schedulable_batch_cv_, kTimeoutMillis); kTimeoutMillis is 1ms
  • tensorflow/core/platform/mutex.h
    • WaitForMilliseconds inline
  • tensorflow/core/platform/default/mutex.h not sure
    • wait_for
  • tensorflow/core/platform/default/mutex.cc
    • wait_until_system_clock
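
If I read this correctly, each batch thread does a timed condition-variable wait that expires every millisecond even when no work arrives - something like this simplified sketch of my own (an illustration of the pattern, not the actual TF Serving source):

// Simplified illustration (not TF Serving source): each batch thread does a
// 1 ms timed wait, so it is woken roughly 1000 times per second even when the
// queue stays empty.
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>

std::mutex mu;
std::condition_variable schedulable_batch_cv;
std::deque<int> batches;  // stand-in for the batch queue
bool stop = false;

void ThreadLogic() {
  std::unique_lock<std::mutex> lock(mu);
  while (!stop) {
    // Analogue of WaitForMilliseconds(..., kTimeoutMillis = 1): returns after
    // at most 1 ms whether or not anything was enqueued.
    schedulable_batch_cv.wait_for(lock, std::chrono::milliseconds(1));
    if (!batches.empty()) {
      batches.pop_front();  // ... process one batch ...
    }
  }
}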

Even so, I cannot pin down why this results in such high CPU utilization; I hope someone can help.

jasonhancn avatar Dec 09 '20 03:12 jasonhancn

Any update on this issue? Has anyone been able to solve it?

lixikun avatar May 25 '22 09:05 lixikun

Hi, @ericmclachlan

Apologies for the delay. We have uploaded a YouTube video about TensorFlow Serving performance optimization here, and we have also updated our official documentation on TensorFlow Serving configuration here. I hope it will help you resolve your issue.

Could you please try it and let us know whether it resolves your issue? If the issue still persists, please let us know. To expedite the troubleshooting process, please provide a code snippet that reproduces the issue reported here.

Thank you!

gaikwadrahul8 avatar Dec 16 '22 18:12 gaikwadrahul8

Hi, @ericmclachlan

Closing this issue due to a lack of recent activity for a couple of weeks. Please feel free to reopen the issue or post comments if you need any further assistance or an update. Thank you!

gaikwadrahul8 avatar Feb 20 '23 08:02 gaikwadrahul8