Unexpectedly high CPU utilization of the Docker container, even without models.
System Information
- Host OS: Microsoft Windows 10 Pro (Version 10.0.19041 Build 19041)
- TensorFlow Serving installed from Docker: tensorflow/serving:latest-devel and Ubuntu:18.04
- Docker Desktop: 2.3.0.4 (46911) (using Linux containers)
The Problem
TensorFlow Serving's Docker container uses significantly more CPU than expected or desirable.
Evaluation
docker stats shows the following:
CONTAINER ID   NAME                                             CPU %     MEM USAGE / LIMIT     MEM %     NET I/O       BLOCK I/O   PIDS
0e5fca607055   linguistic-intelligence_tensorflow-servings_1    14.75%    24.26MiB / 12.43GiB   0.19%     1.19kB / 0B   0B / 0B     29
Notice the container is reported as using 14.75% CPU after running for 15 minutes.
And here's the issue: this is with zero models loaded. The sustained CPU utilization hovers around 6-17%. (With models loaded, the sustained CPU utilization is around 25%.)
Expected Behaviour:
Of course, since TF Serving has been given zero work to do, it's expected that CPU utilization would be minimal.
Exact Steps to Reproduce
Running the Docker Container:
docker-compose up tensorflow-servings
docker-compose.yaml:
tensorflow-servings:
  build:
    context: .
    dockerfile: tensorflow-servings.dockerfile
  image: tensorflow-servings
  ports:
    - 8500:8500
    - 8501:8501
  volumes:
    - ./ml-models/:/models/:ro
    - ./ml-servings-config/:/config/:ro
  deploy:
    resources:
      limits:
        memory: 4Gb
  environment:
    - MODEL_CONFIG_FILE_POLL_WAIT_SECONDS=0
tensorflow-servings.dockerfile:
ARG TF_SERVING_VERSION=latest
ARG TF_SERVING_BUILD_IMAGE=tensorflow/serving:${TF_SERVING_VERSION}-devel
FROM ${TF_SERVING_BUILD_IMAGE} as build_image
FROM ubuntu:18.04
ARG TF_SERVING_VERSION_GIT_BRANCH=master
ARG TF_SERVING_VERSION_GIT_COMMIT=head
LABEL tensorflow_serving_github_branchtag=${TF_SERVING_VERSION_GIT_BRANCH}
LABEL tensorflow_serving_github_commit=${TF_SERVING_VERSION_GIT_COMMIT}
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
&& \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Install TF Serving pkg
COPY --from=build_image /usr/local/bin/tensorflow_model_server /usr/bin/tensorflow_model_server
# PORTS:
# gRPC
EXPOSE 8500
# REST
EXPOSE 8501
# Set where models should be stored in the container
ENV MODEL_BASE_PATH=/models/
# Embed the models
COPY ./ml-models /models
COPY ./ml-servings-config /config
# Create a script that runs the model server so we can use environment variables
# while also passing in arguments from the docker command line
RUN echo '#!/bin/bash \n\n\
tensorflow_model_server \
--allow_version_labels_for_unavailable_models=true \
--batching_parameters_file=/config/batching_parameters.txt \
--enable_batching=true \
--model_base_path=${MODEL_BASE_PATH} \
--model_config_file=/config/all_models.config \
--model_config_file_poll_wait_seconds=60 \
--monitoring_config_file=/config/monitoring_config.txt \
--port=8500 \
--rest_api_port=8501 \
--rest_api_timeout_in_ms=30000 \
"$@"' > /usr/bin/tf_serving_entrypoint.sh \
&& chmod +x /usr/bin/tf_serving_entrypoint.sh
ENTRYPOINT ["/usr/bin/tf_serving_entrypoint.sh"]
all_models.config:
model_config_list {
}
Logs when starting the container:
$ docker-compose up tensorflow-servings
WARNING: Some services (natural-language-server, tensorflow-servings) use the 'deploy' key, which will be ignored. Compose does not support 'deploy' configuration - use `docker stack deploy` to deploy to a swarm.
Creating linguistic-intelligence_tensorflow-servings_1 ... done
Attaching to linguistic-intelligence_tensorflow-servings_1
tensorflow-servings_1 | 2020-09-07 09:27:05.197858: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:27:05.204117: I tensorflow_serving/model_servers/server.cc:367] Running gRPC ModelServer at 0.0.0.0:8500 ...
tensorflow-servings_1 | 2020-09-07 09:27:05.207537: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | [warn] getaddrinfo: address family for nodename not supported
tensorflow-servings_1 | [evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
tensorflow-servings_1 | 2020-09-07 09:27:05.215164: I tensorflow_serving/model_servers/server.cc:387] Exporting HTTP/REST API at:localhost:8501 ...
tensorflow-servings_1 | 2020-09-07 09:28:05.205505: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:29:05.206871: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:30:05.206192: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:31:05.206607: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:32:05.207374: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:33:05.207560: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:34:05.207728: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:35:05.206796: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:36:05.213456: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:37:05.207654: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:38:05.208601: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:39:05.208775: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:40:05.209190: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:41:05.209411: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:42:05.210525: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:43:05.210102: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1 | 2020-09-07 09:44:05.210034: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
batching_parameters.txt:
max_batch_size { value: 64 }
batch_timeout_micros { value: 0 }
num_batch_threads { value: 4 }
pad_variable_length_inputs: true
Thanks for your attention.
It seems there is a process that keeps trying to add/update models, which is causing the ~20% CPU usage.
Try setting --file_system_poll_wait_seconds=0
The default is set to 1 second - I think that's why there is an "Adding/updating models" message every second - tf serving is checking /models for changes
If what I suggested is the cause of your problem, I wouldn't describe it as a bug or a workaround - tf-serving is behaving as documented: the default is for it to poll the filesystem where you have indicated your models are located, every second, to see whether a new version of any of the models has been added. If you don't want it to poll the filesystem every second, you have to change the default to a setting that works for your use case.
It also wouldn't affect how much a machine costs; you would still be charged for the machine whether the CPU was being used or not. One caveat: if your models are not on local storage but on S3 or Google Cloud Storage, the repeated polling of the cloud storage will incur costs.
@dchall88: Yeah, I see that you're right about usage not affecting pricing. I've deleted the comment since it contained misinformation.
To reply to your suggestion,
Try setting --file_system_poll_wait_seconds=0
The default is set to 1 second - I think that's why there is an "Adding/updating models" message every second - tf serving is checking /models for changes
I believe the "Adding/updating models" message is being reported every minute rather than every second. I think it may be the config polling interval (--model_config_file_poll_wait_seconds=60) that is triggering the output you mentioned.
Regarding your suggestion to set --file_system_poll_wait_seconds=0: I implemented the change, but I still see non-trivial activity on the container: around 14.5% CPU, even after 30 minutes.
Following a hunch, I disabled batching, and now the container is consuming 0.05% CPU: a much more reasonable dormant state. So something related to batching appears to be causing the unexpected CPU usage.
I've now added my batching_parameters.txt file to the main issue description.
Eric is right. The unexpectedly high CPU utilization is caused by the batching configuration. The four batch threads keep busy-waiting for requests and thus consume resources; CPU usage is proportional to num_batch_threads.
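As a rough standalone illustration (my own sketch, not the actual TF Serving code), each configured batch thread effectively sits in a loop like the one below, waking on a short timed wait even when no requests ever arrive:

// Hypothetical stand-in for the idle behaviour of the batch threads:
// N workers each wake up on a ~1 ms timed wait with nothing to do.
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

int main() {
  const int num_batch_threads = 4;  // same value as in batching_parameters.txt
  std::atomic<bool> stop{false};
  std::mutex mu;
  std::condition_variable cv;

  std::vector<std::thread> workers;
  for (int i = 0; i < num_batch_threads; ++i) {
    workers.emplace_back([&] {
      std::unique_lock<std::mutex> lock(mu);
      while (!stop) {
        // No request ever arrives, so this times out roughly 1000 times per
        // second per thread: a futex syscall and a scheduler round trip of
        // pure overhead, multiplied by the number of batch threads.
        cv.wait_for(lock, std::chrono::milliseconds(1));
      }
    });
  }

  std::this_thread::sleep_for(std::chrono::seconds(30));  // watch `top` here
  stop = true;
  cv.notify_all();
  for (auto& t : workers) t.join();
}

Watching top while this runs should show a small but steady CPU cost that scales roughly linearly with the thread count.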
In most cases, tensorflow_model_server will run with models specified and loaded. Eric, is running tensorflow_model_server without models a meaningful use case for you? Also, is the high CPU utilization a big concern? Essentially, the issue could be mitigated if models are loaded and batched queries are sent.
Hi @guanxinq. The only reason I'm not using models here is to demonstrate that the issue is easily reproducible. I observed high CPU usage with models, which led me to try and create a minimalist reproducible example. It just so happened to be that I didn't need a model to demonstrate the behavior.
It seems you are suggesting that having models would reduce CPU usage? This is contrary to my observation.
Sorry for the confusion. Adding models would not help reduce the utilization. I meant sending batched traffic to models and therefore the 4 threads you configured could be made use of.
Hi @guanxinq. My understanding of the issue is that each thread used for managing the batching of requests will use CPU cycles. If the number of threads is increased, so too is the CPU usage. If I implemented the batch scheduling myself, I'm pretty sure I could implement it so that there is negligible CPU usage until requests are received. This is an apparent inefficiency. Whether that inefficiency is important enough to fix is not for me to decide.
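To sketch what I mean (this is only an illustration of the idea, not a patch for TF Serving's SharedBatchScheduler; the IdleFriendlyScheduler and Request names are made up), a batch thread could block on the condition variable with no deadline and be woken only when a request is enqueued, so an idle server would cost essentially nothing:

#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

struct Request {};  // hypothetical placeholder for a batching task

class IdleFriendlyScheduler {
 public:
  void Enqueue(Request r) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push_back(std::move(r));
    }
    cv_.notify_one();  // wake exactly one sleeping batch thread
  }

  void ThreadLogic() {  // name borrowed loosely from TF Serving; the body is my own
    std::unique_lock<std::mutex> lock(mu_);
    for (;;) {
      // No timeout: the thread sleeps in the kernel until Enqueue()
      // signals it, instead of polling every millisecond.
      cv_.wait(lock, [&] { return !queue_.empty(); });
      Request r = std::move(queue_.front());
      queue_.pop_front();
      lock.unlock();
      Process(r);
      lock.lock();
    }
  }

 private:
  void Process(Request&) { /* run the batched inference here */ }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<Request> queue_;
};

int main() {
  IdleFriendlyScheduler scheduler;
  std::thread worker(&IdleFriendlyScheduler::ThreadLogic, &scheduler);
  // With an empty queue the worker is fully asleep and CPU usage stays near 0%.
  scheduler.Enqueue(Request{});
  worker.join();  // never returns in this sketch; kill the process to stop
}

A real scheduler would still need a timed wait while a partial batch is open, to honor batch_timeout_micros, but it only needs that while there is actually work pending, so a dormant server would stay asleep.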
I don't understand your suggestion to send requests. Of course, sending requests is my normal use case. Sending requests will raise CPU usage and make this problem more difficult to detect. This doesn't change the inefficiency in any way. At least, not according to my understanding.
If TensorFlow was my product, I would fix the issue because this inefficiency will likely affect every machine that uses batching. Resolving the issue would, similarly, improve the performance of every machine that uses batching. I'd imagine that would make a lot of customers happy.
Apologies if I didn't understand your suggestion correctly.
Hi @ericmclachlan, sorry for the inconvenience. We have an internal task tracking this resource inefficiency and will keep you posted when it gets fixed. Meanwhile, feel free to contribute to the repo directly if you have any ideas about the batch scheduler.
Thank you, @guanxinq. I appreciate your time and attention.
Wishing you the best of luck.
I tried to find the cause in order to fix this; here is what I found.
perf shows that the serving binary spends its CPU time calling a function in libstdc++:
- 96.42% 0.00% tensorflow_mode libstdc++.so.6.0.28 [.] 0x00007efc03096d84
- 0x7efc03096d84
- 93.85% 0x562e4362cbf8
- 84.45% 0x562e42038fb1
- 84.15% 0x562e48342359
- 84.15% 0x562e48342d33
- 57.28% 0x562e48342873
- 57.28% 0x562e48343fa9
- 47.28% 0x7efbfaec289d
- 46.57% system_call_fastpath
- 45.26% sys_futex
- 43.58% do_futex
- 40.88% futex_wait
- 34.10% futex_wait_queue_me
- 28.93% schedule
- 28.30% __schedule
- 11.27% apic_timer_interrupt
+ smp_apic_timer_interrupt
+ 8.94% finish_task_switch
+ 5.44% deactivate_task
0.55% _raw_qspin_lock
0.63% deactivate_task
+ 4.41% hrtimer_start_range_ns
+ 2.18% futex_wait_setup
+ 1.73% apic_timer_interrupt
1.05% hash_futex
0.72% hrtimer_init
1.94% _raw_qspin_lock
+ 0.76% apic_timer_interrupt
0.97% futex_wait
1.31% copy_user_enhanced_fast_string
1.52% 0x7efbfaec28af
1.43% 0x7efbfaec28b6
1.39% 0x7efbfafde4cb
1.05% 0x562e41cad6a0
+ 0.84% 0x562e48342ffd
0.80% 0x562e48343000
0.59% 0x7efbfaec28a6
+ 11.50% 0x562e48342766
+ 4.76% 0x562e483427e8
+ 2.72% 0x562e4834289c
and gdb shows the function names:
[Switching to thread 3 (Thread 0x7efbfa59d700 (LWP 29))]
#0 0x00007efbfaec289d in syscall () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007efbfaec289d in syscall () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000562e48342ff3 in nsync::nsync_mu_semaphore_p_with_deadline(nsync::nsync_semaphore_s_*, timespec) ()
#2 0x0000562e48343fa9 in nsync::nsync_sem_wait_with_cancel_(nsync::waiter*, timespec, nsync::nsync_note_s_*) ()
#3 0x0000562e48342873 in nsync::nsync_cv_wait_with_deadline_generic(nsync::nsync_cv_s_*, void*, void (*)(void*), void (*)(void*), timespec, nsync::nsync_note_s_*) ()
#4 0x0000562e48342d33 in nsync::nsync_cv_wait_with_deadline(nsync::nsync_cv_s_*, nsync::nsync_mu_s_*, timespec, nsync::nsync_note_s_*) ()
#5 0x0000562e48342359 in tensorflow::internal::wait_until_system_clock(tensorflow::internal::CVData*, tensorflow::internal::MuData*, std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) ()
#6 0x0000562e42038fb1 in tensorflow::serving::SharedBatchScheduler<tensorflow::serving::BatchingSessionTask>::ThreadLogic() ()
#7 0x0000562e4362cbf8 in tensorflow::serving::PeriodicFunction::RunLoop(long long) ()
#8 0x00007efc03096d84 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x00007efbfafd0609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007efbfaec9293 in clone () from /lib/x86_64-linux-gnu/libc.so.6
The call stack seems to be:
- tensorflow/core/kernels/batching_util/shared_batch_scheduler.h: SharedBatchScheduler<TaskType>::ThreadLogic calls WaitForMilliseconds(&l, &schedulable_batch_cv_, kTimeoutMillis); where kTimeoutMillis is 1 ms
- tensorflow/core/platform/mutex.h: WaitForMilliseconds (inline)
- tensorflow/core/platform/default/mutex.h: probably wait_for (not sure)
- tensorflow/core/platform/default/mutex.cc: wait_until_system_clock
I cannot figure out why this causes such high CPU utilization; I hope someone can help.
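One way to sanity-check the cost of that 1 ms wait in isolation is a throwaway test like this (my own program, not TF Serving code; only the kTimeoutMillis name is taken from shared_batch_scheduler.h):

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>

int main() {
  constexpr int kTimeoutMillis = 1;  // mirrors the constant used in ThreadLogic
  std::mutex mu;
  std::condition_variable cv;
  long wakeups = 0;

  const auto start = std::chrono::steady_clock::now();
  std::unique_lock<std::mutex> lock(mu);
  while (std::chrono::steady_clock::now() - start < std::chrono::seconds(5)) {
    // Nothing ever signals `cv`, so every wait ends in a timeout: a futex
    // syscall plus a scheduler round trip, just like the perf trace above.
    cv.wait_for(lock, std::chrono::milliseconds(kTimeoutMillis));
    ++wakeups;
  }
  std::cout << "wakeups in 5s: " << wakeups << "\n";  // typically a few thousand
}

With num_batch_threads set to 4, that is on the order of a few thousand wakeups per second while the server is completely idle, which seems consistent with the constant background CPU that docker stats reports.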
Any update on this issue? Can someone solve it?
Hi, @ericmclachlan
Apologies for the delay. We have uploaded a YouTube video about TensorFlow Serving performance optimization here, and we have also updated our official documentation on TensorFlow Serving configuration here; I hope it helps you resolve your issue.
Could you please try it and let us know whether it resolves your issue? If the issue still persists, please let us know; to expedite the troubleshooting process, please provide a code snippet that reproduces the issue.
Thank you!
Hi, @ericmclachlan
Closing this issue due to a lack of recent activity over the past couple of weeks. Please feel free to reopen the issue or post a comment if you need any further assistance or updates. Thank you!