Teos-client register timeout
Hi there. I'm running into an issue when I try to register an appointment from the teos-client.
It seems that the API receives the register request, but then the request times out.
Here are the logs from the server:
13/10/2020 18:26:20 [API] Received register request (from_addr=127.0.0.1) [2020-10-13 14:26:50 -0400] [31893] [CRITICAL] WORKER TIMEOUT (pid:31902) [2020-10-13 14:26:51 -0400] [31902] [INFO] Worker exiting (pid: 31902) [2020-10-13 14:26:51 -0400] [31930] [INFO] Booting worker with pid: 31930 13/10/2020 18:26:51 [API] Initialized. Serving at localhost:9814
Umm, never seen that one before. Looks like gunicorn is timing out indeed.
Can you add debug log level to https://github.com/talaia-labs/python-teos/blob/master/teos/teosd.py#L269-L277 so hopefully we can have a more detailed trace? Like so:
self.api_proc = subprocess.Popen(
[
"gunicorn",
f"--bind={api_endpoint}",
f"teos.api:serve(internal_api_endpoint='{self.internal_api_endpoint}', "
f"endpoint='{api_endpoint}', logging_port='{logging_port}', "
f"min_to_self_delay='{self.config.get('MIN_TO_SELF_DELAY')}')",
"--log-level debug",
]
)
Then run the tower, try to register and add the logs from the point you start:
13/10/2020 20:16:53 [Daemon] Tower identity found. Loading keys
[...]
sure thing, unfortunately doesn't look like it adds any more info:
14/10/2020 05:33:31 [Daemon] Tower identity found. Loading keys 14/10/2020 05:33:31 [Daemon] tower_id = 0230b267961623a4d82264c5aee3fa3fc77fe093cdcaa9b411aed94dc92b2fb8ed 14/10/2020 05:33:31 [Daemon] Starting TEOS 14/10/2020 05:33:31 [Daemon] Fresh bootstrap 14/10/2020 05:33:31 [Daemon] Internal API initialized. Serving at localhost:50051 [2020-10-14 01:33:31 -0400] [11998] [INFO] Starting gunicorn 20.0.4 [2020-10-14 01:33:31 -0400] [11998] [INFO] Listening at: http://127.0.0.1:9814 (11998) [2020-10-14 01:33:31 -0400] [11998] [INFO] Using worker: sync [2020-10-14 01:33:31 -0400] [12001] [INFO] Booting worker with pid: 12001 14/10/2020 05:33:31 [API] Initialized. Serving at localhost:9814 14/10/2020 05:33:31 [RPC] Initialized. Serving at localhost:8814 14/10/2020 05:34:26 [API] Received register request (from_addr=127.0.0.1) [2020-10-14 01:34:56 -0400] [11998] [CRITICAL] WORKER TIMEOUT (pid:12001) [2020-10-14 01:34:57 -0400] [12001] [INFO] Worker exiting (pid: 12001) [2020-10-14 01:34:57 -0400] [12059] [INFO] Booting worker with pid: 12059 14/10/2020 05:34:57 [API] Initialized. Serving at localhost:9814
I've been reading a bit about why this can happen with gunicorn and it seems to be tricky. Normally is either regarding low resources or the requests taking to long to compute, but I'd rule out the latter given the requests is rather trivial.
I'll dig a bit deeper to see if I can reproduce. What are you running it on?
Thanks! I'm running on Ubuntu 18.04
I just sprinkled some print statements throughout and it looks like it reaches line 159 of api.py but never gets inside register of the Internal API? So maybe grpc is taking a long time for some reason and that's leading to the timeout?
Fixed it. Setting the environment variable GRPC_DNS_RESOLVER=native did the trick.
Found in here: https://github.com/grpc/grpc/issues/24018
Interesting. I'll subscribe there to see how it ends up being resolved. Will leave this open until a solution is found so we can use GRPC_DNS_RESOLVER=native for now if someone faces this issue.