python-teos

Teos-client register timeout

Open orbitalturtle opened this issue 5 years ago • 6 comments

Hi there. I'm running into an issue when I try to register an appointment from the teos-client.

It seems that the API receives the register request, but then the request times out.

Here are the logs from the server:

13/10/2020 18:26:20 [API] Received register request (from_addr=127.0.0.1)
[2020-10-13 14:26:50 -0400] [31893] [CRITICAL] WORKER TIMEOUT (pid:31902)
[2020-10-13 14:26:51 -0400] [31902] [INFO] Worker exiting (pid: 31902)
[2020-10-13 14:26:51 -0400] [31930] [INFO] Booting worker with pid: 31930
13/10/2020 18:26:51 [API] Initialized. Serving at localhost:9814

orbitalturtle avatar Oct 13 '20 19:10 orbitalturtle

Umm, never seen that one before. Looks like gunicorn is timing out indeed.

Can you add debug log level to https://github.com/talaia-labs/python-teos/blob/master/teos/teosd.py#L269-L277 so hopefully we can have a more detailed trace? Like so:

self.api_proc = subprocess.Popen(
    [
        "gunicorn",
        f"--bind={api_endpoint}",
        f"teos.api:serve(internal_api_endpoint='{self.internal_api_endpoint}', "
        f"endpoint='{api_endpoint}', logging_port='{logging_port}', "
        f"min_to_self_delay='{self.config.get('MIN_TO_SELF_DELAY')}')",
        "--log-level=debug",  # note: must be a single "--flag=value" argv element
    ]
)

Then run the tower, try to register and add the logs from the point you start:

13/10/2020 20:16:53 [Daemon] Tower identity found. Loading keys
[...]

sr-gi avatar Oct 13 '20 20:10 sr-gi

Sure thing. Unfortunately, it doesn't look like it adds any more info:

14/10/2020 05:33:31 [Daemon] Tower identity found. Loading keys
14/10/2020 05:33:31 [Daemon] tower_id = 0230b267961623a4d82264c5aee3fa3fc77fe093cdcaa9b411aed94dc92b2fb8ed
14/10/2020 05:33:31 [Daemon] Starting TEOS
14/10/2020 05:33:31 [Daemon] Fresh bootstrap
14/10/2020 05:33:31 [Daemon] Internal API initialized. Serving at localhost:50051
[2020-10-14 01:33:31 -0400] [11998] [INFO] Starting gunicorn 20.0.4
[2020-10-14 01:33:31 -0400] [11998] [INFO] Listening at: http://127.0.0.1:9814 (11998)
[2020-10-14 01:33:31 -0400] [11998] [INFO] Using worker: sync
[2020-10-14 01:33:31 -0400] [12001] [INFO] Booting worker with pid: 12001
14/10/2020 05:33:31 [API] Initialized. Serving at localhost:9814
14/10/2020 05:33:31 [RPC] Initialized. Serving at localhost:8814
14/10/2020 05:34:26 [API] Received register request (from_addr=127.0.0.1)
[2020-10-14 01:34:56 -0400] [11998] [CRITICAL] WORKER TIMEOUT (pid:12001)
[2020-10-14 01:34:57 -0400] [12001] [INFO] Worker exiting (pid: 12001)
[2020-10-14 01:34:57 -0400] [12059] [INFO] Booting worker with pid: 12059
14/10/2020 05:34:57 [API] Initialized. Serving at localhost:9814

orbitalturtle avatar Oct 14 '20 05:10 orbitalturtle

I've been reading a bit about why this can happen with gunicorn and it seems to be tricky. Normally it's either low resources or requests taking too long to compute, but I'd rule out the latter given the request is rather trivial.

I'll dig a bit deeper to see if I can reproduce. What are you running it on?
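As a side note, the ~30-second gap between "Received register request" and "WORKER TIMEOUT" in the logs above matches gunicorn's default worker timeout of 30 seconds. As a stopgap while debugging, the timeout can be raised with gunicorn's `--timeout` flag. A minimal sketch (the helper name is made up, and the real app spec passed by teosd is omitted):

```python
def gunicorn_argv(api_endpoint, worker_timeout=120):
    """Hypothetical helper: build the gunicorn command line with an
    explicit worker timeout (gunicorn's default is 30 seconds)."""
    return [
        "gunicorn",
        f"--bind={api_endpoint}",
        f"--timeout={worker_timeout}",  # only kill workers after this many seconds
        "--log-level=debug",
        # app spec omitted; teosd passes teos.api:serve(...) here
    ]
```

Raising the timeout only masks the stall, but it can at least tell you whether the request eventually completes or hangs forever.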

sr-gi avatar Oct 14 '20 12:10 sr-gi

Thanks! I'm running on Ubuntu 18.04

I just sprinkled some print statements throughout and it looks like it reaches line 159 of api.py but never gets inside register of the Internal API? So maybe grpc is taking a long time for some reason and that's leading to the timeout?
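One way to confirm that suspicion is to put a hard deadline on the blocking call, so a stall surfaces right away instead of after gunicorn's 30-second worker kill. (Real grpcio stubs accept a `timeout=` keyword for the same effect, raising DEADLINE_EXCEEDED.) A stdlib-only sketch, where `stalled_rpc` stands in for the internal-API register call:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def call_with_deadline(fn, seconds):
    """Run a blocking call with a hard deadline; a stalled call raises
    TimeoutError quickly instead of hanging the worker."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn).result(timeout=seconds)
    finally:
        pool.shutdown(wait=False)

def stalled_rpc():
    # Stand-in for a register() call stuck in DNS resolution
    time.sleep(1)
    return "registered"

try:
    call_with_deadline(stalled_rpc, 0.2)
    print("call completed in time")
except TimeoutError:
    print("call exceeded deadline -> the gRPC leg is the bottleneck")
```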

orbitalturtle avatar Oct 14 '20 17:10 orbitalturtle

Fixed it. Setting the environment variable GRPC_DNS_RESOLVER=native did the trick.

Found in here: https://github.com/grpc/grpc/issues/24018
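For anyone else hitting this: the variable has to be set in the environment of the process that loads grpc, i.e. before teosd starts (for example, exported in the shell first). A minimal sketch:

```python
import os

# GRPC_DNS_RESOLVER=native tells the gRPC core to use the OS
# getaddrinfo resolver instead of the bundled c-ares resolver,
# which is what appears to stall here on localhost lookups.
os.environ["GRPC_DNS_RESOLVER"] = "native"

# teosd (or any process importing grpc) must then be launched from
# this environment for the setting to take effect.
```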

orbitalturtle avatar Oct 14 '20 17:10 orbitalturtle

Interesting. I'll subscribe there to see how it ends up being resolved. I'll leave this open until a proper solution is found, so anyone who faces this issue can use GRPC_DNS_RESOLVER=native for now.

sr-gi avatar Oct 14 '20 17:10 sr-gi