faas Support request - tuning for long timeouts

My actions before raising this issue

[ ] Followed the troubleshooting guide
[x] Read/searched the docs
[x] Searched past issues

Currently the watchdog's shutdownTimeout is set from the write_timeout environment variable. This is unfortunate: if you set a high write_timeout, the watchdog waits for twice that time when shutting down, even when no fprocesses are running.

Expected Behaviour

The watchdog shuts down as quickly as possible when no fprocesses are running.

Possible Solution

Since http.Server.Shutdown gracefully shuts down the server without interrupting any active connections, there may be no need for <-time.Tick(shutdownTimeout) in the listenUntilShutdown function (tested this locally with kind and didn't run into any problems).

However, a fully backward-compatible change would be to add a shutdown_timeout environment variable that sets shutdownTimeout when specified and falls back to write_timeout when not.

Context

We want to be able to configure a high timeout for the fprocess while still having the watchdog shut down as quickly as possible when no fprocesses are running, so that resources are used more efficiently.

Jul 30 '20 13:07 greenbech

Thanks for your interest.

Now the shutdownTimeout variable in watchdog is the same as the write_timeout environment variable. This is unfortunate since if you set a high write_timeout value the watchdog will wait for twice that time when shutting down. This will happen even if there are no fprocesses running.

This is the safest way we can ensure there are no in-flight requests and mirrors the of-watchdog codebase.

Out of interest, what specific writeTimeout are you setting?

Aug 07 '20 08:08 alexellis

/set title: Support request - tuning for long timeouts

Aug 07 '20 08:08 alexellis

This request comes from us at Cognite btw @alexellis 😊 We target timeouts of 30 min+, and what we have observed is that all running processes seem to finish before the watchdog reaches the sleep lines.

Aug 07 '20 12:08 andeplane

Just following up on this, is there any more information that is needed to resolve this issue, @alexellis?

One question I have is about what you said here:

This is the safest way we can ensure there are no in-flight requests and mirrors the of-watchdog codebase.

Why is this safer than not having the ticks there, when the HTTP server already waits until active connections are closed, as described in the docs linked above? Are you thinking about requests that are still travelling from the gateway/queue-worker to the watchdog?

Oct 12 '20 11:10 matzhaugen

I've not heard from @andeplane for a while, but I'd be happy to discuss putting time aside specifically to help Cognite with this challenge.

Oct 12 '20 12:10 alexellis

We had solved this before this issue was written, but I asked our intern to write it up here for discussion. We are happy to provide a PR :)

Oct 13 '20 07:10 andeplane

We fixed this in OpenFaaS Standard in 2021.

https://www.openfaas.com/blog/long-running-jobs/

Closing as stale.

Feb 01 '24 11:02 alexellis

/lock: resolved 3 years ago.

Feb 01 '24 11:02 alexellis