client_python

reuse address on server

Open marcpawl opened this issue 8 years ago • 14 comments

start_http_server should reuse the port if it is already in use, most likely because an earlier instance went down and we are restarting.

  File "lib64/python3.4/site-packages/prometheus_client/exposition.py", line 103, in run
    httpd = HTTPServer((addr, port), MetricsHandler)
  File "/usr/lib64/python3.4/socketserver.py", line 430, in __init__
    self.server_bind()
  File "/usr/lib64/python3.4/http/server.py", line 136, in server_bind
    socketserver.TCPServer.server_bind(self)
  File "/usr/lib64/python3.4/socketserver.py", line 444, in server_bind
    self.socket.bind(self.server_address)
OSError: [Errno 98] Address already in use

marcpawl avatar Apr 06 '17 19:04 marcpawl

Would you like to send a PR for this?

brian-brazil avatar Apr 06 '17 23:04 brian-brazil

Run a simple server, stop it with Ctrl-C, and quickly launch it again. Repeat over and over. On Linux the address remains unavailable for a while after the socket is closed, to prevent stale messages from being sent to the wrong server. In this case we want to serve new requests immediately.

See https://stackoverflow.com/questions/10705680/how-can-i-restart-a-basehttpserver-instance/10706603#10706603
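The error in the traceback can be reproduced in isolation. This is a minimal sketch, assuming the simplest trigger (a second socket binding to an address that another socket is still listening on) rather than the restart timing described above; the function name `second_bind_errno` is made up for illustration.

```python
import errno
import socket


def second_bind_errno(host="127.0.0.1"):
    """Bind two sockets to the same address; return the errno of the second bind."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))
        except OSError as exc:
            return exc.errno
        return None  # the second bind unexpectedly succeeded
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    # Compare against the symbolic constant; the numeric value differs by platform.
    print(second_bind_errno() == errno.EADDRINUSE)
```

Comparing against `errno.EADDRINUSE` avoids hard-coding the number 98, which is the Linux value seen in the traceback.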

Fix is below. I added returning the HTTP server so it can be closed cleanly by the main loop. Something like:

    if __name__ == '__main__':
        port = sys.argv[1]
        # Start up the server to expose the metrics.
        server = start_http_server(int(port))
        try:
            process_commands()
        finally:
            server.shutdown()
            server.server_close()

exposition.py

    --- exposition.py	original
    +++ exposition.py	(working copy)
    @@ -96,17 +96,27 @@
             return

    +class HttpServerReuseSocket(HTTPServer):
    +    """ HTTP server that will allow for the socket to be reused
    +        so the server can start up again, even if there was a previous
    +        instance that ended abruptly.
    +    """
    +    def server_bind(self):
    +        # The option must be set before bind() to have any effect on it.
    +        self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    +        HTTPServer.server_bind(self)
    +

     def start_http_server(port, addr=''):
         """Starts a HTTP server for prometheus metrics as a daemon thread."""
    +    httpd = HttpServerReuseSocket((addr, port), MetricsHandler)
         class PrometheusMetricsServer(threading.Thread):
             def run(self):
    -            httpd = HTTPServer((addr, port), MetricsHandler)
                 httpd.serve_forever()
         t = PrometheusMetricsServer()
         t.daemon = True
         t.start()
    +    return httpd

marcpawl avatar Apr 07 '17 17:04 marcpawl
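The idea of returning the server so the caller can close it can be tried outside client_python. This is a rough sketch using only the standard library: `PlaceholderHandler`, `start_server`, `stop_server` and `restart_on_same_port` are hypothetical names, and the handler stands in for MetricsHandler. It is not the library's API.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PlaceholderHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for MetricsHandler."""

    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok\n")

    def log_message(self, format, *args):
        return  # keep the example quiet


def start_server(port, addr="127.0.0.1"):
    """Start the server in a daemon thread and return it to the caller."""
    httpd = HTTPServer((addr, port), PlaceholderHandler)
    thread = threading.Thread(target=httpd.serve_forever, daemon=True)
    thread.start()
    return httpd


def stop_server(httpd):
    httpd.shutdown()      # stop the serve_forever() loop
    httpd.server_close()  # close the listening socket


def restart_on_same_port():
    """Start, stop, then start again on the port the first server used."""
    first = start_server(0)  # port 0: let the OS pick a free port
    port = first.server_address[1]
    stop_server(first)
    second = start_server(port)
    try:
        return second.server_address[1] == port
    finally:
        stop_server(second)
```

Note that `shutdown()` only stops the serving loop; it is `server_close()` that releases the listening socket.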

Are you still open for a pull request for this issue?

I'm encountering this problem running a client under supervisord, when it tries to restart a process or tries to start a failed process repeatedly.

I would be happy to apply and test @marcpawl's changes.

-tim

itldo avatar Jul 30 '18 21:07 itldo

Sure.

brian-brazil avatar Jul 30 '18 21:07 brian-brazil

It turns out that my particular problem is caused by an orphaned subprocess which has inherited the HTTPServer's listen file descriptor. Calling subprocess.Popen with close_fds=True avoids my problem.

https://www.python.org/dev/peps/pep-0446/#id11 indicates that Python 3.2 changed the default close_fds value from False to True. Without close_fds=True, any child process that lingers, even briefly (e.g. 5 to 10 sec), can interfere with a restart.
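For anyone checking their own setup, a small sketch along these lines shows both pieces. It assumes Python 3.4 or later, where `socket.get_inheritable()` exists; the helper names are made up for illustration.

```python
import socket
import subprocess
import sys


def listening_socket():
    """Create a listening socket on a port chosen by the OS."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    sock.listen(1)
    return sock


def run_child_without_inherited_fds():
    """Run a short-lived child; close_fds=True keeps the parent's fds out of it."""
    proc = subprocess.Popen([sys.executable, "-c", "pass"], close_fds=True)
    return proc.wait()


if __name__ == "__main__":
    sock = listening_socket()
    # Reports whether a child created by exec would inherit this descriptor.
    print("inheritable:", sock.get_inheritable())
    print("child exit code:", run_child_without_inherited_fds())
    sock.close()
```

Passing `close_fds=True` explicitly is redundant on Python 3.2 and later, but it documents the intent and protects code that also runs on older interpreters.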

I'm attempting to reproduce the problem as reported originally in a process without forks.

-tim

itldo avatar Jul 31 '18 00:07 itldo

I have not been able to reproduce the problem beyond having an orphan child process keeping the HTTPServer's listen fd open.

I confirmed that Python's HTTPServer already sets the SO_REUSEADDR option on its listen socket, and has done so since at least 2.6.0 (and back to year 2000). This is mentioned as a comment in https://stackoverflow.com/questions/10705680/how-can-i-restart-a-basehttpserver-instance/10706603#10706603. I checked the source and printed the socket option's value to confirm.
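A check of this kind might look like the sketch below. It is not the exact code used for the confirmation above; `reuseaddr_is_set` is a hypothetical helper, and the bare BaseHTTPRequestHandler is only there so the server can be constructed.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer


def reuseaddr_is_set():
    """Return True if a freshly created HTTPServer has SO_REUSEADDR enabled."""
    httpd = HTTPServer(("127.0.0.1", 0), BaseHTTPRequestHandler)
    try:
        value = httpd.socket.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
        return value != 0  # the non-zero value is platform dependent
    finally:
        httpd.server_close()


if __name__ == "__main__":
    print("allow_reuse_address:", HTTPServer.allow_reuse_address)
    print("SO_REUSEADDR set:", reuseaddr_is_set())
```

The class attribute `allow_reuse_address` is what makes the server set the option before binding.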

Based on your desire to avoid exposing the HTTP server implementation, I am not inclined to return the HTTPServer from start_http_server.

-tim

itldo avatar Jul 31 '18 16:07 itldo

Instead of setting SO_REUSEADDR (which is already set), I think start_http_server should set FD_CLOEXEC, to avoid exec'd subprocesses inheriting the fd and reducing the chance of an "Address already in use" problem. I suspect that, like me, the original poster's problem was related to an inherited fd and not a TIME_WAIT.

PEP 446 - Make newly created file descriptors non-inheritable is already in Python 3.4, so this change would make client_python behavior on all versions of Python consistent.

The following code would be applied, subject to the availability of the fcntl module.

    flags = fcntl.fcntl(httpd.socket, fcntl.F_GETFD)
    if (flags != -1):
        fcntl.fcntl(httpd.socket, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

It applies to exec's only and not simply forks, so it does not address child processes created by the multiprocessing module.

But it covers os.system calls and subprocess calls without specifying the close_fds parameter, and reduces the risk of introducing client_python into an application.
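The "subject to the availability of the fcntl module" part could be handled roughly as follows. This is a sketch, not the change as it would land in client_python; `set_cloexec` and `has_cloexec` are hypothetical helper names.

```python
import socket

try:
    import fcntl
except ImportError:  # fcntl is not available on every platform (e.g. Windows)
    fcntl = None


def set_cloexec(sock):
    """Set FD_CLOEXEC on sock; return False if fcntl is unavailable."""
    if fcntl is None:
        return False
    flags = fcntl.fcntl(sock.fileno(), fcntl.F_GETFD)
    fcntl.fcntl(sock.fileno(), fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
    return True


def has_cloexec(sock):
    """Report whether FD_CLOEXEC is currently set (requires fcntl)."""
    return bool(fcntl.fcntl(sock.fileno(), fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if set_cloexec(sock):
        print("FD_CLOEXEC set:", has_cloexec(sock))
    sock.close()
```

In start_http_server the same call would be made on `httpd.socket` right after the server is constructed.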

As an aside, another approach is to ensure that all child processes die with their parent, rather than becoming orphaned. Running a program under supervisord with the stopasgroup option can help with that.

-tim

itldo avatar Aug 01 '18 17:08 itldo

Hi all, Is there any update on shutting down the httpd server?

Thank you, Yarden

ayashjorden avatar Jan 16 '19 08:01 ayashjorden

Any update on this issue?

jenstroeger avatar May 25 '20 23:05 jenstroeger

Any workarounds?

ghost avatar May 27 '20 09:05 ghost

I've been living with my supervisord workaround described in my last post. I had made my proposed change in my own build, which also appeared to work; however, I never took the time to submit it, as my workaround was meeting my needs. I need to consider doing that. -tim

itldo avatar May 27 '20 15:05 itldo

I'm encountering this as well, and in my case, it's a process being started in a docker container - when the container is restarted, it inherits the same FD, and fails to start even though it's the only process using that port. It will continue to do this until the container is deleted or the host is rebooted. This fix is sorely needed.

Routhinator avatar Oct 14 '22 23:10 Routhinator

For now, I'm using https://stackoverflow.com/a/61591302/743188 to work around this issue.

tommyjcarpenter avatar Nov 16 '22 14:11 tommyjcarpenter

A simple fix for this issue might be to have httpd as a global that can be accessed - we are doing something similar. That way the caller can call httpd.shutdown() in a signal handler. Something like:

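import logging
import signal
import threading
from http.server import HTTPServer
from typing import Any, Optional

logger = logging.getLogger(__name__)

# Module-level handle so the signal handler can reach the running server.
httpd: Optional[HTTPServer] = None

def handler(signum: int, _frame: Any) -> None:
    """gracefully shutdown servers on signal"""
    signame = signal.Signals(signum).name
    logger.warning(f"Signal handler called with signal {signame} ({signum})")
    if httpd:
        httpd.shutdown()
        logger.info("Shutdown metrics server")

def start_http_server():
    global httpd
    # SomeHandler is a placeholder for a BaseHTTPRequestHandler subclass.
    httpd = HTTPServer(("", 8080), SomeHandler)
    server = threading.Thread(name="metrics", target=httpd.serve_forever)
    server.daemon = True
    server.start()
    logger.debug("Servers started")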

client program:

# Set the signal handler
def __main__():
    ...
    signal.signal(signal.SIGINT, handler)  # ctrl-c
    signal.signal(signal.SIGTERM, handler)  # what kubernetes sends for graceful shutdown

tommyjcarpenter avatar Nov 16 '22 14:11 tommyjcarpenter