twemproxy

hot-reload

Open idning opened this issue 11 years ago • 9 comments

How it works:

  1. When we receive a USR1 signal, we fork and exec the new binary with the new config.

  2. The new process inherits all listening sockets from the old process, including:

    • the stats socket
    • the listening socket for each pool
  3. Once the new process is running, the old process closes all of its listening sockets and waits 3 seconds so that clients already connected to it can finish. After those 3 seconds, it closes all remaining client sockets and shuts down.

    (This means keep-alive connections will be closed; we cannot wait forever.)
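The socket handover in steps 1 and 2 can be sketched in C as below. This is a minimal illustration under stated assumptions, not the patch itself: the fd numbers are passed to the new process in a hypothetical `NC_INHERITED_FDS` environment variable, and `fd_keep_on_exec`, `fds_encode`, `fds_decode`, `inherited_fds` and `reexec_with_fds` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical variable name telling the new process which fds it inherited. */
#define INHERIT_ENV "NC_INHERITED_FDS"

/* Clear FD_CLOEXEC so the descriptor stays open across execv(). */
int fd_keep_on_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}

/* Serialize fds as "3;4;5". Returns 0, or -1 if buf is too small. */
int fds_encode(const int *fds, size_t n, char *buf, size_t len)
{
    size_t off = 0;

    if (len == 0) {
        return -1;
    }
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + off, len - off, i == 0 ? "%d" : ";%d", fds[i]);
        if (w < 0 || (size_t)w >= len - off) {
            return -1;
        }
        off += (size_t)w;
    }
    return 0;
}

/* Parse "3;4;5" into fds (at most max entries); stops at the first bad token. */
size_t fds_decode(const char *s, int *fds, size_t max)
{
    size_t n = 0;

    while (s != NULL && *s != '\0' && n < max) {
        char *end;
        long v = strtol(s, &end, 10);
        if (end == s || v < 0 || v > INT_MAX) {
            break;
        }
        fds[n++] = (int)v;
        s = (*end == ';') ? end + 1 : end;
    }
    return n;
}

/* New process: recover the listening fds handed over by the old one. */
size_t inherited_fds(int *fds, size_t max)
{
    return fds_decode(getenv(INHERIT_ENV), fds, max);
}

/* Old process: fork, then exec the (possibly new) binary in the child
 * with the listening fds kept open and advertised via INHERIT_ENV. */
pid_t reexec_with_fds(const char *binary, char *const argv[],
                      const int *fds, size_t n)
{
    char list[256];

    if (fds_encode(fds, n, list, sizeof(list)) != 0) {
        return -1;
    }
    pid_t pid = fork();
    if (pid != 0) {
        return pid;             /* parent: child's pid, or -1 if fork failed */
    }
    for (size_t i = 0; i < n; i++) {
        if (fd_keep_on_exec(fds[i]) != 0) {
            _exit(127);
        }
    }
    if (setenv(INHERIT_ENV, list, 1) != 0) {
        _exit(127);
    }
    execv(binary, argv);
    _exit(127);                 /* only reached if execv() failed */
}
```

In the new process, a startup path would call `inherited_fds()` and adopt those descriptors instead of calling `bind()`/`listen()`. FD_CLOEXEC is cleared only in the child, so the parent's own copies keep their flags. Once the child is up, the old process can close its copies (step 3): a listening socket stays open in the kernel as long as any process still holds a descriptor for it, so the new process can keep accepting connections.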

TODO:

  • make it work for unix domain sockets
  • needs testing on nc_kqueue

Test cases:

https://github.com/idning/test-twemproxy/blob/master/test_system/test_reload.py

idning, Sep 04 '14 12:09

This is great. Thanks a ton! We should probably release a new build before merging this change.

Ship it!

manjuraj, Oct 09 '14 16:10

I'm really excited for this feature. When can we expect this to be merged?

Thanks a lot!

bitthegeek, Oct 17 '14 08:10

@bitthegeek we need more feedback and usage report about this patch before it can be merged :)

idning, Oct 18 '14 08:10

Ok then. I'll test this on some servers and report if something's up. Thanks!

bitthegeek, Oct 20 '14 02:10

@bitthegeek did you have a chance to test this patch?

manjuraj, Dec 22 '14 23:12

:+1: The feature is great, but it may not be Docker-friendly, since the original process exits anyway (if nutcracker is the container's main process, the container stops when that process exits). What about using inotify (as mcrouter does)?
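The inotify suggestion can be sketched like this (Linux-only). `config_watch` and `config_changed` are hypothetical helpers, not code from mcrouter or twemproxy; the assumption is that the event loop polls the returned non-blocking fd alongside its sockets and triggers a reload when `config_changed` returns 1.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Start watching the config file; returns a non-blocking inotify fd or -1.
 * The watch descriptor is stored in *wd. */
int config_watch(const char *path, int *wd)
{
    int fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (fd < 0) {
        perror("inotify_init1");
        return -1;
    }
    *wd = inotify_add_watch(fd, path, IN_MODIFY | IN_CLOSE_WRITE);
    if (*wd < 0) {
        perror("inotify_add_watch");
        close(fd);
        return -1;
    }
    return fd;
}

/* Drain all queued events. Returns 1 if the watched file was modified,
 * 0 if nothing relevant happened, -1 on a read error. */
int config_changed(int fd, int wd)
{
    /* Buffer aligned for struct inotify_event, as in the inotify(7) example. */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int changed = 0;

    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));
        if (len < 0) {
            return (errno == EAGAIN) ? changed : -1;  /* EAGAIN: queue empty */
        }
        if (len == 0) {
            return changed;
        }
        for (char *p = buf; p < buf + len;) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            if (ev->wd == wd && (ev->mask & (IN_MODIFY | IN_CLOSE_WRITE))) {
                changed = 1;
            }
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
}
```

Two caveats. Tools that save by writing a new file and renaming it over the old one replace the inode, which ends this watch; watching the containing directory is more robust. And detecting the change only decides when to reload: keeping the container's main process alive would still require reloading in-process rather than replacing the process.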

mckelvin, Jan 20 '15 10:01

One issue: if nutcracker runs as a daemon, the pid file is not updated after a hot reload (kill -10).
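One possible fix, sketched under the assumption that the new process should take ownership of the pid file once it has inherited the listening sockets: it rewrites the file with its own pid. `pidfile_write_atomic` is a hypothetical helper, not a twemproxy function; it writes a temporary file and `rename()`s it over the target so a reader never sees a half-written pid.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Replace the pid file atomically: write "<path>.tmp", then rename() it
 * over <path>, so a reader never sees a partially written file. */
int pidfile_write_atomic(const char *path, pid_t pid)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp)) {
        return -1;
    }

    FILE *f = fopen(tmp, "w");
    if (f == NULL) {
        return -1;
    }
    if (fprintf(f, "%d\n", (int)pid) < 0) {
        fclose(f);
        unlink(tmp);
        return -1;
    }
    if (fclose(f) != 0 || rename(tmp, path) != 0) {
        unlink(tmp);            /* don't leave the temp file behind */
        return -1;
    }
    return 0;
}
```

The new process would call `pidfile_write_atomic(pid_path, getpid())` after startup. If the old process deletes the pid file when it exits, it would also have to skip that step during a hot reload, or it would remove the file the new process just wrote.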


Another issue: I got some stack traces. (This happened when sending requests to the proxy during a hot reload.)

nutcracker(nc_stacktrace_fd+0x17)[0x417bf7]
nutcracker(signal_handler+0xec)[0x41577c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0)[0x7fcb021ac0a0]
nutcracker(conn_to_ctx+0x11)[0x40c221]
nutcracker(core_core+0x22)[0x40b8f2]
nutcracker(event_wait+0x14f)[0x42160f]
nutcracker(core_loop+0x19)[0x40bdc9]
nutcracker(main+0x5ee)[0x40b36e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7fcb01badead]
nutcracker[0x40b72d]

...

[2015-01-21 16:38:39.991] [7] nc_util.c:317 assert 'ep > 0' failed @ (nc_epoll.c, 227)
[2015-01-21 16:38:39.991] [7] nc_util.c:295 [0] nutcracker(event_del_conn+0x79) [0x428589]
[2015-01-21 16:38:39.991] [7] nc_util.c:295 [1] nutcracker(client_close+0x263) [0x40db63]
[2015-01-21 16:38:39.991] [7] nc_util.c:295 [2] nutcracker(server_pool_deinit+0x6d) [0x40fbfd]
[2015-01-21 16:38:39.991] [7] nc_util.c:295 [3] nutcracker(core_stop+0x52) [0x40c052]
[2015-01-21 16:38:39.991] [7] nc_util.c:295 [4] nutcracker(main+0x5fa) [0x40b35a]
[2015-01-21 16:38:39.991] [7] nc_util.c:295 [5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f7b18fcbead]
[2015-01-21 16:38:39.991] [7] nc_util.c:295 [6] nutcracker() [0x40b70d]

...

[2015-01-21 16:39:59.564] [100] nc_util.c:317 assert 'ep > 0' failed @ (nc_epoll.c, 227)
[2015-01-21 16:39:59.565] [100] nc_util.c:295 [0] nutcracker(event_del_conn+0x79) [0x428589]
[2015-01-21 16:39:59.565] [100] nc_util.c:295 [1] nutcracker(client_close+0x263) [0x40db63]
[2015-01-21 16:39:59.565] [100] nc_util.c:295 [2] nutcracker(server_pool_deinit+0x6d) [0x40fbfd]
[2015-01-21 16:39:59.565] [100] nc_util.c:295 [3] nutcracker(core_stop+0x52) [0x40c052]
[2015-01-21 16:39:59.565] [100] nc_util.c:295 [4] nutcracker(main+0x5fa) [0x40b35a]
[2015-01-21 16:39:59.565] [100] nc_util.c:295 [5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f77f5a6cead]
[2015-01-21 16:39:59.565] [100] nc_util.c:295 [6] nutcracker() [0x40b70d]
[2015-01-21 16:40:04.640] [93] nc_server.c:890 disconnect 1 clients on pool 0 'test_shire_bak_mc_pool'
[2015-01-21 16:40:04.640] [93] nc_request.c:96 req 6 done on c 8 req_time 0.239 msec type REQ_MC_GET narg 2 req_len 9 rsp_len 0 key0 'baz' peer '172.17.42.1:43896' done 1 error 0
[2015-01-21 16:40:04.640] [93] nc_util.c:317 assert 'ep > 0' failed @ (nc_epoll.c, 227)
[2015-01-21 16:40:04.641] [93] nc_util.c:295 [0] nutcracker(event_del_conn+0x79) [0x428589]
[2015-01-21 16:40:04.641] [93] nc_util.c:295 [1] nutcracker(client_close+0x263) [0x40db63]
[2015-01-21 16:40:04.641] [93] nc_util.c:295 [2] nutcracker(server_pool_deinit+0x6d) [0x40fbfd]
[2015-01-21 16:40:04.641] [93] nc_util.c:295 [3] nutcracker(core_stop+0x52) [0x40c052]
[2015-01-21 16:40:04.641] [93] nc_util.c:295 [4] nutcracker(main+0x5fa) [0x40b35a]
[2015-01-21 16:40:04.641] [93] nc_util.c:295 [5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f77f5a6cead]
[2015-01-21 16:40:04.641] [93] nc_util.c:295 [6] nutcracker() [0x40b70d]

Full debug log: https://gist.github.com/mckelvin/761893fdfda0adcd8be5
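The traces show the assertion 'ep > 0' failing in event_del_conn, reached from client_close via server_pool_deinit during core_stop. One reading, inferred from the traces only and not confirmed against the patch, is that client connections kept open for the grace period were still registered when the event base was torn down. The sketch below illustrates the ordering that avoids that state, using real epoll calls but hypothetical `evbase`/`client` structs and helper names that only stand in for twemproxy's own.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical stand-ins for an event base and a client connection. */
struct evbase { int ep; };
struct client { int fd; int registered; };

/* Register a client for read events. Returns 0 on success, -1 on error. */
int evbase_add(struct evbase *evb, struct client *c)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = c };

    if (evb->ep <= 0 || epoll_ctl(evb->ep, EPOLL_CTL_ADD, c->fd, &ev) < 0) {
        return -1;
    }
    c->registered = 1;
    return 0;
}

/* Deregister and close a client. Returns -1 if the event base was already
 * gone, i.e. the state the failing 'ep > 0' assertion guards against. */
int client_close_sketch(struct evbase *evb, struct client *c)
{
    int rc = 0;

    if (c->registered) {
        if (evb->ep <= 0 || epoll_ctl(evb->ep, EPOLL_CTL_DEL, c->fd, NULL) < 0) {
            rc = -1;
        }
        c->registered = 0;
    }
    close(c->fd);
    c->fd = -1;
    return rc;
}

/* Shutdown order: close every client while the event base is still alive,
 * then destroy it. Returns how many clients were closed cleanly. */
size_t shutdown_in_order(struct evbase *evb, struct client *clients, size_t n)
{
    size_t ok = 0;

    for (size_t i = 0; i < n; i++) {
        if (clients[i].fd >= 0 && client_close_sketch(evb, &clients[i]) == 0) {
            ok++;
        }
    }
    if (evb->ep > 0) {
        close(evb->ep);
        evb->ep = -1;
    }
    return ok;
}
```

Reversing the order (destroying the event base first, then closing clients) makes `client_close_sketch` hit the same condition the assertion reports.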

mckelvin, Jan 20 '15 11:01

Folks, I have created an alternative patch that does not fork the process. It supports unix sockets and does not drop connections. I hope it will be valuable too. I will create a pull request by the end of this week.
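The patch itself is not shown in this thread. As a generic sketch of how an in-process reload can be triggered without forking (an assumption about the approach, not vlm's code), the SIGUSR1 handler only sets a `sig_atomic_t` flag, and the event loop checks it on each iteration and does the real work outside signal context, where allocating memory and opening files is safe. `install_reload_handler` and `consume_reload_request` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the signal handler, read and cleared by the event loop. */
static volatile sig_atomic_t reload_requested = 0;

/* Only async-signal-safe work here: record the request and return. */
static void on_sigusr1(int signo)
{
    (void)signo;
    reload_requested = 1;
}

int install_reload_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted syscalls where possible */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called once per event-loop iteration; returns 1 if a reload is pending.
 * Several signals arriving before the check coalesce into one reload,
 * which is harmless because the reload reads the config as it is now. */
int consume_reload_request(void)
{
    if (!reload_requested) {
        return 0;
    }
    reload_requested = 0;
    return 1;
}
```

The loop would then do something like `if (consume_reload_request()) reload_config();`, where `reload_config` is a hypothetical function that re-parses the config and updates the pools while the existing listening sockets and client connections stay open.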

vlm, Feb 06 '15 08:02

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant, Jul 18 '19 15:07