arch-minecraftserver

Server is not restarted after crash

Open bo0tzz opened this issue 4 years ago • 12 comments

If the minecraft server process crashes, a manual restart of the container is required to get it back in working order. I would expect that either the minecraft server process is restarted automatically inside the container, or that the entire container goes down (so that it can be restarted by an orchestrator).

bo0tzz avatar Oct 04 '21 09:10 bo0tzz

Hi, see https://github.com/binhex/arch-minecraftserver/pull/14 for a poc

szaimen avatar Jan 05 '23 17:01 szaimen

Another possibility could be adding something to the screen that contains the minecraft process like:

# Relaunch the server whenever the java process exits
while true
do
    java -Xmx7G -jar minecraft_server.jar nogui
    echo "Server closed unexpectedly, restarting in 10 seconds..."
    sleep 10
done

awnumar avatar Oct 07 '24 00:10 awnumar

I've put in a very basic infinite while loop, so the server process will now restart on crash. Please pull down the latest image and let me know how it goes.

binhex avatar Oct 08 '24 11:10 binhex

Thanks for adding that @binhex

I've tested that by pulling the latest image and:

  • launching a shell in the container and running kill ${pid of java process}
    • the server gracefully shut down in this case
  • running stop in the web ui
  • running kill -9

Each time, the server restarted automatically, so it seems like this patch works! Thanks.

awnumar avatar Oct 09 '24 22:10 awnumar

I have just realised that this patch unintentionally makes #12 a little worse to deal with. Whenever I want to shut down gracefully, I log in to the web UI and run stop (or run it in-game), wait for the server to finish shutting down, and then stop the container.

Now, however, because the server restarts automatically, I'm worried there is no way to shut down gracefully: the server will restart before you can stop the container 🤔

I'm not sure if there is a good solution to both problems at the same time. One approach is to run the save command before uncleanly exiting but it's not perfect.
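One possible middle ground, sketched here rather than taken from the image, is to restart only after an abnormal exit. This assumes that a clean stop makes the server exit with status 0 while a crash or kill -9 does not. The function name run_until_clean_exit and the RESTART_DELAY variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: restart the wrapped command only after an abnormal exit.
# Assumption: a clean `stop` makes the server exit with status 0, while a
# crash or `kill -9` produces a non-zero status.

# RESTART_DELAY is a hypothetical knob for this sketch, in seconds.
RESTART_DELAY="${RESTART_DELAY:-10}"

run_until_clean_exit() {
    local status
    while true; do
        # Run the command in the foreground and record how it ended.
        "$@"
        status=$?
        if [ "$status" -eq 0 ]; then
            echo "Server exited cleanly, not restarting."
            return 0
        fi
        echo "Server exited with status ${status}, restarting in ${RESTART_DELAY} seconds..."
        sleep "$RESTART_DELAY"
    done
}

# Example (not run here):
# run_until_clean_exit java -Xmx7G -jar minecraft_server.jar nogui
```

A shutdown triggered by a signal would likely still be reported as a non-zero status, so that case would need separate handling.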

awnumar avatar Oct 09 '24 22:10 awnumar

Maybe something like:

  • #14 to allow containers to be restarted if the process inside is not responding
  • remove the loop to auto-restart the java process, after an alternative to auto-relaunching crashed server is supported
  • when a container is stopped, it should send a SIGTERM to the Java process. This happens by default when a container is stopped with docker stop: Docker sends a SIGTERM to PID 1 in the container.
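If the entrypoint is a shell script, one way for a signal aimed at the script's PID to reach the server directly is to end the script with exec. A minimal sketch, assuming no restart loop is needed; launch_in_place is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: `exec` replaces the current shell with the given command, so the
# command keeps the shell's PID and receives signals sent to that PID.

launch_in_place() {
    # Any setup work would happen here, before the shell is replaced.
    exec "$@"
}

# Example (not run here), as the last line of an entrypoint script:
# launch_in_place java -Xmx7G -jar minecraft_server.jar nogui
```

Nothing after the exec call runs, so this approach cannot be combined with an in-script restart loop.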

awnumar avatar Oct 09 '24 22:10 awnumar

Quoting @awnumar: "Whenever I want to gracefully shut down, I login to the web-ui and run stop, or run that in-game, wait for it to finish gracefully shutting down and then stop the container."

I shall put in additional code to trap SIGTERM and CTRL+C (SIGINT). This should handle the case where you want to force a shutdown whilst still permitting the server to restart.

This is a different problem and has already been addressed: I am making use of dumb-init to pass signals along. I have also written a script that waits for the process to end, which I will include to try to ensure the process is not sent a SIGKILL. That should fix https://github.com/binhex/arch-minecraftserver/issues/12. It is a little tricky, as there are a lot of wheels within wheels going on here.
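The script mentioned above is not shown in this thread. As a rough sketch of the general idea only, a wait-with-deadline helper could look like the following; wait_for_exit and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: wait for a process to exit on its own, up to a deadline.

# wait_for_exit PID TIMEOUT_SECONDS
# Returns 0 once the process is gone, 1 if it is still running at the deadline.
wait_for_exit() {
    local pid="$1" timeout="$2" waited=0
    # `kill -0` sends no signal; it only checks that the PID can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example (not run here): give the server up to 60 seconds to finish saving.
# wait_for_exit "$java_pid" 60 || echo "Server still running after 60 seconds"
```

Polling with kill -0 works for processes that are not children of the calling shell, which matters when the server runs inside a separate screen session.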

binhex avatar Oct 10 '24 20:10 binhex

Thanks, sounds good. I'm happy to test the changes when you need.

awnumar avatar Oct 10 '24 21:10 awnumar

The changes are in, please pull down the latest image.

binhex avatar Oct 11 '24 08:10 binhex

@binhex I've pulled the latest image and tried to trigger a graceful shutdown, but sadly I wasn't able to. Testing on Unraid.

When I stop the container, the web UI immediately loses connection and the server stops immediately. I've checked the latest.log and screen.log files, and neither has any indication of the server gracefully stopping. Additionally, when logged into the server as a player, I get the error message "Disconnected - Connection lost" instead of "Disconnected - Server closed".

If I shell into the container and run kill on the PID of the java process, I do get the expected "Server closed" message, and the latest.log file shows the expected server shutdown log lines.

awnumar avatar Oct 14 '24 19:10 awnumar

Perhaps instead of running the java process inside a screen session, the infinite loop and the java command could run in a blocking fashion at the end of the start.sh script? Because the current signal-catching logic is inside the screen session, I think it may not be receiving any signal, or the PID 1 process is exiting immediately without waiting for the screen to exit. It might be simpler if the subprocess isn't inside a screen, though that might break the way the web UI works.

Maybe the root of the start.sh script could also catch the signals, explicitly pkill java, and wait for it.
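A minimal sketch of that idea, assuming the loop runs in the foreground of the start script rather than inside screen. The names supervise, forward_stop and RESTART_DELAY are hypothetical, and the sketch signals the child by PID instead of using pkill.

```shell
#!/usr/bin/env bash
# Sketch: restart loop that runs in the foreground of the start script and
# stops cleanly when the script receives SIGTERM or SIGINT.

child_pid=""
stopping=0

forward_stop() {
    stopping=1
    # Pass the request on to the child as SIGTERM, if one is running.
    if [ -n "$child_pid" ]; then
        kill -TERM "$child_pid" 2>/dev/null
    fi
}

supervise() {
    stopping=0
    trap forward_stop TERM INT
    while [ "$stopping" -eq 0 ]; do
        "$@" &
        child_pid=$!
        # `wait` returns early when a trapped signal arrives, so keep
        # waiting until the child has really exited.
        while kill -0 "$child_pid" 2>/dev/null; do
            wait "$child_pid"
        done
        child_pid=""
        # Pause before relaunching, unless a stop was requested meanwhile.
        [ "$stopping" -eq 0 ] && sleep "${RESTART_DELAY:-10}"
    done
    trap - TERM INT
}

# Example (not run here), as the last line of the start script:
# supervise java -Xmx7G -jar minecraft_server.jar nogui
```

The sketch leaves out the screen session, so anything that depends on it, such as the web UI console, would need another way to reach the server.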

I recently saw this article which has some interesting information: https://sirikon.me/posts/0009-pid-1-bash-script-docker-container.html

awnumar avatar Oct 20 '24 21:10 awnumar

A potentially more powerful alternative to dumb-init that supports multiple children: https://github.com/linkdd/procfusion

awnumar avatar Oct 20 '24 21:10 awnumar