[🚀 Feature]: Send delete to geckodriver on timeout
Background
As you may know, there is a long standing issue with geckodriver that Firefox does not exit when geckodriver is terminated. The issue does not surface when a normal session close/quit is performed. In this case, a DELETE is sent to geckodriver and Firefox is shut down. However, if a session times out, OsProcess#destroy kills geckodriver only with an OS signal. And due to the aforementioned issue, Firefox itself is not killed.
Ideally, the geckodriver bug would be fixed. However, since it is a 5 year old bug with resolution being postponed from release to release I don't think this will be resolved in geckodriver itself any time soon.
Proposal
The geckodriver codebase is not anywhere near my skillset, but the Selenium codebase is. If appreciated, I would like to contribute a fix such that org.openqa.selenium.remote.service.DriverService#stop is changed such that in the GeckoDriverService a "hook" is called to shut down geckodriver and Firefox by sending a DELETE to the session.
Note that the `` already contains related code:
URL killUrl = new URL(url.toString() + "/shutdown");
new UrlChecker().waitUntilUnavailable(3, SECONDS, killUrl);
However, geckodriver does not have such a /shutdown endpoint. An approach could be to factor this code out into a separate method and to override it in GeckoDriverService by sending an HTTP DELETE request to the /{session-id} endpoint.
DELETE on stop as default behaviour?
Note that I don't really know why this isn't default behaviour. org.openqa.selenium.grid.node.ProtocolConvertingSession already always forwards DELETE requests. But maybe there are other implementations that I'm not aware of though.
@frensjan, thank you for creating this issue. We will troubleshoot it as soon as we can.
Info for maintainers
Triage this issue by using labels.
If information is missing, add a helpful comment and then I-issue-template label.
If the issue is a question, add the I-question label.
If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.
If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C),
add the applicable G-* label, and it will provide the correct link and auto-close the
issue.
After troubleshooting the issue, please add the R-awaiting answer label.
Thank you!
@frensjan thanks for raising this. Sounds like a good idea given the current status. We'd be happy to receive a PR for this.
This issue is looking for contributors.
Please comment below or reach out to us through our IRC/Slack/Matrix channels if you are interested.
I'm not sure I agree with this. There are valid use cases for wanting to keep the browser open at the end of a session, even when the process has been exited. Chrome toggles this behavior with the detach parameter.
What is the scenario that we want addressed? I don't understand:
if a session times out
Either the driver can accept the DELETE or it can't. If it can, then the user should be responsible for it. If it can't then Selenium code changes won't be able to address it.
There are valid use cases for wanting to keep the browser open at the end of a session, even when the process has been exited.
The feature would target Selenium Nodes. Could you elaborate on such use cases? The behaviour we see is that once SE_NODE_SESSION_TIMEOUT kicks in (e.g. because a WebDriver is killed and the session is idle), org.openqa.selenium.remote.service.DriverService#stop is called and the geckodriver is killed by org.openqa.selenium.os.OsProcess#destroy. However, due to the issue with geckodriver this causes firefox-bin to be orphaned and 'relocated' as child of PID 1.
To illustrate, this is the situation after starting a session:
seluser@8c131814cbe5:/$ ps axef
PID TTY STAT TIME COMMAND
...
1 ? Ss 0:00 bash /opt/bin/entry_point.sh PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOSTNAME=8c131814cbe5 SE_EVENT_BUS_PUBLISH_PORT=4442 S
8 ? S 0:01 /usr/bin/python3 /usr/bin/supervisord --configuration /etc/supervisord.conf SE_START_NO_VNC=true SE_NO_VNC_PORT=7900 SCREEN_DEPTH=24 HOSTNAME=8c13181
...
25 ? S 0:00 \_ bash -c /opt/bin/start-selenium-node.sh && kill -s SIGINT `cat /var/run/supervisor/supervisord.pid` SE_START_NO_VNC=true SE_NO_VNC_PORT=7900 SCRE
35 ? S 0:00 \_ /bin/bash /opt/bin/start-selenium-node.sh SE_START_NO_VNC=true SUPERVISOR_GROUP_NAME=selenium-node SE_NO_VNC_PORT=7900 SCREEN_DEPTH=24 SUPERV
79 ? Sl 0:47 \_ java -jar /opt/selenium/selenium-server.jar node --bind-host false --config /opt/selenium/config.toml SE_START_NO_VNC=true SUPERVISOR_GRO
17398 ? Sl 0:00 \_ /opt/geckodriver-0.31.0 --port=37594 --websocket-port=49713 --allow-origins=http://localhost:49713 PATH=/usr/local/sbin:/usr/local/bi
17405 ? Sl 0:05 \_ /usr/bin/firefox --marionette --remote-debugging-port 49713 --remote-allow-hosts localhost --remote-allow-origins http://localhos
17451 ? Sl 0:00 \_ /opt/firefox-latest/firefox-bin -contentproc -parentBuildID 20220623063721 -prefsLen 23467 -prefMapSize 216108 -appDir /opt/f
17501 ? Sl 0:00 \_ /opt/firefox-latest/firefox-bin -contentproc -childID 1 -isForBrowser -prefsLen 26431 -prefMapSize 216108 -jsInitLen 277276 -
17524 ? Sl 0:00 \_ /opt/firefox-latest/firefox-bin -contentproc -childID 2 -isForBrowser -prefsLen 26552 -prefMapSize 216108 -jsInitLen 277276 -
17596 ? Sl 0:00 \_ /opt/firefox-latest/firefox-bin -contentproc -childID 3 -isForBrowser -prefsLen 34801 -prefMapSize 216108 -jsInitLen 277276 -
...
But once the session times out, the process tree is:
seluser@8c131814cbe5:/$ ps axef
PID TTY STAT TIME COMMAND
...
1 ? Ss 0:00 bash /opt/bin/entry_point.sh PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOSTNAME=8c131814cbe5 SE_EVENT_BUS_PUBLISH_PORT=4442 S
8 ? S 0:01 /usr/bin/python3 /usr/bin/supervisord --configuration /etc/supervisord.conf SE_START_NO_VNC=true SE_NO_VNC_PORT=7900 SCREEN_DEPTH=24 HOSTNAME=8c13181
...
25 ? S 0:00 \_ bash -c /opt/bin/start-selenium-node.sh && kill -s SIGINT `cat /var/run/supervisor/supervisord.pid` SE_START_NO_VNC=true SE_NO_VNC_PORT=7900 SCRE
35 ? S 0:00 \_ /bin/bash /opt/bin/start-selenium-node.sh SE_START_NO_VNC=true SUPERVISOR_GROUP_NAME=selenium-node SE_NO_VNC_PORT=7900 SCREEN_DEPTH=24 SUPERV
79 ? Sl 0:49 \_ java -jar /opt/selenium/selenium-server.jar node --bind-host false --config /opt/selenium/config.toml SE_START_NO_VNC=true SUPERVISOR_GRO
...
17405 ? Sl 0:05 /usr/bin/firefox --marionette --remote-debugging-port 49713 --remote-allow-hosts localhost --remote-allow-origins http://localhost:49713/ -no-remote
17451 ? Sl 0:00 \_ /opt/firefox-latest/firefox-bin -contentproc -parentBuildID 20220623063721 -prefsLen 23467 -prefMapSize 216108 -appDir /opt/firefox-latest/browse
17501 ? Sl 0:00 \_ /opt/firefox-latest/firefox-bin -contentproc -childID 1 -isForBrowser -prefsLen 26431 -prefMapSize 216108 -jsInitLen 277276 -parentBuildID 202206
17524 ? Sl 0:00 \_ /opt/firefox-latest/firefox-bin -contentproc -childID 2 -isForBrowser -prefsLen 26552 -prefMapSize 216108 -jsInitLen 277276 -parentBuildID 202206
17596 ? Sl 0:00 \_ /opt/firefox-latest/firefox-bin -contentproc -childID 3 -isForBrowser -prefsLen 34801 -prefMapSize 216108 -jsInitLen 277276 -parentBuildID 202206
Note that geckodriver-0.31.0 with pid 17398 has stopped and firefox-bin with pid 17405 is now without parent.
I'm testing this with selenium/node-firefox:4.3.0.
The situation can be reproduced with a docker-compose.yml like:
version: "3.8"
services:
selenium-hub:
image: selenium/hub:4.3.0
container_name: "selenium-hub"
ports:
- "4444:4444"
firefox:
image: selenium/node-firefox:4.3.0
container_name: "node-firefox"
volumes:
- /dev/shm:/dev/shm
depends_on:
- selenium-hub
environment:
SE_EVENT_BUS_HOST: selenium-hub
SE_EVENT_BUS_PUBLISH_PORT: 4442
SE_EVENT_BUS_SUBSCRIBE_PORT: 4443
SE_NODE_SESSION_TIMEOUT: 10
And starting a session (not closing it), e.g. using Java:
new RemoteWebDriver(new URL("http://localhost:4444"), new FirefoxOptions());
Note that the workaround we plan is to use:
pkill -P 1 firefox-bin ; exit 0
as a health check.
@diemol is there a way the Grid can send the DELETE command to the drivers on nodes as part of closing the session?
Putting the fix in the Service code would prevent an otherwise valid feature when running locally. That said... because geckodriver doesn't have a toggle, I think the behavior of the other languages is different from Java right now.
I guess I read things too quickly and I did not write my thoughts. This use case is only valid for the Grid, there is no timeout for the local execution.
A proper fix is to send a DELETE when the session reaches a time out. This probably needs to be implemented in the LocalNode.
Great @diemol, much appreciated!
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.