Issues when using SSH connection method against IPv6-enabled agents
Version report
Jenkins and plugins versions report:
Jenkins: 2.263.3 OS: Linux - 4.15.0-1113-azure
Plugins:
ace-editor:1.1
ansicolor:0.7.5
ant:1.11
antisamy-markup-formatter:2.1
apache-httpcomponents-client-4-api:4.5.13-1.0
authentication-tokens:1.4
azure-ad:1.2.1
azure-commons:1.0.5
basic-branch-build-strategies:1.3.2
bitbucket-pullrequest-builder:1.5.0
block-queued-job:0.2.0
blueocean-autofavorite:1.2.4
blueocean-bitbucket-pipeline:1.24.4
blueocean-commons:1.24.4
blueocean-config:1.24.4
blueocean-core-js:1.24.4
blueocean-dashboard:1.24.4
blueocean-display-url:2.4.1
blueocean-events:1.24.4
blueocean-git-pipeline:1.24.4
blueocean-github-pipeline:1.24.4
blueocean-i18n:1.24.4
blueocean-jira:1.24.4
blueocean-jwt:1.24.4
blueocean-personalization:1.24.4
blueocean-pipeline-api-impl:1.24.4
blueocean-pipeline-editor:1.24.4
blueocean-pipeline-scm-api:1.24.4
blueocean-rest-impl:1.24.4
blueocean-rest:1.24.4
blueocean-web:1.24.4
blueocean:1.24.4
bootstrap4-api:4.6.0-1
bouncycastle-api:2.20
branch-api:2.6.2
build-timeout:1.20
caffeine-api:2.9.1-23.v51c4e2c879c8
cctray-xml:1.0
checks-api:1.4.1
cloud-stats:0.26
cloudbees-bitbucket-branch-source:2.9.7
cloudbees-disk-usage-simple:0.10
cloudbees-folder:6.15
command-launcher:1.5
config-file-provider:3.7.0
configuration-as-code:1.51
credentials-binding:1.24
credentials:2.3.14
display-url-api:2.3.4
docker-build-publish:1.3.2
docker-commons:1.17
docker-java-api:3.1.5.2
docker-plugin:1.2.2
docker-workflow:1.25
durable-task:1.35
echarts-api:4.9.0-3
email-ext:2.81
embeddable-build-status:2.0.3
extended-read-permission:3.2
external-monitor-job:1.7
favorite:2.3.2
font-awesome-api:5.15.2-1
git-client:3.6.0
git-server:1.9
git:4.5.2
github-api:1.122
github-branch-source:2.9.5
github-pullrequest:0.2.8
github:1.32.0
google-oauth-plugin:1.0.3
gradle:1.36
greenballs:1.15.1
handlebars:1.1.1
handy-uri-templates-2-api:2.1.8-1.0
hashicorp-vault-plugin:3.7.0
htmlpublisher:1.25
icon-shim:2.0.3
jackson2-api:2.12.1
javadoc:1.6
jclouds-jenkins:2.20
jdk-tool:1.4
jenkins-design-language:1.24.4
jira:3.1.3
jjwt-api:0.11.2-8.82737cbfa6f5
jquery-detached:1.2.1
jquery3-api:3.5.1-2
jquery:1.12.4-1
jsch:0.1.55.2
junit:1.48
kubernetes-cli:1.10.0
kubernetes-client-api:4.13.2-1
kubernetes-credentials:0.8.0
kubernetes:1.29.0
lockable-resources:2.10
mailer:1.32.1
mapdb-api:1.0.9.0
mask-passwords:2.13
matrix-auth:2.6.6
matrix-project:1.18
mercurial:2.12
metrics:4.0.2.7
momentjs:1.1.1
notification:1.14
oauth-credentials:0.4
okhttp-api:3.14.9
ownership:0.13.0
pam-auth:1.6
parameterized-scheduler:0.9.2
pipeline-build-step:2.13
pipeline-github-lib:1.0
pipeline-graph-analysis:1.10
pipeline-input-step:2.12
pipeline-milestone-step:1.3.2
pipeline-model-api:1.8.3
pipeline-model-definition:1.8.3
pipeline-model-extensions:1.8.3
pipeline-rest-api:2.19
pipeline-stage-step:2.5
pipeline-stage-tags-metadata:1.8.3
pipeline-stage-view:2.19
pipeline-utility-steps:2.6.1
plain-credentials:1.7
plugin-util-api:1.6.1
popper-api:1.16.1-1
prometheus:2.0.8
pubsub-light:1.13
resource-disposer:0.14
role-strategy:3.1
scm-api:2.6.4
script-security:1.76
slack:2.45
snakeyaml-api:1.27.0
sse-gateway:1.24
ssh-credentials:1.18.1
ssh-slaves:1.31.5
structs:1.21
subversion:2.14.0
timestamper:1.11.8
token-macro:2.13
trilead-api:1.0.13
variant:1.4
webhook-step:1.4
windows-slaves:1.7
workflow-aggregator:2.6
workflow-api:2.41
workflow-basic-steps:2.23
workflow-cps-global-lib:2.17
workflow-cps:2.87
workflow-durable-task-step:2.37
workflow-job:2.40
workflow-multibranch:2.22
workflow-scm-step:2.12
workflow-step-api:2.23
workflow-support:3.7
ws-cleanup:0.38
Docker version on agents: 20.10.7
`docker version` output
Client: Docker Engine - Community
Version: 20.10.7
API version: 1.41
Go version: go1.13.15
Git commit: f0df350
Built: Wed Jun 2 11:56:38 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.7
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: b0f5bc3
Built: Wed Jun 2 11:54:50 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.4
GitCommit: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
runc:
Version: 1.0.0-rc93
GitCommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
docker-init:
Version: 0.19.0
GitCommit: de40ad0
OS: Ubuntu 20.04 LTS on Jenkins master and every agent.
Reproduction steps
- Install Jenkins with the docker plugin on a virtual machine (VM)
- Configure a docker cloud...
  - to use Docker on another IPv6-enabled VM via TCP (you don't need to use IPv6 here, just keep it enabled system-wide)
  - to spin up an ssh-agent-based image
  - to use the SSH connection method
- Load the agent with pipelines for a few days
Results
Expected result:
Jenkins can spin a new agent and connect to it using SSH at any time.
Actual result:
Jenkins can spin up a new agent but is unable to connect to it over SSH, for the reason explained below.
From docker ps output:
0f1a5876f016 [REDACTED]/jenkins-ci-dinfra:stable "setup-sshd /usr/sbi…" 3 minutes ago Up 3 minutes 0.0.0.0:49243->22/tcp, :::49242->22/tcp musing_boyd
0a5530eb2201 [REDACTED]/jenkins-ci-dinfra:stable "setup-sshd /usr/sbi…" 5 hours ago Up 5 hours 0.0.0.0:49205->22/tcp, :::49204->22/tcp vigorous_blackwell
You can see that the IPv4 port differs from the IPv6 port (49243 vs 49242). Somehow Jenkins uses the IPv6 port when trying to SSH into the agent.
The docker inspect output and Jenkins logs below come from a different occurrence than the docker ps output above, but the situation is the same.
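To spot affected containers quickly, the PORTS column can be checked mechanically. The following is only a minimal sketch: `PublishedPorts`, `Mapping`, and `hostPortsDiffer` are hypothetical names made up for illustration (they are not part of Jenkins or the docker plugin), and the regular expression assumes the `host:port->containerPort/proto` layout shown in the `docker ps` output above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Jenkins or the docker plugin. */
final class PublishedPorts {

    /** One "hostIp:hostPort->containerPort/proto" entry from the PORTS column. */
    static final class Mapping {
        final String hostIp;
        final int hostPort;
        final int containerPort;

        Mapping(String hostIp, int hostPort, int containerPort) {
            this.hostIp = hostIp;
            this.hostPort = hostPort;
            this.containerPort = containerPort;
        }
    }

    // The host part is everything before the last ':' that precedes the port,
    // so ":::49242->22/tcp" yields host "::" and host port 49242.
    private static final Pattern ENTRY =
            Pattern.compile("(\\S+):(\\d+)->(\\d+)/(tcp|udp)");

    /** Parses a PORTS column such as "0.0.0.0:49243->22/tcp, :::49242->22/tcp". */
    static List<Mapping> parse(String portsColumn) {
        List<Mapping> mappings = new ArrayList<>();
        Matcher m = ENTRY.matcher(portsColumn);
        while (m.find()) {
            mappings.add(new Mapping(
                    m.group(1),
                    Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3))));
        }
        return mappings;
    }

    /** True if the given container port is published on more than one distinct host port. */
    static boolean hostPortsDiffer(List<Mapping> mappings, int containerPort) {
        Set<Integer> hostPorts = new HashSet<>();
        for (Mapping mapping : mappings) {
            if (mapping.containerPort == containerPort) {
                hostPorts.add(mapping.hostPort);
            }
        }
        return hostPorts.size() > 1;
    }
}
```

Calling `PublishedPorts.hostPortsDiffer(PublishedPorts.parse(portsColumn), 22)` on the first row above would flag the container, since port 22 is published on both 49243 and 49242.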
Logs from Jenkins master (hostnames are altered):
SSHLauncher{host='slavep3.node', port=49739, credentialsId='13457128-567e-4f7d-bd8c-1e85c619b69e', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=60, maxNumRetries=30, retryWaitTime=2, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[06/16/21 12:22:47] [SSH] Opening SSH connection to slavep3.node:49739.
Connection refused (Connection refused)
[long java trace here]
NetworkSettings.Ports from docker inspect output:
"Ports": {
"22/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "49740"
},
{
"HostIp": "::",
"HostPort": "49739"
}
]
},
As you can see, Jenkins was trying to connect to port 49739 over IPv4 (we don't have IPv6 connectivity at the moment), but docker-proxy was listening on port 49740 for IPv4.
Our pipelines spawn other Docker containers to run some tests, and these can theoretically occupy a port above 40000 for a while. So Docker may fail to listen on that port for IPv4 and choose the next one. We're not using IPv6 here, which is why the ports differ (I guess). But why Jenkins always chooses the IPv6 one is another question...
I noticed similar behavior after swapping an old Docker VM for one that supports IPv6. The issue seems to happen sporadically in my case: agents spin up fine, then after a while I see the same behavior as outlined above, except with a different error in the Jenkins log:
java.io.IOException: SSH service hadn't started after 60 seconds and 52 milliseconds.Try increasing the number of retries (currently 30) and/or the retry wait time (currently 2) to allow for containers taking longer to start.
at io.jenkins.docker.connector.DockerComputerSSHConnector.createLauncher(DockerComputerSSHConnector.java:269)
at io.jenkins.docker.connector.DockerComputerConnector.createLauncher(DockerComputerConnector.java:91)
at com.nirima.jenkins.plugins.docker.DockerTemplate.doProvisionNode(DockerTemplate.java:574)
at com.nirima.jenkins.plugins.docker.DockerTemplate.provisionNode(DockerTemplate.java:536)
at com.nirima.jenkins.plugins.docker.DockerCloud$1.run(DockerCloud.java:370)
Restarting the Docker daemon resolves the issue until the next time it occurs. I am in AWS, and the security group fronting the Docker instance currently does not allow IPv6 ingress. Next time the issue occurs, I am going to allow IPv6 traffic to see if it has an effect.
I think I'm hitting the same issue. The IPv4 port is bound to 49222 and the IPv6 port to 49221, see below.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1e3b258d65c8 <MY_IMAGE> "/usr/sbin/sshd -D -…" 14 seconds ago Up 13 seconds 0.0.0.0:49222->22/tcp, :::49221->22/tcp distracted_tesla
Meanwhile, Jenkins tries to connect to the IPv4 address using the IPv6 port, as seen in the Jenkins log:
Could not connect to <MY_IP> port 49221. Are you sure this location is contactable from Jenkins?
Our workaround was to disable IPv6 on the host machine.
I disabled IPv6 support in Docker daemon config and haven't had the issue reoccur.
I may be out on a limb here, as I've only been browsing the code on GitHub and haven't debugged it (and may not even be looking at the correct part of the code for all I know), but in [DockerComputerSSHConnector.java getBindingForPort](https://github.com/jenkinsci/docker-plugin/blob/master/src/main/java/io/jenkins/docker/connector/DockerComputerSSHConnector.java#:~:text=private%20static%20InetSocketAddress-,getBindingForPort,-(DockerAPI%20api%2C%20InspectContainerResponse) there's this:
// Find where it's mapped to
for (Ports.Binding b : sshBindings) {
String hps = b.getHostPortSpec();
port = Integer.valueOf(hps);
}
String host = getExternalIP(api, ir, networkSettings, sshBindings);
return new InetSocketAddress(host, port);
Looks like in the case of multiple bindings it will always return the port for the last binding in sshBindings without validating that it is the correct port, which may cause an issue if the correct port is earlier in the array.
Yup, that's the correct bit of code. The problem is that the plugin doesn't really know which IP/port is going to be "the one that works". It has no visibility of the network environment in which Jenkins runs and doesn't know which IPs are routable and which aren't, so it just has to blindly believe the docker daemon's output as it knows no better. FYI this is a problem common to other "cloud provider" plugins too: the plugin can't (easily) second-guess the operating system's routing table and/or whatever external routes exist to decide "we'll ignore that one as we know IPv6 won't work here" etc. IME, when the Jenkins master's network's ability to SSH to a remote agent is incomplete, it's best to use JNLP and have the remote agent call Jenkins instead.
If y'all can figure out some means by which the plugin could make a decision (and then submit a PR for it), that would be welcomed, but if you merely need a workaround, I'd suggest using JNLP or "Direct Attach" instead of SSH.
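One conceivable "means by which the plugin could make a decision" is to probe each published binding with a short TCP connect and use the first one that answers. The sketch below is only an illustration under that assumption: `ReachabilityProbe` and `firstReachable` are hypothetical names, not plugin API, and this is not a description of what any actual fix does. A successful connect only shows that something is listening on that address, not that sshd inside the container is ready, so the existing retry logic would still be needed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of the docker plugin. */
final class ReachabilityProbe {

    /**
     * Returns the first candidate that accepts a TCP connection within the
     * timeout, or an empty Optional if none of them does.
     */
    static Optional<InetSocketAddress> firstReachable(
            List<InetSocketAddress> candidates, int timeoutMillis) {
        for (InetSocketAddress candidate : candidates) {
            try (Socket socket = new Socket()) {
                socket.connect(candidate, timeoutMillis);
                return Optional.of(candidate);
            } catch (IOException e) {
                // Refused, timed out, or unroutable: fall through to the next candidate.
            }
        }
        return Optional.empty();
    }
}
```

With the bindings from the report above, the candidates would be the Docker host paired with 49740 and with 49739. From a Jenkins master without IPv6 connectivity, only the first would be expected to connect.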
This may be wrong, but from what I can tell the code above sets the port number to that of the last binding. However, getExternalIP returns the IP of the first binding if it is a swarm. If this is the case, that would explain the issue. It seems to me that the IP and port need to be kept in sync so that both come from the same binding returned by Docker.
if (api.isSwarm()) {
for (Ports.Binding b : sshBindings) {
String ipAddress = b.getHostIp();
if (ipAddress != null && !"0.0.0.0".equals(ipAddress)) {
return ipAddress;
}
}
}
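The idea of keeping the IP and port in sync can be sketched as follows: pick one binding, then take both the host and the port from it. This is a minimal illustration, not the plugin's code. `BindingSelector` and its nested `Binding` class are hypothetical stand-ins for docker-java's `Ports.Binding`, preferring IPv4 is an assumption that fits the reports in this thread (no IPv6 connectivity from Jenkins), and the host port spec is assumed to be a single port number rather than a range.

```java
import java.net.InetSocketAddress;
import java.util.List;

/** Hypothetical sketch; not the docker plugin's implementation. */
final class BindingSelector {

    /** Simplified stand-in for docker-java's Ports.Binding (illustration only). */
    static final class Binding {
        final String hostIp;
        final String hostPortSpec;

        Binding(String hostIp, String hostPortSpec) {
            this.hostIp = hostIp;
            this.hostPortSpec = hostPortSpec;
        }
    }

    // An IPv6 literal always contains a colon; an IPv4 literal never does.
    private static boolean isIpv6(String hostIp) {
        return hostIp != null && hostIp.contains(":");
    }

    // Wildcard addresses say "all interfaces", so they are not usable as a connect target.
    private static boolean isWildcard(String hostIp) {
        return hostIp == null || hostIp.isEmpty()
                || "0.0.0.0".equals(hostIp) || "::".equals(hostIp);
    }

    /**
     * Chooses one binding (the first non-IPv6 one, else the first) and derives
     * BOTH host and port from it. fallbackHost is used for wildcard bindings.
     */
    static InetSocketAddress select(List<Binding> bindings, String fallbackHost) {
        if (bindings.isEmpty()) {
            throw new IllegalArgumentException("no port bindings to choose from");
        }
        Binding chosen = bindings.get(0);
        for (Binding b : bindings) {
            if (!isIpv6(b.hostIp)) {
                chosen = b;
                break;
            }
        }
        String host = isWildcard(chosen.hostIp) ? fallbackHost : chosen.hostIp;
        // Assumes hostPortSpec holds a single port number, e.g. "49740".
        return InetSocketAddress.createUnresolved(host, Integer.parseInt(chosen.hostPortSpec));
    }
}
```

For the bindings from the docker inspect output earlier in this thread (`0.0.0.0:49740` and `:::49739`), this selection yields port 49740 regardless of the order in which the bindings are listed.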
Does anyone know when this was introduced? We have this annoying issue and want to revert to an older version if that helps.
> I disabled IPv6 support in Docker daemon config and haven't had the issue reoccur.
It is disabled by default, according to 'man dockerd':
--ipv6=true|false
Enable IPv6 support. Default is false.
Sorry forgot to mention, I had "ipv6": true in /etc/docker/daemon.json initially (AMI was custom built with IPv6 enabled) but then disabled it as a result of running into this issue.
Apparently disabling the "ipv6" option in daemon.json did not fix the issue for me after all, as it kept recurring. This option does not prevent Docker from mapping container ports to IPv6 addresses. What I ultimately did was disable IPv6 in the kernel on my Docker host:
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
I'm also seeing this issue, but maybe have a breadcrumb to help. I set up Docker on an ARMv8 Linux machine and we've had no issues. Then I attempted to add an x86_64 Docker server and nothing can connect, because the SSH daemon's IPv4 port is published on a different number than the one Jenkins looks for, which is the IPv6 port.
Adding the --ipv6=false flag to the dockerd command seemed to resolve the issue.
Please re-enable IPv6 in Docker and try the incremental build from #962 to see whether it resolves the issue for you.
If the workaround added in #962 is insufficient, then please open a new ticket with steps to reproduce the issue.