Connection refused right after container start
There is a problem with migration from Docker Desktop to Colima docker environment. When the container is just started, it's mapped port is not available immediately. I'm not quite sure about all what happens underneath (is it an issue of Lima/Colima/jna/docker-java), but let me give my assumption, the demo is provided as well.
We use testcontainers, a popular framework that bootstraps docker containers for test purposes, in our case it's Postgres. We use image postgres:11.6-alpine.
When the container starts, it allows to retrieve mapped port and right after that the connection is set up.
With Docker desktop it works just fine, but with colima the retry-with-interval workaround is required. Default logic was not expecting this, it was just working fine for years.
My assumption is that the logic that retrieves mapped port is not blocked by startup await, while in Docker desktop there is a strong guarantee about it.
I've created a demo project that shows the issue. If you start with Docker Desktop, you will see log:
...testcontainers initialization without issues...
...
jdbc:postgresql://localhost:55001/test?loggerLevel=OFF
>>>> Successfully first time
If you start the same test with Colima, you will see log:
...testcontainers initialization with many retry failures (bootstraps service Ryuk container first)
...
jdbc:postgresql://localhost:49154/test?loggerLevel=OFF
>>>> First exception: org.postgresql.util.PSQLException: Connection to localhost:49154 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
...sleep here...
>>>> Successfully second time
The mapped port is obtained in internal logic in this line:
String jdbcUrl = container.getJdbcUrl();
Worth to mention, that on Apple M1 this failure is more explicit (probably because it's faster), with Intel I see only testcontainers Ryuk failures (most probably they added this retry policy to cover the described issue), but still there is an obvious problem.
Let me know if I can provide more details or more convenient example (not sure that Java example is good enough for you).
The full log for Colima execution:
[main] INFO org.testcontainers.dockerclient.DockerClientProviderStrategy - Loaded org.testcontainers.dockerclient.UnixSocketClientProviderStrategy from ~/.testcontainers.properties, will try it first
[main] INFO org.testcontainers.dockerclient.DockerClientProviderStrategy - Found Docker environment with local Unix socket (unix:///var/run/docker.sock)
[main] INFO org.testcontainers.DockerClientFactory - Docker host IP address is localhost
[main] INFO org.testcontainers.DockerClientFactory - Connected to docker:
Server Version: 20.10.7
API Version: 1.41
Operating System: Ubuntu 21.04
Total Memory: 1966 MB
[main] INFO org.testcontainers.utility.ImageNameSubstitutor - Image name substitution will be performed by: DefaultImageNameSubstitutor (composite of 'ConfigurationFileImageNameSubstitutor' and 'PrefixingImageNameSubstitutor')
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
at java.base/java.lang.Thread.run(Thread.java:829)
[testcontainers-ryuk] WARN org.testcontainers.utility.ResourceReaper - Can not connect to Ryuk at localhost:49153
java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.testcontainers.utility.ResourceReaper.lambda$null$1(ResourceReaper.java:189)
at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
at org.testcontainers.utility.ResourceReaper.lambda$start$2(ResourceReaper.java:185)
at java.base/java.lang.Thread.run(Thread.java:829)
[main] INFO org.testcontainers.DockerClientFactory - Ryuk started - will monitor and terminate Testcontainers containers on JVM exit
[main] INFO org.testcontainers.DockerClientFactory - Checking the system...
[main] INFO org.testcontainers.DockerClientFactory - ✔︎ Docker server version should be at least 1.6.0
[main] INFO org.testcontainers.DockerClientFactory - ✔︎ Docker environment should have more than 2GB free disk space
[main] INFO 🐳 [postgres:11.6-alpine] - Creating container for image: postgres:11.6-alpine
[main] INFO 🐳 [postgres:11.6-alpine] - Starting container with ID: 879500d74a3efe810293edf00c890507d7a7818964d040adcab280bc64b41855
[main] INFO 🐳 [postgres:11.6-alpine] - Container postgres:11.6-alpine is starting: 879500d74a3efe810293edf00c890507d7a7818964d040adcab280bc64b41855
[main] INFO 🐳 [postgres:11.6-alpine] - Container postgres:11.6-alpine started in PT1.299692S
jdbc:postgresql://localhost:49154/test?loggerLevel=OFF
>>>> First exception: org.postgresql.util.PSQLException: Connection to localhost:49154 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
>>>> Successfully second time
Process finished with exit code 0
When the container is just started, it's mapped port is not available immediately.
This is expected, and due to the way port forwarding in Lima works: the guest agent running inside the VM checks every 5 seconds for open ports. It sends any changes since the previous check to the host agent (running on the host), which will in turn create a tunnel for each new port, and tear down the tunnel for each port that has been closed.
So you should expect to wait on average about 2½s before the port becomes available on the host, and potentially up to 5s (or more if the machine is busy).
To avoid the delay you would need to use an additional vmnet port, via vde_vmnet. I don't know if Colima exposes support for this, but Lima can manage the additional daemons automatically. It does require root to set up initially.
To avoid the delay you would need to use an additional vmnet port, via vde_vmnet. I don't know if Colima exposes support for this, but Lima can manage the additional daemons automatically. It does require root to set up initially.
This is something to consider for future releases
The plan is to assign reachable IP addresses to the VM so they can be accessed directly in scenarios like this. It is currently doable with lima override and network config but it is not user friendly.
On my case (MacOS), due to dropping docker desktop I realised that my /var/run/docker.sock was still targeting the old file (this would then be making other stuff to fail).
Solved it by removing the old symlink and then sudo ln -s /Users/$USER/.colima/docker.sock /var/run/docker.sock.
Kindly install the current development version with brew install --HEAD colima.
You can retrieve the IP address via colima ls, the VM should always be reachable via the IP address.
I am running into this on a Linux machine, where although the port is bound, I'm unable to connect right away.
I'm afraid I am also experiencing this issue - with a postgres image (11-alpine). It seems flaky though and I am not sure where to even address this, but the issues only started occurring after installing colima (instead of Docker Desktop).
Ryuk starts and the database seems to start, too, but then sometimes it'll go through with the tests and sometimes I receive:
2023-05-09 00:16:58.113 INFO 25373 --- [ream--357132296] z.t.d.p.p.DockerPostgresDatabaseProvider : STDERR: 2023-05-08 22:16:57.925 UTC [1] LOG: listening on IPv6 address "::", port 5432
2023-05-09 00:16:58.113 INFO 25373 --- [ream--357132296] z.t.d.p.p.DockerPostgresDatabaseProvider : STDERR: 2023-05-08 22:16:57.925 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-05-09 00:16:58.113 INFO 25373 --- [ream--357132296] z.t.d.p.p.DockerPostgresDatabaseProvider : STDERR: 2023-05-08 22:16:57.935 UTC [51] LOG: database system was shut down at 2023-05-08 22:16:57 UTC
2023-05-09 00:16:58.113 INFO 25373 --- [ream--357132296] z.t.d.p.p.DockerPostgresDatabaseProvider : STDERR: 2023-05-08 22:16:57.939 UTC [1] LOG: database system is ready to accept connections
2023-05-09 00:16:58.157 INFO 25373 --- [ main] o.hibernate.jpa.internal.util.LogHelper : HHH000204: Processing PersistenceUnitInfo [name: default]
2023-05-09 00:16:58.193 INFO 25373 --- [ main] org.hibernate.Version : HHH000412: Hibernate ORM core version 5.4.32.Final
2023-05-09 00:16:58.247 INFO 25373 --- [ main] o.hibernate.annotations.common.Version : HCANN000001: Hibernate Commons Annotations {5.1.2.Final}
2023-05-09 00:16:58.295 WARN 25373 --- [ main] o.h.e.j.e.i.JdbcEnvironmentInitiator : HHH000342: Could not obtain connection to query metadata
java.net.ConnectException: Connection refused
at org.postgresql.core.PGStream.createSocket(PGStream.java:241) [6 skipped]
at org.postgresql.core.PGStream.<init>(PGStream.java:98)
at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:109)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:235)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:223)
at org.postgresql.Driver.makeConnection(Driver.java:402)
at org.postgresql.Driver.connect(Driver.java:261)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:681)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:229)
... 27 common frames omitted
Wrapped by: org.postgresql.util.PSQLException: Connection to localhost:49160 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
I am not sure this is a colima issue and I tried it with various configurations. Sometimes deleting the image and pulling it anew helps, sometimes it does not.
Does anyone here perhaps have an idea?
@divStar you can add a retry get connection logic before using the DataSource wrapping the postgres container. That's my code:
/**
* The colima docker environment has a bit another behavior when bootstrapping container. It is possible to fail to make a connection right after
* PostgreSQLContainer.getMappedPort(). Most probably it's an issue of Lima/Colima that should be addressed, but right now the workaround suggested: retry few times with interval.
* <p>
* https://github.com/abiosoft/colima/issues/71
*/
public final class DockerDataSourceUtils {
private static final int MAX_RETRY_CONNECTION = 4;
public static void checkGetConnectionWithRetry(DataSource dataSource) throws IllegalStateException {
try (Connection connection = dataSource.getConnection()) {
// no sleep by default
return;
} catch (SQLException originalException) {
for (int i = 1; i <= MAX_RETRY_CONNECTION; i++) {
try {
Thread.sleep(500L * i);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new IllegalStateException("Interrupted", e);
}
try (Connection connection = dataSource.getConnection()) {
return;
} catch (SQLException skip) {
continue;
}
}
throw new IllegalStateException(originalException);
}
}
It is due to the way port forwarding works. There is a chance of up to 3s delay before a port is forwarded from the container to your localhost.
An option for a scenario like this would be to enable IP address for the VM and you can simply use the VM IP to access it. However, it is currently erratic on macOS ventura and would not be a recommended alternative.
Kindly install the current development version with
brew install --HEAD colima. You can retrieve the IP address viacolima ls, the VM should always be reachable via the IP address.
I am currently also experiencing the issue that colima is running fine, I can start containers using docker-compose (A Kafka cluster) - but from the outside, I am unable to reach the forwarded ports. It's just connection refused. Then I stumbled on this comment and wanted to see my colima VM address, but there is none:
❯ colima ls
PROFILE STATUS ARCH CPUS MEMORY DISK RUNTIME ADDRESS
default Running x86_64 4 4GiB 20GiB docker
What does that mean?
My real problem I try to get solved though is this:
❯ docker port b4cf984adcf0
2181/tcp -> 0.0.0.0:22181
2181/tcp -> [::]:22181
❯ nc -zv localhost 22181
nc: connectx to localhost port 22181 (tcp) failed: Connection refused
nc: connectx to localhost port 22181 (tcp) failed: Connection refused
Any help is appreciated!
We see this intermittently in about half of DDEV's automated test runs on Colima. It's a persistent problem.
why is this closed?
The problem is still here in .NET with TestContainers. Half of the starts fail.
There needs to be a proper monitoring of Docker containers and dynamic port forwarding handled by Colima itself. Rather than falling back to Lima's which can have a delay of up to 3 seconds.
Is it possible to help with this somehow? The issue happens so often with zonky and testcontainers, that I am considering giving Rancher Desktop a try even though colima is perfect in terms of control and its minimalism...
This has been prioritised.
Is it possible to help with this somehow? The issue happens so often with zonky and testcontainers, that I am considering giving Rancher Desktop a try even though colima is perfect in terms of control and its minimalism...
I think Rancher, Lima, and Colima share this basic problem but could be wrong. I think it's a Lima problem. It may also be related to the port-bind management discussed in
- https://github.com/lima-vm/lima/issues/2536
I think Rancher, Lima, and Colima share this basic problem but could be wrong. I think it's a Lima problem. It may also be related to the port-bind management discussed in
Yeah, they are all affected by this issue.