[2.7.6] Logs w/ follow hangs, or finalizer error
My code is starting a Docker container with a Postgres instance inside, and waiting for the output to include the text "database system is ready to accept connections".
My code works fine ... but after I see the target text things go wonky.
If I close the LogStream, my process hangs for a while, then gets an exception:
(start-container)
09:07:27.965 INFO no-correlation-id [nREPL-worker-3] c.s.d.c.DefaultDockerClient - Creating container with ContainerConfig: ContainerConfig{hostname=null, domainname=null, username=null, memory=null, memorySwap=null, cpuShares=null, cpuset=null, attachStdin=null, attachStdout=null, attachStderr=null, portSpecs=[62890:5432], exposedPorts=null, tty=null, openStdin=null, stdinOnce=null, env=null, cmd=null, image=quay.io/aviso/combined-db-migrated, volumes=null, workingDir=null, entrypoint=null, networkDisabled=null, onBuild=null}
09:07:28.096 INFO no-correlation-id [nREPL-worker-3] c.s.d.c.DefaultDockerClient - Starting container with HostConfig: HostConfig{binds=null, containerIDFile=null, lxcConf=null, privileged=null, portBindings=null, links=null, publishAllPorts=null, dns=null, dnsSearch=null, volumesFrom=null, networkMode=null}
2015-02-18 17:07:10 UTC LOG: database system was interrupted; last known up at 2015-02-12 21:40:56 UTC
2015-02-18 17:07:10 UTC LOG: database system was not properly shut down; automatic recovery in progress
2015-02-18 17:07:10 UTC LOG: redo starts at 0/1792078
2015-02-18 17:07:10 UTC LOG: record with zero length at 0/19DD660
2015-02-18 17:07:10 UTC LOG: redo done at 0/19DD620
2015-02-18 17:07:10 UTC LOG: last completed transaction was at log time 2015-02-12 21:41:06.015063+00
2015-02-18 17:07:11 UTC LOG: autovacuum launcher started
2015-02-18 17:07:11 UTC LOG: database system is ready to accept connections
up and running!
java.net.SocketTimeoutException: Read timed out
java.lang.RuntimeException: java.net.SocketTimeoutException: Read timed out
(pst)
clojure.core/eval core.clj: 3076
...
docker-utils/eval6732 form-init5456222565433859833.clj: 1
docker-utils/start-container docker_utils.clj: 151
docker-utils/launch-container docker_utils.clj: 128
docker-utils/wait-for-startup2 docker_utils.clj: 110
com.spotify.docker.client.LogStream.close LogStream.java: 74
com.spotify.docker.client.LogReader.close LogReader.java: 88
com.google.common.io.ByteStreams.copy ByteStreams.java: 175
org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream.read ReaderInterceptorExecutor.java: 295
org.glassfish.jersey.message.internal.EntityInputStream.read EntityInputStream.java: 96
java.io.FilterInputStream.read FilterInputStream.java: 107
java.io.FilterInputStream.read FilterInputStream.java: 133
java.io.BufferedInputStream.read BufferedInputStream.java: 334
java.io.BufferedInputStream.read1 BufferedInputStream.java: 275
java.io.BufferedInputStream.fill BufferedInputStream.java: 235
org.apache.http.conn.EofSensorInputStream.read EofSensorInputStream.java: 137
org.apache.http.impl.io.ChunkedInputStream.read ChunkedInputStream.java: 169
org.apache.http.impl.io.ChunkedInputStream.nextChunk ChunkedInputStream.java: 206
org.apache.http.impl.io.ChunkedInputStream.getChunkSize ChunkedInputStream.java: 240
org.apache.http.impl.io.SessionInputBufferImpl.readLine SessionInputBufferImpl.java: 270
org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer SessionInputBufferImpl.java: 152
org.apache.http.impl.io.SessionInputBufferImpl.streamRead SessionInputBufferImpl.java: 136
java.net.SocketInputStream.read SocketInputStream.java: 121
java.net.SocketInputStream.read SocketInputStream.java: 150
java.net.SocketInputStream.socketRead0 SocketInputStream.java
java.net.SocketTimeoutException: Read timed out
java.lang.RuntimeException: java.net.SocketTimeoutException: Read timed out
(Obviously, I'm working in Clojure, but that should hopefully not be relevant).
On the other hand, if I skip the close step, things work fine ... but at some future point, I see this error in my console:
(System/gc)
=> nil
09:09:19.238 WARN no-correlation-id [Finalizer] c.s.d.c.LogStream - com.spotify.docker.client.LogStream@79220f71 not closed properly
If it helps, here's the code:
(defn- wait-for-startup2
"Waits for the container to startup. Returns nil or throws an exception."
[docker-client container-id container-name]
;; docker logs outputs to STDERR for some reason
(let [^LogStream logs (.. docker-client (logs container-id (into-array DockerClient$LogsParameter [DockerClient$LogsParameter/STDERR
DockerClient$LogsParameter/FOLLOW])))
utf-8 Charsets/UTF_8
;; A bit cumbersome to get the logs
cmd-fn (fn []
(if (.hasNext logs)
(->> logs .next .content (.decode utf-8) .toString)))
test-fn (fn [line]
(print line)
(and line
(re-find #"database system is ready to accept connections" line)))
wait-result (wait-until cmd-fn test-fn 5000)]
(println "up and running!")
; (.close logs)
(if (= ::timeout wait-result)
(throw (DockerException. (format "Docker container `%s' did not complete initialization before timeout." container-name))))))
I'm not sure if @rohansingh already figured out that this really is a bug. But couldn't it be the case that after the expected message was received, you indeed keep listening for new messages - which do not arrive and then lead to a socket timeout exception? By the way the timeouts can be set and even disabled.
Sorry, I haven't looked very closely at this, just gave it the bug label while organizing things.
Ah okay! I will try to come up with a test to proof this issue. Maybe the topic starter can provide some code?
Edit; oops there is code, missed that
Op 27 feb. 2015 om 01:33 heeft Rohan Singh <[email protected]mailto:[email protected]> het volgende geschreven:
Sorry, I haven't looked very closely at this, just gave it the bug label while organizing things.
— Reply to this email directly or view it on GitHubhttps://github.com/spotify/docker-client/issues/125#issuecomment-76308172.
This actually is an issue. I used the following test for proof: (Basically I create a DockerClient with a 4000 ms timeout and a Docker container with a 5 second sleep)
package com.spotify.docker.client;
import java.io.IOException;
import org.hamcrest.Matchers;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Rule;
import org.junit.Test;
import com.spotify.docker.client.DockerClient.LogsParameter;
import com.spotify.docker.test.CreateContainer;
import com.spotify.docker.test.DockerContainer;
import static com.google.common.base.Charsets.UTF_8;
import static org.junit.Assert.assertThat;
public class PrematureLogsCloseCall {
private static DockerClient dockerClient;
@Rule
public DockerContainer dockerContainer = new DockerContainer(dockerClient);
@BeforeClass
public static void setUp() throws DockerCertificateException {
dockerClient = DefaultDockerClient
.fromEnv()
.readTimeoutMillis(3000)
.build();
}
@Test
@CreateContainer(image = "busybox", command = {"sh", "-c", "echo start && sleep 5 && echo stop"}, start = true)
public void testIt() throws IOException,
DockerException, InterruptedException {
String containerId = dockerContainer.getContainerId();
try(LogStream logStream = dockerClient.logs(containerId, LogsParameter.STDOUT, LogsParameter.FOLLOW)) {
assertThat(UTF_8.decode(logStream.next().content()).toString(),
Matchers.containsString("start"));
logStream.close();
}
}
@AfterClass
public static void cleanUp() {
dockerClient.close();
}
}
The problem is that in the LogReader the following happens on close:
@Override
public void close() throws IOException {
closed = true;
// Jersey will close the stream and release the connection after we read all the data.
// We cannot call the stream's close method because it an instance of UncloseableInputStream,
// where close is a no-op.
copy(stream, nullOutputStream());
}
So the remainder of the stream is being copied to a null output stream, but as no messages keep coming in an SocketTimeoutException will be thrown. Why is this being copied to a nullstream anyway? Why not just call close on the wrapped stream and ignore that it is no-op? (The code beneath makes te test pass)
@Override
public void close() throws IOException {
closed = true;
stream.close()
}
Noticed this today when trying to watch logs for a startup message and block until it is seen:
private void waitUntilTomcatStarted() throws IOException, InterruptedException, DockerException {
logger.info("Blocking until Tomcat has started");
long start = System.currentTimeMillis();
try (LogStream logStream = dockerClient.logs(containerId, DockerClient.LogsParameter.STDERR, DockerClient.LogsParameter.FOLLOW)) {
boolean started = false;
while (!started) {
while(logStream.hasNext()) {
LogMessage next = logStream.next();
ByteBuffer content = next.content();
byte[] buffer = new byte[content.remaining()];
content.get(buffer);
String message = new String(buffer, "UTF-8");
System.out.print(message);
if (StringUtils.contains(message, "org.apache.catalina.startup.Catalina.start Server startup in ")) {
logger.info("Tomcat Server started after {}ms", (System.currentTimeMillis() - start));
started = true;
break;
}
}
logger.info("looped");
}
logger.info("I am stuck");
}
logger.info("Never reached");
}
Admittedly this was implemented as a bit of a hack (seemed to be one of the easiest way of checking if the tomcat instance had started). I worked around it by not following the logs and running a dockerClient.logs(containerId, DockerClient.LogsParameter.STDERR) after some progressive sleeps until the message was found (although this feels like an even bigger hack).
I think the issue described in this bug was probably fixed in spotify/docker-client@b76fc638 (Issue #223 ). The SocketTimeoutException was being hit when no data was being sent on the socket for a period of time longer than the read timeout.
I think, as the code within the close block is not fixed, this is probably worse. Now you just keep the socket open, until either the connection or container dies. If you don't need the logs anymore, it should just disconnect the socket on close.
+1, no longer closing LogStream to work around issue
just stumbled over this in v8.1.3, applied the fix from https://github.com/spotify/docker-client/issues/125#issuecomment-76402326 and it works perfectly - can we please integrate this into master? I see that https://github.com/spotify/docker-client/blob/master/src/main/java/com/spotify/docker/client/LogReader.java still has the old bug. Update sadly still hangs the calling Thread, even with the fix to close() the stream.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi all,
Since this project went on mature status, please re-open this issue (if it still stands) to https://github.com/dmandalidis/docker-client. Thanks