play1 icon indicating copy to clipboard operation
play1 copied to clipboard

Continuations - partially blocked pool threads

Open sant0s opened this issue 6 years ago • 7 comments

References

Discuss Play Forum: https://discuss.lightbend.com/t/continuations-partially-blocked-pool-threads/3752 Lighthouse (copy of this ticket): https://play.lighthouseapp.com/projects/57987/tickets/2390-continuations-partially-blocked-pool-threads

Description

Play! version: 1.5.2 Java version: 1.8.0_201 Ubuntu 18.04.2 LTS

Consider a Play app consisting of a controller that makes use of continuations in order to allow simultaneous requests to be run by pool threads.

Create a test app

play new test

and change the application controller to

package controllers;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import play.Logger;
import play.mvc.Controller;

public class Application extends Controller {

    private static ExecutorService pool = Executors.newFixedThreadPool(10);

    public static void index() {
        Future future = pool.submit(() -> {
            try {
                Logger.info("before sleep");
                Thread.sleep(20000);
                Logger.info("after sleep");
            } catch (InterruptedException e) {
                Logger.error(e.getMessage());
            }
        });
        Logger.info("before await");
        await(future);
        Logger.info("after await");
        render();
    }

}

Then add conf/log4j.properties,

log4j.rootLogger=DEBUG, Console
log4j.appender.Console=org.apache.log4j.ConsoleAppender
log4j.appender.Console.layout=org.apache.log4j.PatternLayout
log4j.appender.Console.layout.ConversionPattern=%d{ABSOLUTE} %-5p ~ [%t] %m%n

start play

play run

and simultaneously open 3 browser tabs at http://localhost:9000/

The results are:

  • The first tab loads after (around) 20 seconds
  • The second and third tabs load after (around) 40 seconds

Logs show that:

  • Pool thread pool-5-thread-1 handles the first tab request
  • Only when pool-5-thread-1 completes, pool threads pool-5-thread-2 and pool-5-thread-3 start handling the second and third tab requests
  • pool-5-thread-2 and pool-5-thread-3 complete at (around) the same time
  • play-thread-1 blocks

Expected Behavior

Since the 3 tabs were opened at (around) the same time, they should all be loaded after (around) 20 seconds i.e. the 3 pool threads should start/complete handling them at (around) the same time. Furthermore, play-thread-1 should not be blocked.

Actual Behavior

All pool threads should start/complete handling simultaneous requests at (around) the same time and play-thread-1 should not block i.e. it should handle other requests while waiting for pool threads to complete.

Reproducible Test Case

The actual behaviour is consistently reproducible by following the above-mentioned steps.

sant0s avatar Apr 01 '19 16:04 sant0s

If true, this could defeat the purpose of await somewhat. That said, the 2nd and 3rd requests are apparently running concurrently; perhaps the first web request handling thread is incorrectly doing dual duty as both the request thread dispatching and as a request thread handler itself. If so, that could lead to every one request in N requests blocking all subsequent requests until it's complete.

I'll dig into the code to investigate further. I might be guessing the completely wrong thing.

LouisJackman avatar Apr 23 '19 10:04 LouisJackman

Any luck?

sant0s avatar Jul 09 '19 08:07 sant0s

I'm unable to reproduce the issue with

  • Oracle JDK 1.8.0_91
  • AdoptOpenJDK 1.8.0_192
  • AdoptOpenJDK 1.8.0_202
  • Ubuntu 18.04 shipped OpenJDK 1.8.0_212

Am I doing something wrong? I'm opening "simultaneously" that 3 tabs by quickly middle clicking on the link in this issue three times.

~        _            _ 
~  _ __ | | __ _ _  _| |
~ | '_ \| |/ _' | || |_|
~ |  __/|_|\____|\__ (_)
~ |_|            |__/   
~
~ play! 1.5.2, https://www.playframework.com
~
~
~ Warning: conflict on command doc:export
~
~ Ctrl+C to stop
~ 
~ using java version "1.8.0_212"
Listening for transport dt_socket at address: 8000
.
.
.
12:38:52,075 INFO  ~ [play-thread-1] Application 'play-issue-1304-blocked-pool-threads' is now started !
12:38:52,129 INFO  ~ [play-thread-1] before await
12:38:52,129 INFO  ~ [pool-5-thread-1] before sleep
12:38:52,129 DEBUG ~ [play-thread-1] suspend()
12:38:52,130 DEBUG ~ [play-thread-1] push object java.util.concurrent.FutureTask@1037802910/unknown [java.util.concurrent.FutureTask@3ddb9d9e] i[0],l[0],d[0],f[0],o[0],r[0]
12:38:52,130 DEBUG ~ [play-thread-1] push int 2 i[0],l[0],d[0],f[0],o[1],r[0]
12:38:52,133 WARN  ~ [play-thread-1] Start WaitForTasksCompletion
12:38:52,135 INFO  ~ [play-thread-1] before await
12:38:52,135 INFO  ~ [pool-5-thread-2] before sleep
12:38:52,135 DEBUG ~ [play-thread-1] suspend()
12:38:52,135 DEBUG ~ [play-thread-1] push object java.util.concurrent.FutureTask@1190650721/unknown [java.util.concurrent.FutureTask@46f7e361] i[0],l[0],d[0],f[0],o[0],r[0]
12:38:52,135 DEBUG ~ [play-thread-1] push int 2 i[0],l[0],d[0],f[0],o[1],r[0]
12:38:52,216 INFO  ~ [play-thread-1] before await
12:38:52,216 INFO  ~ [pool-5-thread-3] before sleep
12:38:52,216 DEBUG ~ [play-thread-1] suspend()
12:38:52,216 DEBUG ~ [play-thread-1] push object java.util.concurrent.FutureTask@114452413/unknown [java.util.concurrent.FutureTask@6d267bd] i[0],l[0],d[0],f[0],o[0],r[0]
12:38:52,216 DEBUG ~ [play-thread-1] push int 2 i[0],l[0],d[0],f[0],o[1],r[0]

12:39:12,129 INFO  ~ [pool-5-thread-1] after sleep
12:39:12,135 INFO  ~ [pool-5-thread-2] after sleep
12:39:12,163 DEBUG ~ [play-thread-1] pop int 2 i[0],l[0],d[0],f[0],o[1],r[0]
12:39:12,163 DEBUG ~ [play-thread-1] pop object java.util.concurrent.FutureTask@1037802910/unknown [java.util.concurrent.FutureTask@3ddb9d9e] 
12:39:12,171 INFO  ~ [play-thread-1] after await
12:39:12,216 INFO  ~ [pool-5-thread-3] after sleep
12:39:12,518 DEBUG ~ [play-thread-1] pop int 2 i[0],l[0],d[0],f[0],o[1],r[0]
12:39:12,518 DEBUG ~ [play-thread-1] pop object java.util.concurrent.FutureTask@1190650721/unknown [java.util.concurrent.FutureTask@46f7e361] 
12:39:12,518 INFO  ~ [play-thread-1] after await
12:39:12,526 DEBUG ~ [play-thread-1] pop int 2 i[0],l[0],d[0],f[0],o[1],r[0]
12:39:12,526 DEBUG ~ [play-thread-1] pop object java.util.concurrent.FutureTask@114452413/unknown [java.util.concurrent.FutureTask@6d267bd] 
12:39:12,526 INFO  ~ [play-thread-1] after await

The empty line is where the 20 seconds wait happens.

xabolcs avatar Jul 12 '19 10:07 xabolcs

@xabolcs You're doing nothing wrong and I am glad that Play works as expected, as this would be a major issue (literally a "blocker" as threads get blocked :)).

Your set up (Java and Play versions) is similar to mine, so I started wondering what else could be different. I re-tested and got the following results:

Scenario A: same behaviour as I reported i.e. it doesn't work Scenario B: same behaviour as you reported i.e. it works

Can you guess what's the difference? :)

Scenario A corresponds to a VirtualBox guest, while Scenario B corresponds to a plain non-virtualised machine (as I assume you tested on).

Any idea about the underlying reason for this mismatching behaviour? It might be related to thread scheduling, but no clue on details.

So these are good news for Play apps deployed in physical machines, but an issue for those deployed in virtualised servers (in production) as they may stall.

sant0s avatar Jul 16 '19 18:07 sant0s

Nice findings, @sant0s ! Yes, I was testing on bare metal hardware.

How many processors does have that VirtualBox guest? In other words: is the bug related to processor count or to virtualization?

xabolcs avatar Jul 17 '19 08:07 xabolcs

Sorry, I didn't find the time to look too deeply into this as I originally promised. I was having difficulty reproducing, it's good to see I'm not the only one.

I can't see how more CPUs on a physical machine over a VM would change the problem (although, like most concurrency bugs, it can certainly exacerbate it.) This bug seems to be more related to concurrency and not parallelism.

If the pool is large and there are only three tabs open, is it possible that Play just doesn't allocate the main web request dispatch thread by sheer fluke? That seems pretty unlikely if you've refreshed the tabs enough though, so it seems more likely my guess is completely wrong...

LouisJackman avatar Jul 17 '19 15:07 LouisJackman

@xabolcs 2 CPUs. Both scenarios - A (VirtualBox guest) and B (host, bare metal) - are run under 2 CPUs each (the host has 4 CPUs). I'd guess virtualisation related.

@LouisJackman No problem.

I wonder if Play 2 is affected by this issue.

sant0s avatar Jul 18 '19 07:07 sant0s