Dubbo HashedWheelTimer worker thread may accumulate delay

Environment

Dubbo version: 3.0 / master

waitForNextTick is overthinking things

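                final long currentTime = System.nanoTime() - startTime;
<== startTime is the time at which the HashedWheelTimer was initialized; currentTime is the elapsed time since then.
As long as the application has not been running continuously for more than about 292 years, currentTime will never be less than 0.
Even if the value returned by System.nanoTime() overflows past Long.MAX_VALUE and becomes negative,
currentTime will not go negative as long as the nanosecond difference from startTime does not exceed 2^63 - 1.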
                long sleepTimeMs = (deadline - currentTime + 999999) / 1000000;

                if (sleepTimeMs <= 0) {
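                    if (currentTime == Long.MIN_VALUE) {
                        <== Long.MIN_VALUE is -2^63;
the program would have to run continuously for about 292 years before it could ever hit this value. Returning a negative value is also a problem, because the code that calls waitForNextTick only handles return values greater than 0.
                        return -Long.MAX_VALUE;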
                    } else {
                        return currentTime;
                    }
                }

                if (isWindows()) {
                    sleepTimeMs = sleepTimeMs / 10 * 10;
                   <== it looks like the intent is for Windows to wait at least 10 ms, but truncating sleepTimeMs to 0 when it is below 10 ms does not seem right
                }

                try {
                    Thread.sleep(sleepTimeMs);
                }
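
The overflow reasoning in the annotations above can be checked with plain long arithmetic. This is a minimal sketch that uses made-up values in place of real System.nanoTime() readings; the class and method names are hypothetical and are not part of Dubbo or netty.

```java
// Illustration only: the values stand in for System.nanoTime() readings.
class NanoTimeWrapDemo {
    // Elapsed nanoseconds, computed the same way as currentTime in waitForNextTick.
    static long elapsed(long startTime, long now) {
        return now - startTime;
    }

    public static void main(String[] args) {
        long startTime = Long.MAX_VALUE - 5; // timer created just before the counter wraps
        long now = startTime + 10;           // the raw reading wraps around to a negative value

        System.out.println(now < 0);                 // prints true
        System.out.println(elapsed(startTime, now)); // prints 10: the difference is still correct

        // Whole years until a difference of 2^63 - 1 ns is reached (365-day years).
        System.out.println(Long.MAX_VALUE / 1_000_000_000L / 3600 / 24 / 365); // prints 292
    }
}
```

The subtraction stays correct across the wrap because two's-complement arithmetic is modular; only a difference larger than 2^63 - 1 ns would come out negative.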

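The maxPendingTimeouts of HashedWheelTimer can be set to any value, but in practice the thread does not get the CPU even after the sleep time has elapsed. For example, the createTaskTest method of HashedWheelTimerTest fails most of the time when the unit tests run on a 4-core Windows machine: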

        Thread.sleep(100); <== in practice the thread that runs expireTimeouts rarely gets the CPU within 1 second, so the 100 ms here is wishful thinking
        Assertions.assertTrue(timeout.isExpired());

        timer.stop();

Update: the thread cannot get the CPU because the timeout task BlockTask is executed by the worker thread itself. The BlockTask code:

    private static class BlockTask implements TimerTask {
        @Override
        public void run(Timeout timeout) throws InterruptedException {
            System.out.println("thread:" + Thread.currentThread() + " timeout:" + timeout); <== this line was added later
            this.wait(); <== blocks the thread that runs the worker, so other timeouts are not processed in time
        }
    }
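
The effect described above can be reproduced without HashedWheelTimer. The following is a simplified model, not the Dubbo code: a single-threaded executor plays the role of the worker thread and runs "expired" tasks itself, and a CountDownLatch stands in for a task that blocks. All names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simplified model: one thread acts as the timer worker and runs expired tasks itself.
class BlockedWorkerDemo {
    // Returns true if the second task ran while the first one was still blocked.
    static boolean secondTaskRanWhileFirstBlocked(long waitMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch secondRan = new CountDownLatch(1);
        try {
            worker.execute(() -> awaitQuietly(release)); // stands in for a task that blocks
            worker.execute(secondRan::countDown);        // a later timeout that is already due
            // Wait well past the intended deadline; the second task never gets a turn.
            return secondRan.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // unblock the first task so the worker can exit
            worker.shutdown();
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTaskRanWhileFirstBlocked(200)); // prints false
    }
}
```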

The assertion in HashedWheelTimerTest.java that the number of timeouts exceeds maxPendingTimeouts also fails sometimes

Assertions.assertThrows(RuntimeException.class,
                () -> timer.newTimeout(new BlockTask(), 1, TimeUnit.MILLISECONDS));

The cause is that the timeouts created earlier have a delay of -1, so they expire immediately.

Sep 29 '21 06:09 zrlw

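HashedWheelTimer is modeled on the netty implementation, and some of the problems above have already been fixed in the latest netty version. Would it be enough to port the netty changes over? https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/HashedWheelTimer.java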

Sep 30 '21 01:09 AlbumenJ

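I looked at netty's HashedWheelTimer code and it does not fix this problem. While the worker thread runs its while loop, it executes any expired task directly on the worker thread itself. If that task hangs, the worker thread hangs with it and every later loop iteration has to wait, so the timeout mechanism misses its scheduled time.

Issue filed with netty: https://github.com/netty/netty/issues/11724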

Sep 30 '21 04:09 zrlw

Following netty's approach, the code now waits 1 millisecond when sleepTimeMs is 0.
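
A minimal sketch of that adjustment, assuming the same sleep computation as the waitForNextTick excerpt in the issue description. The helper name and its parameters are hypothetical; the real method reads the clock and the platform itself.

```java
class SleepTimeDemo {
    // Hypothetical helper mirroring the sleep computation in waitForNextTick.
    static long sleepTimeMs(long deadlineNanos, long currentTimeNanos, boolean windows) {
        // Round the remaining nanoseconds up to whole milliseconds.
        long sleepTimeMs = (deadlineNanos - currentTimeNanos + 999_999) / 1_000_000;
        if (sleepTimeMs <= 0) {
            return 0; // the tick is already due, no sleep needed
        }
        if (windows) {
            sleepTimeMs = sleepTimeMs / 10 * 10; // round down to a multiple of 10 ms
            if (sleepTimeMs == 0) {
                sleepTimeMs = 1; // never pass 0 to Thread.sleep while time remains
            }
        }
        return sleepTimeMs;
    }

    public static void main(String[] args) {
        System.out.println(sleepTimeMs(5_000_000, 0, false)); // prints 5
        System.out.println(sleepTimeMs(5_000_000, 0, true));  // prints 1 (0 without the adjustment)
        System.out.println(sleepTimeMs(25_000_000, 0, true)); // prints 20
    }
}
```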

Sep 30 '21 04:09 zrlw

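A separate thread pool also needs to be created to run the timeout tasks. If the worker's threadFactory is reused, then with a small pool and many timeout tasks the tasks pile up, and timeout task processing falls far behind.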

Sep 30 '21 07:09 zrlw

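After much consideration I still chose newFixedThreadPool to run the timer tasks; compared with a cached thread pool, a fixed thread pool consumes relatively fewer resources. A fixed thread pool does have the unbounded-queue problem, but if the wait queue were bounded, I would not know what size is appropriate or how to handle tasks that exceed the queue capacity. With this change the worker thread keeps running normally regardless of what the tasks do. However, if the timeout tasks themselves are inefficient and slow, then once the number of queued timeout tasks exceeds the capacity of the newFixedThreadPool, timeout tasks will still run late. That case cannot be solved by changing HashedWheelTimer alone.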
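
For comparison, here is a sketch of the bounded-queue alternative mentioned above. It is not what the change uses; the pool size, the queue capacity, and the choice to merely count rejected tasks are arbitrary assumptions made for illustration, and the class name is hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedTimerPoolDemo {
    // Submits `tasks` blocking tasks to a fixed-size pool with a bounded queue
    // and returns how many of them the pool rejected.
    static int countRejected(int threads, int queueCapacity, int tasks) {
        AtomicInteger rejected = new AtomicInteger();
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // Assumed policy: count the overflow; a real timer would have to
                // decide whether to drop, log, or run the task on the caller.
                (task, executor) -> rejected.incrementAndGet());
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // every task is slow, so the queue fills up
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return rejected.get();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 2 running + 3 queued are accepted, the remaining 5 are rejected.
        System.out.println(countRejected(2, 3, 10)); // prints 5
    }
}
```

A bounded queue makes the overload visible instead of hiding it in an ever-growing backlog, but it does not make slow tasks run on time, which matches the limitation described above.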

Oct 03 '21 03:10 zrlw

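It is better to keep the event loop thread and the event processing threads separate, or to do what the constructor of apache httpclient5's FutureRequestExecutionService does and require the user to supply an ExecutorService for running the asynchronous tasks.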

Oct 04 '21 06:10 zrlw

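The approach taken by apache httpclient5's FutureRequestExecutionService is a constructor that takes an ExecutorService parameter; the user is responsible for creating it, and it is used to run the asynchronous tasks.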

Oct 04 '21 06:10 zrlw

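A netty member submitted a PR that adds a constructor with an Executor parameter to the HashedWheelTimer class. https://github.com/netty/netty/pull/11728 Although some netty members thought the change was unnecessary, the PR has been merged.

Update: the change has been revised again along the lines of netty PR 11728. The difference is that the default executor is a static fixedThreadPool shared globally.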
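
The idea of handing expired tasks to a separate Executor can be sketched as follows. This is a simplified model under stated assumptions, not the code from the netty PR or from the Dubbo change: a single-threaded executor stands in for the worker thread, and the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simplified model: the worker thread only dispatches; taskExecutor runs the tasks.
class ExecutorBackedDispatchDemo {
    // Returns true if the second task ran while the first one was still blocked.
    static boolean secondTaskRanWhileFirstBlocked(Executor taskExecutor, long waitMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch secondRan = new CountDownLatch(1);
        Runnable blockingTask = () -> {
            try {
                release.await(); // stands in for a task that blocks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            // The worker hands each expired task over and moves on immediately.
            worker.execute(() -> taskExecutor.execute(blockingTask));
            worker.execute(() -> taskExecutor.execute(secondRan::countDown));
            return secondRan.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        ExecutorService taskPool = Executors.newFixedThreadPool(2); // size chosen arbitrarily
        try {
            System.out.println(secondTaskRanWhileFirstBlocked(taskPool, 5000)); // prints true
        } finally {
            taskPool.shutdown();
        }
    }
}
```

With a pool of two threads the second task runs even though the first is blocked. If every pool thread were blocked, the second task would wait in the pool's queue instead, so the worker loop stays on schedule but task execution can still lag.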

Oct 04 '21 07:10 zrlw

Has this problem been resolved? Why is this issue still open?

Dec 27 '23 01:12 xuwenyu2018

> Has this problem been resolved? Why is this issue still open?

For the specific reason you would have to ask @AlbumenJ @chickenlj

Dec 28 '23 06:12 zrlw