Dagger.jl

Slowdown in scheduler after extended eager API usage

Open krynju opened this issue 4 years ago • 2 comments

The code below (when it doesn't hang outright) slows down significantly at around 1800 iterations for me. Watching per-thread CPU usage, the main thread appears to be constantly busy, and the gap between successive spawns grows with every iteration.

This is most likely related to the huge number of thunks the code generates over time: each iteration spawns 101 tasks (the outer `a(c)` call plus 100 inner `b` calls), so roughly 180,000 thunks have been created by iteration 1800.

julia> using Dagger

julia> c = Dagger.@spawn 10+10; b = (x) -> x + 10; a = (x) -> x .+ fetch.([Dagger.spawn(b, x + i) for i in 1:100]);

julia> for i in 1:10000
       r = Dagger.@spawn a(c)
       fetch(r); println(i)
       end
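To pin down where the slowdown starts, rather than judging it from CPU graphs, something like the following could be used. It is only a rough sketch: `window_means` is a hypothetical helper, not part of Dagger, and the window size of 100 is an arbitrary choice.

```julia
# Hypothetical helper (not part of Dagger): time every call of `step(i)` and
# return the mean wall-clock duration of each block of `window` iterations.
function window_means(step, n; window=100)
    means = Float64[]
    acc = 0.0
    for i in 1:n
        acc += @elapsed step(i)        # seconds spent in this iteration
        if i % window == 0
            push!(means, acc / window) # mean seconds per iteration in this window
            acc = 0.0
        end
    end
    return means
end
```

With the definitions above it could be called as `window_means(i -> fetch(Dagger.@spawn a(c)), 3000)`. If the per-window means stay flat and then climb, the index where they start climbing gives the iteration range worth profiling.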

krynju avatar Sep 16 '21 19:09 krynju

Can you try running the Julia profiler with Julia master on this, at least twice before the slowdown and twice after? We should see a noticeable slowdown in some part of the scheduler, which is probably traversing some ever-growing list or other data structure.
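A minimal sketch of how those before/after profiles might be collected with the `Profile` standard library. The helper name `profile_window` and the iteration ranges in the note below are assumptions for illustration, not anything Dagger provides.

```julia
using Profile

# Hypothetical helper: collect samples only while `step(i)` runs for the
# iterations in `iters`, then print a flat report sorted by sample count.
function profile_window(step, iters; io=stdout, mincount=0)
    Profile.clear()                  # discard samples from any earlier window
    for i in iters
        @profile step(i)             # sample only while this iteration runs
    end
    Profile.print(io; format=:flat, sortedby=:count, mincount=mincount)
    return nothing
end
```

For the reproduction above, `step` would be `i -> fetch(Dagger.@spawn a(c))`, run once over an early range (say `1:200`) and once over a range past the slowdown (say `2000:2200`, after the intermediate iterations have been executed). Functions whose sample counts grow sharply between the two reports are the candidates for the ever-growing structure.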

jpsamaroo avatar Sep 17 '21 00:09 jpsamaroo

Clue for later: it seems that pure Distributed stress tests also slow down after a while, so this might not be a Dagger issue after all. I'll look into that at some point.

krynju avatar Sep 30 '21 06:09 krynju