Rare server crash (parallel inter-task dependencies + other conditions)

Open gavento opened this issue 7 years ago • 1 comments

The Rain server panics when a task becomes ready here. The relevant part of the log seems to be the following:

...
DEBUG 2018-03-17T15:31:49Z: librain::server::scheduler: Scheduler: New ready task (1,23092)
... [many New ready task info lines, various IDs]
DEBUG 2018-03-17T15:31:49Z: librain::server::scheduler: Scheduler: New ready task (1,23092)
thread 'main' panicked at 'assertion failed: r', src/server/scheduler.rs:148:17
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
DEBUG 2018-03-17T15:31:49Z: tokio_reactor: loop process - 1 events, 0.000s
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::_print
             at libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at libstd/sys_common/backtrace.rs:59
             at libstd/panicking.rs:207
   3: std::panicking::default_hook
             at libstd/panicking.rs:223
   4: std::panicking::rust_panic_with_hook
             at libstd/panicking.rs:402
   5: std::panicking::begin_panic
   6: librain::server::scheduler::ReactiveScheduler::schedule
   7: librain::server::state::State::run_scheduler
   8: librain::server::state::<impl librain::common::wrapped::WrappedRcRefCell<librain::server::state::State>>::turn
   9: rain::main
  10: std::rt::lang_start::{{closure}}
  11: std::panicking::try::do_call
             at libstd/rt.rs:59
             at libstd/panicking.rs:306
  12: __rust_maybe_catch_panic
             at libpanic_unwind/lib.rs:102
  13: std::rt::lang_start_internal
             at libstd/panicking.rs:285
             at libstd/panic.rs:361
             at libstd/rt.rs:58
  14: main
  15: __libc_start_main
  16: _start
DEBUG 2018-03-17T15:31:49Z: tokio_reactor: loop process - 1 events, 0.000s
DEBUG 2018-03-17T15:31:49Z: tokio_reactor::background: shutting background reactor down NOW
...

However, a small test for multiple identical inputs passes, even with subsequent submits. The benchmark only fails with >500 tasks per layer. See the benchmark attached. It was run as python3 scalebench.py net -l 256 -w 1024 -s 0, the error happens around layer 10. The debug checks with RAIN_DEBUG_MODE=1 do not find any consistency problems.
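One possible reading of the log above: the same task ID, (1,23092), is reported as ready twice just before `assertion failed: r`. The sketch below is not the code from `src/server/scheduler.rs`. It only assumes that `r` is the boolean returned by inserting a task into a set of ready tasks, which would fail the assertion on a duplicate notification. The names `TaskId`, `ReadySet` and `mark_ready` are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical task identifier, shaped like the `(1,23092)` pairs in the log.
type TaskId = (u32, u32);

/// Hypothetical stand-in for the scheduler's ready-task bookkeeping.
/// This is an assumption for illustration, not the librain data structure.
#[derive(Default)]
struct ReadySet {
    ready: HashSet<TaskId>,
}

impl ReadySet {
    /// Records a task as ready.
    /// Returns `true` if the task was newly recorded and `false` if it was
    /// already present, mirroring the return value of `HashSet::insert`.
    fn mark_ready(&mut self, id: TaskId) -> bool {
        self.ready.insert(id)
    }
}
```

Under this assumption, a second `mark_ready((1, 23092))` returns `false`, so a following `assert!(r)` would panic. It would not explain why the small duplicate-input test passes, so the duplicate notification may only arise under the load conditions described above.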

scalebench.py.txt

gavento, Mar 17 '18 16:03

This seems difficult to reproduce: the duplicate dependency itself is not a problem, it is only the trigger under heavy load, and even under heavy load @spirali could not reproduce it.

gavento, Apr 12 '18 14:04