materialize icon indicating copy to clipboard operation
materialize copied to clipboard

persist: panic when accessing a reader that does not exist

Open pH14 opened this issue 3 years ago • 6 comments

What version of Materialize are you using?

v0.27.0-alpha.3

How did you install Materialize?

Materialize Cloud

What is the issue?

Brennan found a reproducible error on his cloud environment, indicating we're trying to get the Reader associated with an invalid/expired ReaderId. I haven't dug into the stack trace, but off hand the only way for ReaderIds to be removed from state right now is through ReadHandle::Drop or an explicit expire call, so we must be hanging on to one for too long.

Relevant log output

2022-07-22T18:28:21.217590Z  INFO mz_service::grpc: GrpcClient compute-cluster-1-replica-1-0.compute-cluster-1-replica-1.environment-b2f1d2b8-16e8-40fc-9872-a8f32f8cbed0-0.svc.cluster.local:2100: connected
thread 'tokio:work-0' panicked at 'TODO: Implement automatic lease renewals', /var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/persist-client/src/impl/state.rs:224:14
stack backtrace:
   0: rust_begin_unwind
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/panicking.rs:142:14
   2: core::panicking::panic_display
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/panicking.rs:72:5
   3: core::panicking::panic_str
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/panicking.rs:56:5
   4: core::option::expect_failed
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/option.rs:1854:5
   5: core::option::Option<T>::expect
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/option.rs:718:21
   6: mz_persist_client::impl::state::StateCollections<T>::reader
             at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/persist-client/src/impl/state.rs:218:9
   7: mz_persist_client::impl::state::StateCollections<T>::heartbeat_reader
             at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/persist-client/src/impl/state.rs:195:28
   8: mz_persist_client::impl::machine::Machine<K,V,T,D>::heartbeat_reader::{{closure}}::{{closure}}
             at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/persist-client/src/impl/machine.rs:227:17
   9: core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/ops/function.rs:290:13
  10: mz_persist_client::impl::state::State<K,V,T,D>::clone_apply
             at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/persist-client/src/impl/state.rs:324:24
  11: mz_persist_client::impl::machine::Machine<K,V,T,D>::apply_unbatched_cmd::{{closure}}::{{closure}}::{{closure}}
             at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/persist-client/src/impl/machine.rs:395:51
  12: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/future/mod.rs:91:19
  13: mz_persist_client::impl::metrics::CmdMetrics::run_cmd::{{closure}}
             at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/persist-client/src/impl/metrics.rs:336:74
  14: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/future/mod.rs:91:19
  15: mz_persist_client::impl::machine::Machine<K,V,T,D>::apply_unbatched_cmd::{{closure}}
             at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/persist-client/src/impl/machine.rs:475:11
  16: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/future/mod.rs:91:19
  17: mz_persist_client::impl::machine::Machine<K,V,T,D>::apply_unbatched_idempotent_cmd::{{closure}}
             at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/persist-client/src/impl/machine.rs:364:62
  18: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/future/mod.rs:91:19
  19: mz_persist_client::impl::machine::Machine<K,V,T,D>::heartbeat_reader::{{closure}}
             at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/persist-client/src/impl/machine.rs:228:15
  20: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/future/mod.rs:91:19
  21: mz_persist_client::read::ReadHandle<K,V,T,D>::maybe_heartbeat_reader::{{closure}}
             at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/persist-client/src/read.rs:734:69
  22: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/future/mod.rs:91:19
  23: mz_persist_client::read::SnapshotIter<K,V,T,D>::next::{{closure}}::{{closure}}
             at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/persist-client/src/read.rs:153:57
  24: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/future/mod.rs:91:19
  25: mz_persist_client::read::SnapshotIter<K,V,T,D>::next::{{closure}}
             at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/persist-client/src/read.rs:144:5
  26: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/future/mod.rs:91:19
  27: mz_storage::controller::persist_read_downgrade::PersistWorker<T>::new::{{closure}}::snapshot::{{closure}}
             at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/storage/src/controller.rs:1066:78
  28: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/future/mod.rs:91:19
  29: mz_storage::controller::persist_read_downgrade::PersistWorker<T>::new::{{closure}}
             at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-05861050ebb44e9e9-1/materialize/tests/src/storage/src/controller.rs:1078:82
  30: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/future/mod.rs:91:19
  31: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/core.rs:165:17
  32: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/loom/std/unsafe_cell.rs:14:9
  33: tokio::runtime::task::core::CoreStage<T>::poll
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/core.rs:155:13
  34: tokio::runtime::task::harness::poll_future::{{closure}}
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:470:19
  35: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/panic/unwind_safe.rs:271:9
  36: std::panicking::try::do_call
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/panicking.rs:492:40
  37: std::panicking::try
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/panicking.rs:456:19
  38: std::panic::catch_unwind
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/panic.rs:137:14
  39: tokio::runtime::task::harness::poll_future
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:458:18
  40: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:104:27
  41: tokio::runtime::task::harness::Harness<T,S>::poll
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:57:15
  42: tokio::runtime::task::raw::RawTask::poll
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/raw.rs:84:18
  43: tokio::runtime::task::LocalNotified<S>::run
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/mod.rs:376:9
  44: tokio::runtime::thread_pool::worker::Context::run_task::{{closure}}
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/thread_pool/worker.rs:458:21
  45: tokio::coop::with_budget::{{closure}}
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/coop.rs:102:9
  46: std::thread::local::LocalKey<T>::try_with
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/thread/local.rs:445:16
  47: std::thread::local::LocalKey<T>::with
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/thread/local.rs:421:9
  48: tokio::coop::with_budget
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/coop.rs:95:5
  49: tokio::coop::budget
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/coop.rs:72:5
  50: tokio::runtime::thread_pool::worker::Context::run_task
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/thread_pool/worker.rs:434:9
  51: tokio::runtime::thread_pool::worker::Context::run
  52: tokio::runtime::thread_pool::worker::run::{{closure}}
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/thread_pool/worker.rs:386:17
  53: tokio::macros::scoped_tls::ScopedKey<T>::set
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/macros/scoped_tls.rs:61:9
  54: tokio::runtime::thread_pool::worker::run
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/thread_pool/worker.rs:383:5
  55: tokio::runtime::thread_pool::worker::Launch::launch::{{closure}}
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/thread_pool/worker.rs:362:45
  56: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/blocking/task.rs:42:21
  57: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/core.rs:165:17
  58: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/loom/std/unsafe_cell.rs:14:9
  59: tokio::runtime::task::core::CoreStage<T>::poll
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/core.rs:155:13
  60: tokio::runtime::task::harness::poll_future::{{closure}}
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:470:19
  61: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/panic/unwind_safe.rs:271:9
  62: std::panicking::try::do_call
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/panicking.rs:492:40
  63: std::panicking::try
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/panicking.rs:456:19
  64: std::panic::catch_unwind
             at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/panic.rs:137:14
  65: tokio::runtime::task::harness::poll_future
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:458:18
  66: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:104:27
  67: tokio::runtime::task::harness::Harness<T,S>::poll
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:57:15
  68: tokio::runtime::task::raw::RawTask::poll
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/raw.rs:84:18
  69: tokio::runtime::task::UnownedTask<S>::run
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/mod.rs:413:9
  70: tokio::runtime::blocking::pool::Task::run
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/blocking/pool.rs:91:9
  71: tokio::runtime::blocking::pool::Inner::run
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/blocking/pool.rs:325:17
  72: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
             at ./cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/blocking/pool.rs:300:13
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

pH14 avatar Jul 22 '22 20:07 pH14

@pH14 I think maybe we want to put a log message when we force expire a read or write lease (including shard_id and reader_id/writer_id). that's rare and notable enough (except for a batch 15m after process restart) that I think the volume would be okay. then we should include shard_id and reader_id/writer_id in these panics :)

danhhz avatar Aug 18 '22 14:08 danhhz

Agreed, once we have a way to filter traces into HC or Jaeger, I was thinking we should always sample traces that register or expire readers and writers. In the mean time, log lines should be sufficient

pH14 avatar Aug 18 '22 15:08 pH14

Got a local repro from moritz that seems to be in the same place (the big batch of read handles in environmentd). His repro was just to run bin/environmentd --no-default-features --reset and let it sit for a while

thread 'tokio:work-15' panicked at 'TODO: Implement automatic lease renewals', /home/moritz/dev/repos/materialize/src/persist-client/src/impl/state.rs:251:14
stack backtrace:
   0: rust_begin_unwind
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:142:14
   2: core::panicking::panic_display
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:72:5
   3: core::panicking::panic_str
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:56:5
   4: core::option::expect_failed
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/option.rs:1874:5
   5: core::option::Option<T>::expect
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/option.rs:738:21
   6: mz_persist_client::impl::state::StateCollections<T>::reader
             at ./src/persist-client/src/impl/state.rs:245:9
   7: mz_persist_client::impl::state::StateCollections<T>::downgrade_since
             at ./src/persist-client/src/impl/state.rs:187:28
   8: mz_persist_client::impl::machine::Machine<K,V,T,D>::downgrade_since::{{closure}}::{{closure}}
             at ./src/persist-client/src/impl/machine.rs:271:13
   9: core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:290:13
  10: mz_persist_client::impl::state::State<K,V,T,D>::clone_apply
             at ./src/persist-client/src/impl/state.rs:419:24
  11: mz_persist_client::impl::machine::Machine<K,V,T,D>::apply_unbatched_cmd::{{closure}}::{{closure}}::{{closure}}
             at ./src/persist-client/src/impl/machine.rs:468:55
  12: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/future/mod.rs:91:19
  13: mz_persist_client::impl::metrics::CmdMetrics::run_cmd::{{closure}}
             at ./src/persist-client/src/impl/metrics.rs:371:74
  14: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/future/mod.rs:91:19
  15: mz_persist_client::impl::machine::Machine<K,V,T,D>::apply_unbatched_cmd::{{closure}}
             at ./src/persist-client/src/impl/machine.rs:580:11
  16: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/future/mod.rs:91:19
  17: mz_persist_client::impl::machine::Machine<K,V,T,D>::apply_unbatched_idempotent_cmd::{{closure}}
             at ./src/persist-client/src/impl/machine.rs:435:62
  18: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/future/mod.rs:91:19
  19: mz_persist_client::impl::machine::Machine<K,V,T,D>::downgrade_since::{{closure}}
             at ./src/persist-client/src/impl/machine.rs:272:11
  20: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/future/mod.rs:91:19
  21: mz_persist_client::read::ReadHandle<K,V,T,D>::downgrade_since::{{closure}}::{{closure}}
             at ./src/persist-client/src/read.rs:479:75
  22: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/future/mod.rs:91:19
  23: mz_persist_client::read::ReadHandle<K,V,T,D>::downgrade_since::{{closure}}
             at ./src/persist-client/src/read.rs:475:5
  24: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/future/mod.rs:91:19
  25: mz_persist_client::read::ReadHandle<K,V,T,D>::maybe_downgrade_since::{{closure}}
             at ./src/persist-client/src/read.rs:691:45
  26: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/future/mod.rs:91:19
  27: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-0.1.35/src/instrument.rs:272:9
  28: mz_storage::controller::persist_read_handles::PersistWorker<T>::new::{{closure}}::{{closure}}
             at ./src/storage/src/controller.rs:1544:104
  29: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/future/mod.rs:91:19
  30: <futures_util::stream::futures_unordered::FuturesUnordered<Fut> as futures_core::stream::Stream>::poll_next
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.21/src/stream/futures_unordered/mod.rs:514:17
  31: <futures_util::stream::stream::collect::Collect<St,C> as core::future::future::Future>::poll
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.21/src/stream/stream/collect.rs:50:26
  32: mz_storage::controller::persist_read_handles::PersistWorker<T>::new::{{closure}}
             at ./src/storage/src/controller.rs:1553:61
  33: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/future/mod.rs:91:19
  34: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/future/future.rs:124:9
  35: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/core.rs:165:17
  36: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/loom/std/unsafe_cell.rs:14:9
  37: tokio::runtime::task::core::CoreStage<T>::poll
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/core.rs:155:13
  38: tokio::runtime::task::harness::poll_future::{{closure}}
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:470:19
  39: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panic/unwind_safe.rs:271:9
  40: std::panicking::try::do_call
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:492:40
  41: __rust_try
  42: std::panicking::try
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:456:19
  43: std::panic::catch_unwind
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panic.rs:137:14
  44: tokio::runtime::task::harness::poll_future
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:458:18
  45: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:104:27
  46: tokio::runtime::task::harness::Harness<T,S>::poll
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:57:15
  47: tokio::runtime::task::raw::poll
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/raw.rs:144:5
  48: tokio::runtime::task::raw::RawTask::poll
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/raw.rs:84:18
  49: tokio::runtime::task::LocalNotified<S>::run
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/mod.rs:376:9
  50: tokio::runtime::thread_pool::worker::Context::run_task::{{closure}}
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/thread_pool/worker.rs:435:13
  51: tokio::coop::with_budget::{{closure}}
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/coop.rs:102:9
  52: std::thread::local::LocalKey<T>::try_with
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/thread/local.rs:445:16
  53: std::thread::local::LocalKey<T>::with
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/thread/local.rs:421:9
  54: tokio::coop::with_budget
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/coop.rs:95:5
  55: tokio::coop::budget
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/coop.rs:72:5
  56: tokio::runtime::thread_pool::worker::Context::run_task
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/thread_pool/worker.rs:434:9
  57: tokio::runtime::thread_pool::worker::Context::run
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/thread_pool/worker.rs:408:24
  58: tokio::runtime::thread_pool::worker::run::{{closure}}
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/thread_pool/worker.rs:386:17
  59: tokio::macros::scoped_tls::ScopedKey<T>::set
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/macros/scoped_tls.rs:61:9
  60: tokio::runtime::thread_pool::worker::run
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/thread_pool/worker.rs:383:5
  61: tokio::runtime::thread_pool::worker::Launch::launch::{{closure}}
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/thread_pool/worker.rs:362:45
  62: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/blocking/task.rs:42:21
  63: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/core.rs:165:17
  64: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/loom/std/unsafe_cell.rs:14:9
  65: tokio::runtime::task::core::CoreStage<T>::poll
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/core.rs:155:13
  66: tokio::runtime::task::harness::poll_future::{{closure}}
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:470:19
  67: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panic/unwind_safe.rs:271:9
  68: std::panicking::try::do_call
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:492:40
  69: __rust_try
  70: std::panicking::try
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:456:19
  71: std::panic::catch_unwind
             at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panic.rs:137:14
  72: tokio::runtime::task::harness::poll_future
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:458:18
  73: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:104:27
  74: tokio::runtime::task::harness::Harness<T,S>::poll
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/harness.rs:57:15
  75: tokio::runtime::task::raw::poll
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/raw.rs:144:5
  76: tokio::runtime::task::raw::RawTask::poll
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/raw.rs:84:18
  77: tokio::runtime::task::UnownedTask<S>::run
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/task/mod.rs:413:9
  78: tokio::runtime::blocking::pool::Task::run
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/blocking/pool.rs:91:9
  79: tokio::runtime::blocking::pool::Inner::run
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/blocking/pool.rs:325:17
  80: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
             at /home/moritz/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.19.2/src/runtime/blocking/pool.rs:300:13
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[1]    145052 IOT instruction (core dumped)  RUST_BACKTRACE=1 bin/environmentd --no-default-features --reset

danhhz avatar Aug 18 '22 17:08 danhhz

@pH14 & @danhhz Do we think this is still a thing?

aljoscha avatar Aug 31 '22 13:08 aljoscha

Paul and I are both reasonably convinced at this point that moritz's issue was due to his machine going to sleep (so no heartbeating). When it wakes up, if the first cmd to commit is anything but the read heartbeat, it would expire leases, so we're reasonably likely to hit this any time a machine goes to sleep. I don't know that we terribly care about that right now, except that we've already gotten the same question from other devs, so we might want to tweak the panic message.

However, one thought here is that persist could try to be a bit more graceful about these. Instead of a panic, we could return a structured error back up to the ReadHandle and use that to trigger an attempt for the ReadHandle to silently re-register itself under a new ReaderId. This likely wouldn't work well for a ReadHandle that is holding a capability for a snapshot or a listen, but it would be perfectly fine for the ones that the storage controller has to downgrade_since.

On a pre-launch timeline, I think we just want to stick with what we have (predicated on long leases), but this idea might be interesting to pursue post-launch.

danhhz avatar Aug 31 '22 14:08 danhhz

Given @danhhz's analysis, I'm going to move this to the backlog and we can re-visit again post-launch.

nmeagan11 avatar Aug 31 '22 16:08 nmeagan11

Replaced by https://github.com/MaterializeInc/materialize/issues/14949

danhhz avatar Sep 22 '22 20:09 danhhz