CRIT epoch not found in memory map

Open kishansagathiya opened this issue 3 years ago • 7 comments

Describe the bug

2022-07-18T13:10:35+05:30 CRIT failed to run block production engine: cannot handle epoch: cannot initiate and get epoch handler: failed to initiate epoch: cannot get epoch data and start slot: cannot get epoch data for epoch 2: failed to get epoch data from memory: epoch not found in memory map: 2	babe.go:L395	pkg=babe

How to reproduce?

  • Run two substrate nodes (using substrate node template, latest runtime) and one gossamer node.
  • Let it run for a while, say 40 blocks or so.
  • There is a chance you will see the above-mentioned error.
  • If you don't see the error, stop all nodes, restart them, and wait for a bit. You should see the error now.

kishansagathiya avatar Jul 18 '22 08:07 kishansagathiya

This may be related to #2627

timwu20 avatar Jul 21 '22 15:07 timwu20

Seeing this as well

2022-09-29T18:48:35+05:30 CRITICAL failed to run block production engine: cannot handle epoch: cannot initiate and get epoch handler: failed to initiate epoch: cannot get epoch data and start slot: cannot get authority index: key not in BABE authority data	babe.go:L366	pkg=babe

and a lot of

2022-09-29T18:48:45+05:30 ERROR    digest type not supported: types.PreRuntimeDigest	digest.go:L113	pkg=digest
2022-09-29T18:48:45+05:30 ERROR    digest type not supported: types.SealDigest	digest.go:L113	pkg=digest

kishansagathiya avatar Sep 29 '22 13:09 kishansagathiya

What works

  • dev
  • dev-staking
  • dev-v3substrate
  • gssmr
  • gssmr-staking
  • gssmr-v3substrate

Doesn't work

  • Kusama
  • Polkadot
  • Westend and Westend Local fail with
2022-10-05T18:43:40+05:30 CRITICAL target=runtime message=panicked at 'Bitfields and heads must be included every block', /builds/runtime/parachains/src/paras_inherent/mod.rs:200:17	imports.go:L140:ext_logging_log_version_1	pkg=runtime module=go-wasmer
2022-10-05T18:43:40+05:30 WARN     failed to handle slot 277495936: cannot finalise block: running runtime function: Failed to call the `BlockBuilder_finalize_block` exported function.	epoch_handler.go:L140	pkg=babe

The solution to this is parachain inherent support.

kishansagathiya avatar Oct 05 '22 13:10 kishansagathiya

After the parachain inherent merge, Westend dev and local work fine as well.

kishansagathiya avatar Oct 06 '22 11:10 kishansagathiya

When trying two substrate nodes and one gossamer node, I see a lot of failures to close the outbound stream

WARN     failed to close outbound stream: stream reset	notifications.go:L247	pkg=network

kishansagathiya avatar Oct 06 '22 14:10 kishansagathiya

@kishansagathiya I would suggest creating another issue that describes, with more context, the problem you commented on here, as it does not look related to this issue (CRIT epoch not found in memory map):

When trying two substrate nodes and one gossamer node, I see a lot of failures to close the outbound stream

WARN     failed to close outbound stream: stream reset	notifications.go:L247	pkg=network

EclesioMeloJunior avatar Oct 06 '22 15:10 EclesioMeloJunior

A few more errors are occurring while running this network. On the substrate side I am seeing

2022-10-07 18:37:12 Error with block built on 0xa136b09c27298d28c89134c1ae22a50d43d1f063982a1e21f44ac7e3bc7cc9b8: Import failed: Unexpected epoch change    

and this on the gossamer side

2022-10-07T18:33:00+05:30 CRITICAL failed to run block production engine: cannot handle epoch: cannot initiate and get epoch handler: failed to initiate epoch: cannot check and set first slot: cannot get block with number 1: failed to get hash from blocktree: cannot find node with number greater than highest in blocktree	babe.go:L362	pkg=babe
2022-10-07T18:33:01+05:30 ERROR    block data processing for block with hash 0xa136b09c27298d28c89134c1ae22a50d43d1f063982a1e21f44ac7e3bc7cc9b8 failed: failed to get verifier info for block 64: failed to get epoch data for epoch 10: failed to get epoch data from memory: epoch not found in memory map: 10	chain_processor.go:L80	pkg=sync

kishansagathiya avatar Oct 07 '22 13:10 kishansagathiya

Merging https://github.com/ChainSafe/gossamer/pull/2709 has improved the situation.

It looks like we can now run a cross-client devnet for much longer periods.

  • I ran a network of 2 substrate nodes and one gossamer node. The network ran without any error and built 125+ blocks until I interrupted it by stopping the gossamer node.
  • I let the 2 substrate nodes build blocks up to block number 250+ while I kept gossamer shut down. On restarting, gossamer was quickly able to sync and continue with the network without any error.
  • I shut down all nodes and restarted them, and then saw the error below
2022-11-02T19:11:00+05:30 ERROR    block data processing for block with hash 0x2f12aeb0513cdc18fd9ca5b173214a0aff8893927d42c5bc62a86c5433932220 failed: failed to get verifier info for block 256: failed to get epoch data for epoch 8: failed to get epoch data from memory: epoch not found in memory map: 8	chain_processor.go:L80	pkg=sync

Getting the epoch data memory error after shutting down all nodes is expected as per the specification. Every epoch should have at least one block. If there are empty epochs, further block production gets halted. By shutting down all nodes, we create empty epochs.

There is an issue to allow empty epochs on substrate https://github.com/paritytech/substrate/issues/11393 and some discussion https://github.com/paritytech/substrate/discussions/10209

I think we can close this issue.

kishansagathiya avatar Nov 02 '22 13:11 kishansagathiya

With the above comment in mind, closing this as per our discussion. @timwu20 @danforbes

kishansagathiya avatar Nov 03 '22 08:11 kishansagathiya