sunlight
sunlight copied to clipboard
ctlog: test recover behavior with an asynchronous slow backend
Test recovery behavior when a backend sometimes freezes for a while before writing a file.
A sequencing failure with a delayed earlier write can cause log recovery to get stuck:
- Sequencing starts.
- The sequencer starts writing tile file "A" with contents "A1" (asynchronously).
- The sequencer times out.
- Sequencing restarts.
- The sequencer tries to write tile file "A" with contents "A2" and succeeds.
- The earlier write to "A" succeeds with contents "A1" replacing "A2".
- The sequencer updates the lock to a tree with hash expecting "A" to contain "A2".
- The log is in a bad state because "A" contains "A1" but should contain "A2".
In the log this looks as follows:
testlog_test.go:79: time=2024-03-16T18:20:29.746-07:00 level=ERROR source=ctlog.go:671 msg="pool sequencing failed" old_tree_size=43 entries=1 err="couldn't upload checkpoint to object storage: context deadline exceeded"
testlog_test.go:79: time=2024-03-16T18:20:29.746-07:00 level=WARN source=ctlog.go:213 msg="checkpoint in object storage is older than lock checkpoint" old_size=43 size=44
testlog_test.go:79: time=2024-03-16T18:20:29.892-07:00 level=ERROR source=ctlog.go:671 msg="pool sequencing failed" old_tree_size=102 entries=1 err="couldn't upload checkpoint to object storage: context deadline exceeded"
testlog_test.go:79: time=2024-03-16T18:20:29.892-07:00 level=WARN source=ctlog.go:213 msg="checkpoint in object storage is older than lock checkpoint" old_size=44 size=103
testlog_test.go:79: time=2024-03-16T18:20:30.031-07:00 level=ERROR source=ctlog.go:671 msg="pool sequencing failed" old_tree_size=142 entries=1 err="couldn't upload a tile: context deadline exceeded"
testlog_test.go:79: time=2024-03-16T18:20:30.031-07:00 level=WARN source=ctlog.go:213 msg="checkpoint in object storage is older than lock checkpoint" old_size=103 size=142
testlog_test.go:79: time=2024-03-16T18:20:30.134-07:00 level=ERROR source=ctlog.go:671 msg="pool sequencing failed" old_tree_size=143 entries=1 err="couldn't upload a tile: context deadline exceeded"
testlog_test.go:107: couldn't fetch right edge tiles: downloaded inconsistent tile