Memory snapshots cannot be restored due to network packets with "already-parsed" headers
Description
Memory snapshots cannot be restored due to network packets with "already-parsed" headers
Steps to reproduce
I've been debugging a problem with restoring memory snapshots. Many of my crashes come from this line in PacketBuffer.consume.
I have two checkpoints to reproduce the problem: one that (consistently) fails to be restored, and another one that can be restored without problems.
I've been looking at gVisor logs (instrumented with my own debug logging). The difference between the two snapshots seems to be this: the unsuccessful one always has a non-empty PacketBufferList during restore, i.e. it loads a non-empty PacketBufferList from the serialized checkpoint. That by itself appears to be a condition gVisor is not designed to handle well. The queued packets are "already parsed" (parsed in the network sense, not in the checkpoint sense: their headers have already been populated from the underlying network buffer), and gVisor does not handle them gracefully on restore.
Prior to the "second" consume call, the headers in the packet object look something like this.
I0820 19:54:33.405501 1 packet_buffer.go:350] === Packet Headers Debug ===
I0820 19:54:33.405511 1 packet_buffer.go:353] Header[0] virtioNetHeader: offset=0, length=0
I0820 19:54:33.405516 1 packet_buffer.go:353] Header[1] linkHeader: offset=0, length=14
I0820 19:54:33.405520 1 packet_buffer.go:353] Header[2] networkHeader: offset=14, length=28
I0820 19:54:33.405523 1 packet_buffer.go:353] Header[3] transportHeader: offset=0, length=0
I0820 19:54:33.405527 1 packet_buffer.go:365] PacketBuffer state: reserved=0, pushed=0, consumed=42
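For reference, the output above comes from my own instrumentation, which looks roughly like this (a sketch added inside pkg/tcpip/stack/packet_buffer.go; the field names follow PacketBuffer's internal layout at this revision and may not match other versions):

```go
// Debug helper (my instrumentation, not upstream code). It assumes the
// file's existing "gvisor.dev/gvisor/pkg/log" import; pk.headers,
// pk.reserved, pk.pushed and pk.consumed are PacketBuffer internals at this
// revision and may differ elsewhere.
func (pk *PacketBuffer) debugDumpHeaders() {
	log.Infof("=== Packet Headers Debug ===")
	names := [...]string{"virtioNetHeader", "linkHeader", "networkHeader", "transportHeader"}
	for i, h := range pk.headers {
		log.Infof("Header[%d] %s: offset=%d, length=%d", i, names[i], h.offset, h.length)
	}
	log.Infof("PacketBuffer state: reserved=%d, pushed=%d, consumed=%d",
		pk.reserved, pk.pushed, pk.consumed)
}
```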
What might be going on? I couldn't figure out under what conditions packets with partially parsed headers could end up in processor.pkts. Perhaps packet parsing can somehow race with checkpointing? I haven't explored that theory much.
I'm using source code from this revision to build gVisor: bb08e96046752b436adbdfd363673fde8ad8b373
runsc version
bb08e96046752b436adbdfd363673fde8ad8b373
docker version (if using docker)
uname
No response
kubectl (if using Kubernetes)
repo state (if built from source)
No response
runsc debug logs (if available)
Full `boot.txt` log (with some lines at the top redacted):
https://gist.githubusercontent.com/abhagwat/19a41d2cf9fce01e5d6f77678788ad1a/raw/b26460d1febd0ccde201fc56e872cf9d2d878075/gistfile0.txt
Another sign that an internal gVisor invariant may have been violated: the pkt from the serialized PacketBufferList has a refcount of 0 when it is loaded from the checkpoint.
This is despite the fact that gVisor pushes packets onto this list in only one place, and it is careful to increment the packet's refcount before doing so: https://github.com/google/gvisor/blob/dc0c4372b9c0b11d0568218986a09965895eaaba/pkg/tcpip/link/fdbased/processors.go#L170
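Paraphrasing the linked enqueue site (the function name here is illustrative, not the exact code):

```go
// Paraphrase of the enqueue path in fdbased/processors.go; see the link
// above for the real code. The refcount is incremented before the packet is
// appended to the processor's PacketBufferList, so any packet sitting on
// that list should have a refcount of at least 1.
func enqueueForProcessing(p *processor, pkt *stack.PacketBuffer) {
	pkt.IncRef()         // take a reference on behalf of the list
	p.pkts.PushBack(pkt) // the only place the list is appended to
}
```

If that invariant holds at save time, a refcount of 0 on a restored packet suggests the saved state itself was already inconsistent.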
My current (tenuous) working theory: most stateify-savable objects in gVisor implement a beforeSave hook that performs synchronization. That hook generally drains pending work, stops background goroutines, and so on, so that the in-memory state is quiesced and can be serialized safely. processorManager, however, does not implement this hook, and it has a background goroutine processing packets. There is no explicit synchronization to stop this goroutine before the network-related data structures are serialized, and the stateify serialization code does not take any mutexes. So in theory we can serialize partially-processed packets into the checkpoint.
(Please correct me if I'm missing any subtle mechanisms that protect against this.)
Here's an example of the "stop goroutine before serializing the data structure it manages" pattern.
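In rough outline (made-up names, not the linked code or the actual processorManager), the pattern looks like this:

```go
package sketch

import "sync"

// A hedged illustration of the "quiesce before save" pattern. The
// stateify-generated save code calls beforeSave() on a savable type before
// serializing its fields, so that hook is where background work is stopped.
//
// +stateify savable
type workManager struct {
	processed []string // state the background goroutine mutates

	work chan string    `state:"nosave"`
	stop chan struct{}  `state:"nosave"`
	wg   sync.WaitGroup `state:"nosave"`
}

func (m *workManager) start() {
	m.work = make(chan string, 16)
	m.stop = make(chan struct{})
	m.wg.Add(1)
	go func() {
		defer m.wg.Done()
		for {
			select {
			case <-m.stop:
				return
			case item := <-m.work:
				m.processed = append(m.processed, item)
			}
		}
	}()
}

// beforeSave is called by the stateify-generated save code; it stops the
// background goroutine so no field is mutated mid-serialization.
func (m *workManager) beforeSave() {
	close(m.stop)
	m.wg.Wait()
}
```

The key point is that beforeSave does not return until the goroutine has exited; without a hook like this, nothing prevents the goroutine from mutating the data structure while it is being serialized.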
I've been working on reproducing this problem in a gVisor unit test. It certainly looks like state.Save(..., mgr) (where mgr is a processorManager) crashes very easily when mgr has an active packet-processing goroutine. I haven't yet gotten gVisor to actually write out a corrupted checkpoint.
Adding a correct beforeSave hook to processorManager seems to fix the bug.
Thanks for the investigation, @abhagwat. Could you submit a PR with https://github.com/google/gvisor/commit/05f1baf1c74dfd149359d3960099877ffe6dd5ba? @nybidari, is that the right fix?
Yes, @abhagwat, your understanding is correct. Thanks for the fix!
Great, thank you!