Retrying a failed/interrupted commit gives an error
Description
When a container commit is interrupted and then retried, the retry fails with the error: failed to export layer: snapshot "sha256:5ca359d74c4d65ad7dc2fc1013b3cccfa921f9000395ac846fd06a37c9a1a67e-parent-view": already exists. I think this happens because the bolt key-value entries left behind by the interrupted commit are never garbage collected, so the retry hits the stale key and the error is thrown in containerd's bolt module.
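To illustrate the suspected failure mode, here is a minimal sketch. It is not containerd's code: snapshotStore, createView, commit, and errAlreadyExists are hypothetical names, and a plain map stands in for the bolt metadata. It assumes that the interrupted commit leaves the "-parent-view" key behind.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists is a hypothetical stand-in for the "already exists" error.
var errAlreadyExists = errors.New("already exists")

// snapshotStore is a hypothetical in-memory stand-in for the bolt-backed
// snapshot metadata; it is not containerd's implementation.
type snapshotStore struct {
	keys map[string]bool
}

func newSnapshotStore() *snapshotStore {
	return &snapshotStore{keys: map[string]bool{}}
}

// createView registers a snapshot key and refuses to overwrite an existing one.
func (s *snapshotStore) createView(key string) error {
	if s.keys[key] {
		return fmt.Errorf("snapshot %q: %w", key, errAlreadyExists)
	}
	s.keys[key] = true
	return nil
}

// remove deletes a snapshot key; removing a missing key is a no-op.
func (s *snapshotStore) remove(key string) {
	delete(s.keys, key)
}

// commit creates the temporary parent view and removes it only when the
// export finishes. interrupted simulates the process dying mid-export.
func commit(s *snapshotStore, parent string, interrupted bool) error {
	key := parent + "-parent-view"
	if err := s.createView(key); err != nil {
		return fmt.Errorf("failed to export layer: %w", err)
	}
	if interrupted {
		return errors.New("interrupted") // the cleanup below never runs
	}
	s.remove(key)
	return nil
}

func main() {
	s := newSnapshotStore()
	_ = commit(s, "sha256:abc", true)           // first attempt, interrupted
	fmt.Println(commit(s, "sha256:abc", false)) // retry hits the stale key
}
```

In this sketch the retry only succeeds once the stale key is removed, which is why some form of cleanup or garbage collection seems necessary.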
Steps to reproduce the issue
- ctr container create docker.io/library/ubuntu:20.04 my-ubuntu
- sudo ctr task start my-ubuntu
- sudo nerdctl container exec -it my-ubuntu bash
- fallocate -l 50000000K test.txt
- sudo nerdctl commit my-ubuntu my-ubuntu-commited
- Send SIGINT (Ctrl+C) while the commit is in progress
- sudo nerdctl commit my-ubuntu my-ubuntu-commited
yakul@yeshu:~$ sudo nerdctl commit my-ubuntu my-ubuntu-commited
WARN[0000] Image lacks label "nerdctl/platform", assuming the platform to be "linux/amd64"
^C
yakul@yeshu:~$ sudo nerdctl commit my-ubuntu my-ubuntu-commited
WARN[0000] Image lacks label "nerdctl/platform", assuming the platform to be "linux/amd64"
FATA[0000] failed to export layer: snapshot "sha256:cdca8156a203b9719f985c3114336529115cdc392f89d45cfcd37c968ddd3645-parent-view": already exists
Describe the results you received and expected
Received: The container is stuck in the PAUSED state and cannot be committed on the second attempt.
Expected: Either the container is committed successfully on the second attempt or, if not, it falls back to the RUNNING state.
What version of nerdctl are you using?
Client: Version: v0.20.0 OS/Arch: linux/amd64 Git commit: e77e05b5fd252274e3727e0439e9a2d45622ccb9
Server: containerd: Version: 1.6.6 GitCommit: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
What version of ctr are you using?
Client: Version: 1.6.6 Revision: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1 Go version: go1.17.11
Are you using a variant of nerdctl? (e.g., Rancher Desktop)
No response
Host information
No response
I cannot reproduce the bug. Would you mind giving us more specific steps to reproduce it?
I have updated the issue description with steps I followed to reproduce the bug.
I feel there should be signal handlers on the commit context so that any garbage keys created by bolt are deleted if a commit is interrupted or fails before completion.
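A rough sketch of what I mean, using only the Go standard library. This is not nerdctl's or containerd's actual code: commitWithCleanup, slowExport, and simulateInterrupt are hypothetical names, and the cleanup callback is a placeholder for removing the leftover snapshot keys and resuming the paused container. The idea is to turn SIGINT into a context cancellation and then run the cleanup with a fresh context, since the original one is already cancelled.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// commitWithCleanup runs export under ctx and always calls cleanup afterwards,
// even when ctx was cancelled. Both callbacks are hypothetical placeholders.
func commitWithCleanup(ctx context.Context, export, cleanup func(context.Context) error) error {
	err := export(ctx)
	// Use a fresh context: the original one may already be cancelled.
	cctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if cerr := cleanup(cctx); cerr != nil && err == nil {
		err = cerr
	}
	return err
}

// slowExport simulates a long layer export that stops when ctx is cancelled.
func slowExport(d time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// simulateInterrupt runs a commit whose context is already cancelled, as it
// would be after Ctrl+C, and reports whether cleanup ran with a live context.
func simulateInterrupt() (cleaned bool, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	err = commitWithCleanup(ctx, slowExport(time.Minute), func(c context.Context) error {
		cleaned = c.Err() == nil
		return nil
	})
	return cleaned, err
}

func main() {
	// Cancel the context on Ctrl+C or SIGTERM instead of dying immediately.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	cleanup := func(context.Context) error {
		fmt.Println("removing temporary snapshot keys and resuming the container")
		return nil
	}
	err := commitWithCleanup(ctx, slowExport(3*time.Second), cleanup)
	stop()
	if err != nil {
		fmt.Fprintln(os.Stderr, "commit failed:", err)
		os.Exit(1)
	}
}
```

Pressing Ctrl+C during the three-second export in main cancels the context, and the cleanup still runs before the process exits. This only covers signals that can be caught; it would not help after SIGKILL, so cleaning up stale keys on the next commit would still be useful.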
I opened a PR in containerd related to this.