[content-service] cannot restart stopped workspace

Open kylos101 opened this issue 3 years ago • 18 comments

Bug description

The full error on start is:

cannot initialize workspace: cannot restore backup: tar /dst: tar /dst: exit status 2;
tar: .docker-root/overlay2/dedcf720e850fe485571223ecd9959f159eb8eef5250e7ee42ac96f010a3e369/diff/root/.npm/_cacache/content-v2/sha512/2c/49/1e6cfe1a91b8ccb4030ecac14cfe4d1d4e557aaf369a254fd344a9a7e864c7f4cab9edb19e21e7313357cb8eaacc48e19d7fc2b1d2ed294af55846548846: Cannot open: Permission denied
tar: .docker-root/overlay2/dedcf720e850fe485571223ecd9959f159eb8eef5250e7ee42ac96f010a3e369/diff/root/.npm/_cacache/content-v2/sha512/74/de/010104b1dac999964663a5780e77912dd860c4bf1089dabe1f9b8175af2aaed3b5e5cba0f24fe31bfc38f19bcc40ffd609a6d2ab6ea02561055f96ba53a6: Cannot open: Permission denied
tar: Exiting with failure status due to previous errors
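
For triage, the failing paths can be pulled out of an error message like the one above and grouped by top-level directory. This is a minimal sketch, not part of Gitpod: the function names are hypothetical, and it assumes the failing paths contain no whitespace, as is the case for the messages quoted in this issue.

```python
import re
from collections import Counter

# Assumption: failing paths contain no whitespace (true for the messages in this issue).
DENIED = re.compile(r"tar: (\S+): Cannot open: Permission denied")


def denied_paths(message: str) -> list[str]:
    """Return every path that tar reported as 'Cannot open: Permission denied'."""
    return DENIED.findall(message)


def count_by_top_dir(paths: list[str]) -> Counter:
    """Count failing paths by their first path component (e.g. '.docker-root')."""
    return Counter(path.split("/", 1)[0] for path in paths)


if __name__ == "__main__":
    import sys

    # Usage: paste the error text on stdin.
    found = denied_paths(sys.stdin.read())
    for top_dir, count in count_by_top_dir(found).items():
        print(f"{top_dir}: {count} file(s)")
```

Grouping by the first path component makes it easy to see whether the failures sit under one directory or are spread across the backup.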

Logs: https://cloudlogging.app.goo.gl/82CpwrNzW9oJWX1Q8

Log entry for the error: https://console.cloud.google.com/logs/query;cursorTimestamp=2022-07-06T11:31:39Z;query=resource.labels.cluster_name%20:%20%2528%22eu51%22%2529%0A%22eb595f2b-7de4-4248-800d-7dfe0280f802%22%0Atimestamp%3D%222022-07-06T11:31:39Z%22%0AinsertId%3D%227hkbcwop4k5qd0t4%22;summaryFields=:false:32:beginning:false;timeRange=P1D?project=workspace-clusters

Trend over last 7 days: https://cloudlogging.app.goo.gl/qdGdJ7CrD1zmLyrQ8

Steps to reproduce

n.a

Workspace affected

https://prlct-shipapp-5830i5ovwcb.ws-eu51.gitpod.io/

Expected behavior

The ability to restart a stopped workspace

Example repository

No response

Anything else?

No response

kylos101 avatar Jul 06 '22 21:07 kylos101

@utam0k is this something you could look at next, once you are free? :pray: It appears to be impacting many users.

kylos101 avatar Jul 06 '22 22:07 kylos101

This is probably related to other similar issues of accessing .docker-root folder: https://github.com/gitpod-io/gitpod/issues/10569 https://github.com/gitpod-io/gitpod/issues/10108

sagor999 avatar Jul 07 '22 22:07 sagor999

There really do seem to be a few more users with the same error. https://cloudlogging.app.goo.gl/owK8qukCGMcpkKQk7

utam0k avatar Jul 08 '22 07:07 utam0k

I removed myself as assignee because I'll be on vacation next week.

utam0k avatar Jul 08 '22 07:07 utam0k

I have tried everything and could not reproduce it with only this information in its current state. Someone else encountered the same error in our repository below, but I was able to reproduce it. https://github.com/gitpod-io/template-docker-compose

Maybe some manipulation is needed in the container.

utam0k avatar Jul 08 '22 07:07 utam0k

I encounter the same issue. I cannot start or restart a workspace, so I need to create a new one every time. Work not pushed is work lost.

filipjnc avatar Jul 28 '22 09:07 filipjnc

Decided to try it out again today. It worked fine for an hour until the container stopped while I was working. Afterwards, I could not start the workspace. My whole hour's worth of work is gone - so annoying. Come on, Gitpod, get it together and fix it.


cannot initialize workspace: cannot restore backup: tar /dst: tar /dst: exit status 2;tar: .docker-root/overlay2/0ad5fb59a0da61c61bef0492772e450b29c3afb2c36d058fb30501d3fff86ac0/diff/usr/lib/supertokens/jre/legal/java.base/ADDITIONAL_LICENSE_INFO: Cannot open: Permission denied tar: .docker-root/overlay2/0ad5fb59a0da61c61bef0492772e450b29c3afb2c36d058fb30501d3fff86ac0/diff/usr/lib/supertokens/jre/legal/java.base/ASSEMBLY_EXCEPTION: 
…

filipjnc avatar Jul 31 '22 20:07 filipjnc

Hi, @filipjnc. Thanks for your report. This issue is already scheduled but has not yet been worked on due to priorities. Can I ask you to share the repository or how to reproduce it?

utam0k avatar Jul 31 '22 23:07 utam0k

Hi @utam0k, Workspace ID: filipjnc-dtlearnhunt-l2swncfwtr7 Cluster: ws-eu54.gitpod.io

I can reproduce it as follows:

  • Start new workspace from GitHub repo
  • Shut down workspace or wait for timeout
  • Can't restart it again, the error above keeps coming up. As if the container has been permanently corrupted.

filipjnc avatar Jul 31 '22 23:07 filipjnc

@filipjnc Thanks for your information. It helps us to resolve it. Is the reproduction rate 100%?

utam0k avatar Jul 31 '22 23:07 utam0k

@filipjnc Thanks for your information. It helps us to resolve it. Is the reproduction rate 100%?

Yes. Can reproduce it every time on my end. All old (damaged) workspaces could never be started again.

filipjnc avatar Jul 31 '22 23:07 filipjnc

Sorry for the trouble. Thanks for your help.

utam0k avatar Aug 01 '22 00:08 utam0k

Unfortunately this is about the fourth workspace I've lost. I'm becoming obsessive about git commit before I switch off as trust in Gitpod is minimal at the moment.
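
As a stopgap, a check for unsaved work can be run before stopping a workspace. The sketch below is an illustration only, with a hypothetical function name: it parses the text printed by `git status --porcelain=v1 --branch` and reports how many commits are ahead of the upstream branch and how many paths are changed or untracked. A branch with no upstream prints no "ahead" count, so a result of 0 there does not mean the commits are pushed.

```python
import re


def unsaved_work(status_output: str) -> tuple[int, int]:
    """Return (commits ahead of upstream, number of changed or untracked paths).

    Expects the output of `git status --porcelain=v1 --branch`.
    """
    ahead = 0
    changed = 0
    for line in status_output.splitlines():
        if line.startswith("## "):
            # Branch header, e.g. "## main...origin/main [ahead 2, behind 1]"
            match = re.search(r"\[ahead (\d+)", line)
            if match:
                ahead = int(match.group(1))
        elif line.strip():
            # Every other non-empty line is one changed or untracked path.
            changed += 1
    return ahead, changed


if __name__ == "__main__":
    import subprocess

    output = subprocess.run(
        ["git", "status", "--porcelain=v1", "--branch"],
        check=True, capture_output=True, text=True,
    ).stdout
    ahead, changed = unsaved_work(output)
    print(f"{ahead} unpushed commit(s), {changed} uncommitted path(s)")
```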

semiautomatix avatar Aug 05 '22 09:08 semiautomatix

Hi, @filipjnc. Thanks for your report. This issue is already scheduled but has not yet been worked on due to priorities. Can I ask you to share the repository or how to reproduce it?

As a way of possibly testing, I'm using a skeleton of this project: https://github.com/sprintcube/docker-compose-lamp

semiautomatix avatar Aug 05 '22 09:08 semiautomatix

And again...

I forgot to push one of my commits; fortunately it's a small change. But it's annoying because now I have to recreate the whole damn docker image.

semiautomatix avatar Aug 05 '22 10:08 semiautomatix

Attempted docker-compose down to bring everything down and no dice.

semiautomatix avatar Aug 05 '22 10:08 semiautomatix

Hi, @filipjnc. Thanks for your report. This issue is already scheduled but has not yet been worked on due to priorities. Can I ask you to share the repository or how to reproduce it?

As a way of possibly testing, I'm using a skeleton of this project: https://github.com/sprintcube/docker-compose-lamp

@semiautomatix Sorry for the late reply 🙏

I tried to use the repo https://github.com/sprintcube/docker-compose-lamp you provided

  • Open workspace
  • Run docker-compose run
  • Write some files under the path /workspace/docker-compose-lamp
  • Manually stop the workspace

However, I can't reproduce it. Would you please provide more detailed steps? Thank you.

jenting avatar Aug 09 '22 01:08 jenting

Hi @utam0k, Workspace ID: filipjnc-dtlearnhunt-l2swncfwtr7 Cluster: ws-eu54.gitpod.io

I can reproduce it as follows:

  • Start new workspace from GitHub repo
  • Shut down workspace or wait for timeout
  • Can't restart it again, the error above keeps coming up. As if the container has been permanently corrupted.

@filipjnc Thanks for providing the information. I tried to access your repo to reproduce the issue. However, since it's private, I can't do any further testing.

jenting avatar Aug 09 '22 01:08 jenting

Hi, @filipjnc. Thanks for your report. This issue is already scheduled but has not yet been worked on due to priorities. Can I ask you to share the repository or how to reproduce it?

As a way of possibly testing, I'm using a skeleton of this project: https://github.com/sprintcube/docker-compose-lamp

@semiautomatix Sorry for the late reply 🙏

I tried to use the repo https://github.com/sprintcube/docker-compose-lamp you provided

  • Open workspace
  • Run docker-compose run
  • Write some files under the path /workspace/docker-compose-lamp
  • Manually stop the workspace

However, I can't reproduce it. Would you please provide more detailed steps? Thank you.

Thanks for the update. I was hoping a clean pull would recreate the error.

Additionally, I've cloned the project, added code to the www folder, started docker-compose, imported an SQL file into the database.

I'll attempt to recreate the error, and provide access to the repo.

semiautomatix avatar Aug 11 '22 09:08 semiautomatix

I encountered this issue as well.

cannot initialize workspace: cannot restore backup: tar /dst: tar /dst: exit status 2;tar: buildkit/runc-overlayfs/snapshots/snapshots/28/fs/etc/sudoers.bkp: Cannot open: Permission denied tar: Exiting with failure status due to previous errors 

WS ID: gitpodio-workspaceimage-5wke6grm46l

axonasif avatar Sep 05 '22 06:09 axonasif

I encountered this issue twice today. I suspect this occurs because I left some Docker containers (started with compose) running and manually stopped the workspace.
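
To test that suspicion, running containers can be stopped explicitly before the workspace is stopped. The following is a minimal sketch of such a step, assuming the `docker` CLI is on the PATH; the function names and the 30-second grace period are illustrative choices, and whether this avoids the restore error is not established in this thread.

```python
import subprocess


def running_container_ids() -> list[str]:
    """Return the IDs of running containers, as printed by `docker ps -q`."""
    result = subprocess.run(
        ["docker", "ps", "-q"], check=True, capture_output=True, text=True
    )
    return result.stdout.split()


def stop_command(container_ids: list[str], grace_seconds: int = 30) -> list[str]:
    """Build a `docker stop` command.

    -t is the number of seconds Docker waits after SIGTERM before killing
    the container.
    """
    return ["docker", "stop", "-t", str(grace_seconds), *container_ids]


if __name__ == "__main__":
    ids = running_container_ids()
    if ids:
        subprocess.run(stop_command(ids), check=True)
    else:
        print("no running containers")
```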

nisan1337 avatar Sep 05 '22 13:09 nisan1337

Facing it too; I lost many file changes and my .env setup. #12900

Nishchit14 avatar Sep 13 '22 05:09 Nishchit14

@Furisto to follow-up on our conversation from earlier today:

  1. PVC may solve this, but self-hosted may not have PVC for a while.
  2. We should see if we can do a tactical solution here. For example, gracefully stop containers long before dispose starts, really close to after either (1) a workspace times out or (2) a user stops a workspace.

kylos101 avatar Oct 14 '22 22:10 kylos101

@kylos101 I am not convinced that 2. will help, because containers that do not handle SIGTERM will be killed either way, regardless of the mechanism. I have found a way to reproduce this, and it looks like it is related to us restoring the extended attributes of files. I do not get the error message in my workspace if I do not specify --xattrs during untar. I am now trying to find a way to solve this error without reintroducing the problem that caused us to restore extended attributes in the first place.
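
The comparison described above can be sketched as two extraction commands that differ only in the `--xattrs` flag. This is an illustration assuming GNU tar, not Gitpod's actual restore invocation, which is not shown in this thread; the function names and the archive path are hypothetical.

```python
import subprocess


def untar_command(archive: str, dst: str, restore_xattrs: bool) -> list[str]:
    """Build a GNU tar extraction command, optionally restoring extended attributes."""
    cmd = ["tar", "--extract", "--file", archive, "--directory", dst]
    if restore_xattrs:
        cmd.append("--xattrs")
    return cmd


def try_restore(archive: str, dst: str, restore_xattrs: bool) -> tuple[int, str]:
    """Run the extraction and return (exit status, stderr) for comparison."""
    result = subprocess.run(
        untar_command(archive, dst, restore_xattrs),
        capture_output=True, text=True,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Hypothetical paths; use a separate, empty destination for each run.
    for flag in (True, False):
        status, errors = try_restore("backup.tar", f"/tmp/restore-{flag}", flag)
        print(f"--xattrs={flag}: exit status {status}")
        print(errors)
```

Running both variants against the same archive shows whether the "Permission denied" lines appear only when extended attributes are restored.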

Furisto avatar Oct 17 '22 15:10 Furisto

I have found a way to reproduce this and it looks like it is related to us restoring the extended attributes of files.

Interesting @Furisto ! How come that permission denied message generally pertains to .docker-root/overlay2?

I do not get the error message in my workspace if I do not specify --xattrs during untar. Trying to find a way now how I can solve this error without reintroducing what caused us to restore extended attributes in the first place.

Okay, this further confirms that going to PVC will resolve the issue. Can you share a reference issue or PR regarding what caused us to restore extended attributes in the first place? I recommend setting a timebox for this issue, where you draw a line in the sand: if we cannot resolve it confidently within that time, we'll need to wait for PVC.

kylos101 avatar Oct 17 '22 21:10 kylos101

We have identified a solution for this problem but it will take some time testing this until we can roll it out to all customers. Current estimate is ~1 month.

Furisto avatar Oct 18 '22 21:10 Furisto

Decided to try it out again today. It worked fine for an hour until the container stopped while I was working. Afterwards, I could not start the workspace. My whole hour's worth of work is gone - so annoying. Come on, Gitpod, get it together and fix it.


cannot initialize workspace: cannot restore backup: tar /dst: tar /dst: exit status 2;tar: .docker-root/overlay2/0ad5fb59a0da61c61bef0492772e450b29c3afb2c36d058fb30501d3fff86ac0/diff/usr/lib/supertokens/jre/legal/java.base/ADDITIONAL_LICENSE_INFO: Cannot open: Permission denied tar: .docker-root/overlay2/0ad5fb59a0da61c61bef0492772e450b29c3afb2c36d058fb30501d3fff86ac0/diff/usr/lib/supertokens/jre/legal/java.base/ASSEMBLY_EXCEPTION: 
…

We have exactly the same problem and we are also using Supertokens as a Docker container. I think it's related to these files under .docker-root/overlay2/(...)/diff/usr/lib/supertokens/jre/legal, they have 444 permissions which means no one has write access to these files. Maybe this is the root cause of the permission denied error.
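
One way to check this hypothesis is to list the regular files under a directory that have no write bit set for anyone (for example mode 444). The sketch below is an illustration with a hypothetical function name; it only inspects permission bits and does not show that such files are the cause of the error.

```python
import os
import stat


def files_without_write_bits(root: str) -> list[str]:
    """Return regular files under `root` with no write permission bit set (e.g. 0444)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            if stat.S_ISREG(mode) and (stat.S_IMODE(mode) & 0o222) == 0:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    import sys

    # Usage: python scan.py <directory>
    for path in files_without_write_bits(sys.argv[1]):
        print(path)
```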

6uliver avatar Oct 19 '22 10:10 6uliver

I've created a simple repository where this bug can be reproduced in a stable way, I hope this will help to test the bugfix and resolve the bug.

Steps to reproduce

  1. Open this repository on Gitpod: https://gitlab.com/6uliver/gitpod-workspace-restart-error-repro
  2. Wait for the workspace and Supertokens to be started, you should see this message in the console: "Started SuperTokens on 0.0.0.0:3567"
  3. Stop the workspace and wait for it to be stopped
  4. Open and start workspace

Expected result

The workspace can be started successfully and Supertokens is running.

Actual result

You can see a Gitpod error page with the text "Oh, no! Something went wrong!" and the following long error message:

initializeWorkspaceContent failed: cannot initialize workspace: cannot restore backup: tar /dst: tar /dst: exit status 2;
tar: .docker-root/overlay2/a061703a1735ce483b1f0d56341fe6943e2adf755b31b20a3f8ed82ee8f38e9e/diff/usr/lib/supertokens/jre/legal/java.base/ADDITIONAL_LICENSE_INFO: Cannot open: Permission denied 
tar: .docker-root/overlay2/a061703a1735ce483b1f0d56341fe6943e2adf755b31b20a3f8ed82ee8f38e9e/diff/usr/lib/supertokens/jre/legal/java.base/ASSEMBLY_EXCEPTION: Cannot open: Permission denied 
...

6uliver avatar Oct 19 '22 12:10 6uliver

@Furisto I noticed that this problem affects all upper-layer files. Perhaps it is a change that occurred within a container. This file is supposed to be deleted when dockerd stops. In other words, I thought dockerd was receiving a SIGKILL, so I made this change. What do you think? I just couldn't figure out how to reproduce it and would love to hear how you do it. https://github.com/gitpod-io/gitpod/pull/14498

utam0k avatar Nov 08 '22 06:11 utam0k

@Furisto I noticed that this problem affects all upper-layer files. Perhaps it is a change that occurred within a container. This file is supposed to be deleted when dockerd stops. In other words, I thought dockerd was receiving a SIGKILL, so I made this change. What do you think? I just couldn't figure out how to reproduce it and would love to hear how you do it. #14498

@utam0k are you able to recreate this issue when using PVC (like with Gitpod project and repo)? Or, are you only able to recreate when using backup powered by object storage? If the problem is isolated to object storage, we should wait, PVC may solve. If you can recreate the problem with PVC too, let us know?

kylos101 avatar Nov 08 '22 22:11 kylos101