gcloudrig-games disk is not created from snapshot
This has happened to me twice now. I start the instance and it boots normally, but the games disk is never created from the snapshot. I'm still trying to figure out why this happens.
Same. Occasionally the disk never gets attached.
Yeah this just happened to me again.
To clarify, the disk never shows up under "Disks". It never tries to create it, and I can't see any errors about it failing in the logs, but I could be looking in the wrong places...
I've had this happen a few times too - my suspicion lies with the boot script being executed too early. I'll take a closer look next time it happens!
Some things to help troubleshoot:
- Does the disk get created but not attached, or is it not created at all?
- Does a disk with the same name still exist in another region/zone?
- Does restarting the Instance attach/restore the disk correctly?
- Is there any relevant Disk Creation or Disk Attach activity recorded in GCP Compute Operations?
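For the second and fourth questions, here is a minimal sketch of how one might check from a workstation with the gcloud CLI installed. The disk name `gcloudrig-games` is taken from the issue title, the helper names (`disks_list_cmd`, `operations_list_cmd`, `run`) are made up for illustration, and the filter expressions are assumptions that may need adjusting for your project.

```python
import subprocess

DISK_NAME = "gcloudrig-games"  # assumption: disk name as written in the issue title


def disks_list_cmd(disk_name=DISK_NAME):
    """Build a gcloud command that lists matching disks across all zones."""
    return [
        "gcloud", "compute", "disks", "list",
        f"--filter=name~{disk_name}",  # '~' is a regex match in gcloud filters
        "--format=table(name,zone.basename(),status)",
    ]


def operations_list_cmd(disk_name=DISK_NAME, limit=20):
    """Build a gcloud command that lists recent operations targeting the disk."""
    return [
        "gcloud", "compute", "operations", "list",
        f"--filter=targetLink~{disk_name}",
        "--sort-by=~insertTime",  # newest first
        f"--limit={limit}",
    ]


def run(cmd):
    """Run a command and return its stdout; raises if gcloud exits non-zero."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Calling `print(run(disks_list_cmd()))` shows whether a disk with that name exists in any zone, and `print(run(operations_list_cmd()))` shows whether any create or attach operation was ever recorded for it.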
This has happened to me a number of times.
- Does the disk get created but not attached, or is it not created at all? Not created at all.
- Does a disk with the same name still exist in another region/zone? Nope, no other disks.
- Does restarting the Instance attach/restore the disk correctly? Yes, restarting the instance creates and attaches the disk correctly.
I can't see anything in the log viewer.
Scaling down in this state also frequently fails to delete the old image of the machine, which then has to be deleted manually.
I'm thinking storing the disk name as a project metadata "flag" might make this more resilient (and maybe even a tad faster).
scale-up:
- Make the scale-up script restore/create the games disk snapshot, rather than the PowerShell boot script
- Once created, store a flag in project metadata indicating the disk's name and zone. This can be done in parallel while scaling up the Instance Group to save a bit of time.
- Modify the PowerShell boot script to poll project metadata for this flag (similar to #32). Once found, attach the disk it mentions.
- Send a desktop toast when starting to attach any disk (implementing #74)
- Never stop polling; if the flag is removed, detach the disk from within the Instance (after running sync)
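The polling behaviour in the last three steps could look roughly like the sketch below. It is written in Python purely for illustration (the real boot script is PowerShell), and everything in it is hypothetical: `poll_disk_flag`, the shape of the flag (`{"name": ..., "zone": ...}`), and the injected `read_flag`, `attach`, and `detach` callbacks, which stand in for reading project metadata, attaching the disk (and sending the toast), and running sync before detaching. `max_polls` exists only so the loop can be exercised in a test; the proposal is to never stop polling.

```python
import time


def poll_disk_flag(read_flag, attach, detach, interval=5.0, max_polls=None,
                   sleep=time.sleep):
    """Attach the disk named by the flag; detach it once the flag disappears.

    read_flag() returns a dict such as {"name": ..., "zone": ...}, or None
    when the flag is not set. Returns the flag of the disk still attached
    when polling stops (None if nothing is attached).
    """
    attached = None
    polls = 0
    while max_polls is None or polls < max_polls:
        flag = read_flag()
        if flag and attached is None:
            attach(flag)          # flag appeared: attach the disk it names
            attached = flag
        elif not flag and attached is not None:
            detach(attached)      # flag removed: sync and detach
            attached = None
        polls += 1
        sleep(interval)
    return attached
```

This sketch ignores the case where the flag changes to a different disk while one is already attached; a real implementation would need to decide what to do there.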
scale-down:
- Remove the project metadata flag and poll until the disk is detached.
- If a timeout is reached, scale down the Instance Group anyway and keep waiting for the disk to be detached
- As soon as disk is detached, commence snapshot and disk deletion (can be in parallel with scaling down the Instance, if this hasn't already started)
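The ordering of the scale-down steps might be sketched as follows. This is an illustration in Python, not the project's actual script: `scale_down` and all of its callback parameters are hypothetical stand-ins for the real gcloud calls, and the timeout and interval values are arbitrary. For simplicity the sketch runs the snapshot and the Instance Group scale-down one after the other rather than in parallel.

```python
import time


def scale_down(remove_flag, is_detached, scale_down_group, snapshot_and_delete,
               timeout=120.0, interval=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Remove the flag, wait for the disk to detach, then snapshot and delete.

    Returns True if the timeout was hit and the Instance Group was scaled
    down before the disk had detached.
    """
    remove_flag()
    deadline = clock() + timeout
    group_scaled = False
    while not is_detached():
        if not group_scaled and clock() >= deadline:
            scale_down_group()    # timeout reached: scale down anyway
            group_scaled = True
        sleep(interval)           # keep waiting for the detach either way
    snapshot_and_delete()         # disk is detached: safe to snapshot
    if not group_scaled:
        scale_down_group()        # normal path: scale down after the detach
    return group_scaled
```

Passing `clock` and `sleep` in as parameters keeps the timeout logic testable without real waiting.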
Sounds good! The disk issue seems really random.
Sometimes the games disk gets attached within minutes of the instance scaling up. Sometimes it never attaches and I need to run ./scale-down and then ./scale-up for it to work.