
Can't deploy VM on node 6886, provisiond errors

Open scottyeager opened this issue 3 months ago • 2 comments

We noticed that we can't deploy VMs to mainnet node 6886. For example, when trying to deploy a micro VM via the dashboard:

[Screenshot: deploying a micro VM to node 6886 via the dashboard]

Node logs show that provisiond is panicking in a loop. Here's an example (UTC timestamps):

```
2025-10-08 14:59:06.689 [-] provisiond: panic: context deadline exceeded
2025-10-08 14:59:06.689 [-] provisiond: 
2025-10-08 14:59:06.689 [-] provisiond: goroutine 161 [running]:
2025-10-08 14:59:06.689 [-] provisiond: github.com/threefoldtech/zosbase/pkg/stubs.(*VMModuleStub).Metrics(0xc000f1bde8, {0x184b510, 0xc0001d0cb0})
2025-10-08 14:59:06.689 [-] provisiond: 	/home/runner/go/pkg/mod/github.com/threefoldtech/zosbase@.../pkg/stubs/vmd_stub.go:131 +0x225
2025-10-08 14:59:06.689 [-] provisiond: github.com/threefoldtech/zos/cmds/modules/provisiond.(*Reporter).getVmMetrics(0xc001397e00, {0x184b4a0, 0xc0005be000}, {0x18442f0, 0xc000340b70})
2025-10-08 14:59:06.689 [-] provisiond: 	/home/runner/work/zos/zos/cmds/modules/provisiond/reporter.go:153 +0x159
2025-10-08 14:59:06.689 [-] provisiond: github.com/threefoldtech/zos/cmds/modules/provisiond.(*Reporter).getMetrics(0xc001397e00, {0x184b4a0, 0xc0005be000})
2025-10-08 14:59:06.689 [-] provisiond: 	/home/runner/work/zos/zos/cmds/modules/provisiond/reporter.go:242 +0x5f
2025-10-08 14:59:06.689 [-] provisiond: github.com/threefoldtech/zos/cmds/modules/provisiond.(*Reporter).metrics(0xc001397e00, {0x184b4a0, 0xc0005be000})
2025-10-08 14:59:06.689 [-] provisiond: 	/home/runner/work/zos/zos/cmds/modules/provisiond/reporter.go:261 +0xef
2025-10-08 14:59:06.689 [-] provisiond: created by github.com/threefoldtech/zos/cmds/modules/provisiond.(*Reporter).Run in goroutine 96
2025-10-08 14:59:06.689 [-] provisiond: 	/home/runner/work/zos/zos/cmds/modules/provisiond/reporter.go:302 +0xd2
```

scottyeager • Oct 08 '25 20:10

  • I was able to successfully deploy a micro VM after the node came back online. Based on the dashboard, the node appears to have been rebooted.

  • Upon reviewing the Zos logs, the provisiond entries indicate the nodeContract was created successfully. See images:

[Two screenshots: provisiond log entries showing the nodeContract creation]

TullysInc • Oct 09 '25 17:10

I investigated this issue. The part of provisiond that fails in the logs above is the metrics collection: the node goes over all running instances and calculates their consumption, with a timeout of 1 minute. That should be more than enough for the calculation, because this node has only 25 active workloads, which is a very small number; nodes with far more workloads work without issue. So this may have been caused by a bad disk or something similar.

ashraffouda • Oct 19 '25 05:10