BUG: akash provider lease-shell stops working when pod gets restarted due to eviction
Reproducer
- deploy something with a 1 or 10 GB storage request (== limit);
- consume more storage than the limit set in step 1:
root@ssh-6fd4f4bdf9-7b2p2:/# dd if=/dev/zero of=/test-count-2048 bs=10M count=2048
2048+0 records in
2048+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 13.8965 s, 1.5 GB/s
root@ssh-6fd4f4bdf9-7b2p2:/# Error: lease shell failed: remote process exited with code 137
- this will cause the pod to restart due to:
"reason": "Evicted",
"note": "Container ssh exceeded its local ephemeral storage limit \"10737418240\". ",
See the entire akash provider lease-events log below.
- akash provider lease-shell stops working:
$ akash provider lease-shell --tty --stdin -- ssh bash
Error: lease shell failed: remote command execute error: service with that name is not running: the service has failed
version 0.16.4-rc0
The akash provider and client are both version 0.16.4-rc0:
$ curl -sk "https://provider.europlots.com:8443/version" | jq -r
{
"akash": {
"version": "v0.16.4-rc0",
"commit": "38b82258c14e3d0a2ed3d15a8d4140ec8c826a84",
"buildTags": "\"osusergo,netgo,ledger,static_build\"",
"go": "go version go1.17.6 linux/amd64",
"cosmosSdkVersion": "v0.45.1"
},
"kube": {
"major": "1",
"minor": "23",
"gitVersion": "v1.23.5",
"gitCommit": "c285e781331a3785a7f436042c65c5641ce8a9e9",
"gitTreeState": "clean",
"buildDate": "2022-03-16T15:52:18Z",
"goVersion": "go1.17.8",
"compiler": "gc",
"platform": "linux/amd64"
}
}
lease-status after the eviction
$ akash provider lease-status
{
"services": {
"ssh": {
"name": "ssh",
"available": 1,
"total": 1,
"uris": [
"31ai266lqddovfbslrlj1vtcfk.ingress.europlots.com"
],
"observed_generation": 1,
"replicas": 1,
"updated_replicas": 1,
"ready_replicas": 1,
"available_replicas": 1
}
},
"forwarded_ports": {
"ssh": [
{
"host": "ingress.europlots.com",
"port": 22,
"externalPort": 32459,
"proto": "TCP",
"available": 1,
"name": "ssh"
}
]
}
}
lease-events logs
$ akash provider lease-events
{
"type": "Normal",
"reason": "Sync",
"note": "Scheduled for sync",
"object": {
"kind": "Ingress",
"namespace": "vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg",
"name": "31ai266lqddovfbslrlj1vtcfk.ingress.europlots.com"
},
"lease_id": {
"owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
"dseq": 5673203,
"gseq": 1,
"oseq": 1,
"provider": "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc"
}
}
{
"type": "Normal",
"reason": "Scheduled",
"note": "Successfully assigned vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg/ssh-6fd4f4bdf9-7b2p2 to node2",
"object": {
"kind": "Pod",
"namespace": "vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg",
"name": "ssh-6fd4f4bdf9-7b2p2"
},
"lease_id": {
"owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
"dseq": 5673203,
"gseq": 1,
"oseq": 1,
"provider": "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc"
}
}
{
"type": "Normal",
"reason": "Pulled",
"note": "Container image \"ubuntu:21.10\" already present on machine",
"object": {
"kind": "Pod",
"namespace": "vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg",
"name": "ssh-6fd4f4bdf9-7b2p2"
},
"lease_id": {
"owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
"dseq": 5673203,
"gseq": 1,
"oseq": 1,
"provider": "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc"
}
}
{
"type": "Normal",
"reason": "Created",
"note": "Created container ssh",
"object": {
"kind": "Pod",
"namespace": "vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg",
"name": "ssh-6fd4f4bdf9-7b2p2"
},
"lease_id": {
"owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
"dseq": 5673203,
"gseq": 1,
"oseq": 1,
"provider": "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc"
}
}
{
"type": "Normal",
"reason": "Started",
"note": "Started container ssh",
"object": {
"kind": "Pod",
"namespace": "vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg",
"name": "ssh-6fd4f4bdf9-7b2p2"
},
"lease_id": {
"owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
"dseq": 5673203,
"gseq": 1,
"oseq": 1,
"provider": "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc"
}
}
{
"type": "Warning",
"reason": "Evicted",
"note": "Container ssh exceeded its local ephemeral storage limit \"10737418240\". ",
"object": {
"kind": "Pod",
"namespace": "vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg",
"name": "ssh-6fd4f4bdf9-7b2p2"
},
"lease_id": {
"owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
"dseq": 5673203,
"gseq": 1,
"oseq": 1,
"provider": "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc"
}
}
{
"type": "Normal",
"reason": "Killing",
"note": "Stopping container ssh",
"object": {
"kind": "Pod",
"namespace": "vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg",
"name": "ssh-6fd4f4bdf9-7b2p2"
},
"lease_id": {
"owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
"dseq": 5673203,
"gseq": 1,
"oseq": 1,
"provider": "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc"
}
}
{
"type": "Normal",
"reason": "Scheduled",
"note": "Successfully assigned vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg/ssh-6fd4f4bdf9-fwn5g to node2",
"object": {
"kind": "Pod",
"namespace": "vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg",
"name": "ssh-6fd4f4bdf9-fwn5g"
},
"lease_id": {
"owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
"dseq": 5673203,
"gseq": 1,
"oseq": 1,
"provider": "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc"
}
}
{
"type": "Normal",
"reason": "Pulled",
"note": "Container image \"ubuntu:21.10\" already present on machine",
"object": {
"kind": "Pod",
"namespace": "vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg",
"name": "ssh-6fd4f4bdf9-fwn5g"
},
"lease_id": {
"owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
"dseq": 5673203,
"gseq": 1,
"oseq": 1,
"provider": "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc"
}
}
{
"type": "Normal",
"reason": "Created",
"note": "Created container ssh",
"object": {
"kind": "Pod",
"namespace": "vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg",
"name": "ssh-6fd4f4bdf9-fwn5g"
},
"lease_id": {
"owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
"dseq": 5673203,
"gseq": 1,
"oseq": 1,
"provider": "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc"
}
}
{
"type": "Normal",
"reason": "Started",
"note": "Started container ssh",
"object": {
"kind": "Pod",
"namespace": "vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg",
"name": "ssh-6fd4f4bdf9-fwn5g"
},
"lease_id": {
"owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
"dseq": 5673203,
"gseq": 1,
"oseq": 1,
"provider": "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc"
}
}
{
"type": "Normal",
"reason": "SuccessfulCreate",
"note": "Created pod: ssh-6fd4f4bdf9-7b2p2",
"object": {
"kind": "ReplicaSet",
"namespace": "vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg",
"name": "ssh-6fd4f4bdf9"
},
"lease_id": {
"owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
"dseq": 5673203,
"gseq": 1,
"oseq": 1,
"provider": "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc"
}
}
{
"type": "Normal",
"reason": "SuccessfulCreate",
"note": "Created pod: ssh-6fd4f4bdf9-fwn5g",
"object": {
"kind": "ReplicaSet",
"namespace": "vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg",
"name": "ssh-6fd4f4bdf9"
},
"lease_id": {
"owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
"dseq": 5673203,
"gseq": 1,
"oseq": 1,
"provider": "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc"
}
}
{
"type": "Normal",
"reason": "ScalingReplicaSet",
"note": "Scaled up replica set ssh-6fd4f4bdf9 to 1",
"object": {
"kind": "Deployment",
"namespace": "vpjq3g0uoce5ffa9j85h74t9skosfj92dp4ce7eamhsdg",
"name": "ssh"
},
"lease_id": {
"owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
"dseq": 5673203,
"gseq": 1,
"oseq": 1,
"provider": "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc"
}
}
I thought I had looked at something similar in the past; let me see if I can find it.
Looks similar to issue #1480. Maybe this got reintroduced somehow?
Maybe the fix wasn't applied to the master branch previously.
Reproduced on the master branch; will have to track this down to see what is going on.
I mean, was it fixed on mainnet/main when mainnet/main was at v0.14.x, then lost when master was merged into mainnet/main for 0.16.x?
It's hitting this line. I think we need to filter out the pods that have failed (and the like) before trying to run the command:
https://github.com/ovrclk/akash/blob/master/provider/cluster/kube/client_exec.go#L100
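Something along these lines could work (a minimal sketch, not the actual provider code; the label selector key and helper name are assumptions for illustration): list the service's pods and keep only those still in the Running phase, so an evicted (Failed) or completed (Succeeded) pod is never chosen for the exec session.

// Minimal sketch, not the provider's real code. The label selector key and
// helper name are illustrative assumptions.
package kube

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// runningPodsForService returns only the pods of a service that are still in
// the Running phase; evicted pods report "Failed" and terminated ones
// "Succeeded", and neither can host a lease-shell exec session.
func runningPodsForService(ctx context.Context, kc kubernetes.Interface, ns, service string) ([]corev1.Pod, error) {
	pods, err := kc.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{
		// hypothetical label; the real manifests may label service pods differently
		LabelSelector: fmt.Sprintf("akash.network/service-name=%s", service),
	})
	if err != nil {
		return nil, err
	}

	running := make([]corev1.Pod, 0, len(pods.Items))
	for _, pod := range pods.Items {
		if pod.Status.Phase == corev1.PodRunning {
			running = append(running, pod)
		}
	}
	return running, nil
}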
@arno01 oh yeah, now that I look at this, are you sure the pod restarts? When I do this locally while watching the Kubernetes cluster, the pod moves to "Completed". After a while the provider closes the lease because the containers aren't running.
The Kubernetes pod has a restart policy of "Always", but apparently that doesn't mean anything of the sort:
$ kubectl get pod --namespace=cul2933lrothig1100l4s5ra710m53f6sol2mncvhht3m web-77db64bfd-cn8jk -o=jsonpath='{.spec.restartPolicy}' && echo
Always
I tried changing it to "OnFailure" (since "Never" seems like a poor choice), but that gives me this error:
E[2022-05-04|13:50:34.410] applying deployment module=provider-cluster-kube err="Deployment.apps \"bew\" is invalid: spec.template.spec.restartPolicy: Unsupported value: \"OnFailure\": supported values: \"Always\"" lease=akash178ctpsxaa4fcyq0fwtds4qx2ha0maluwll87wx/12/1/1/akash1xglzcfu4g9her6xhz95fk78h9555qaxz70cf4s service=bew
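For context, here is a minimal sketch (assuming the standard Kubernetes apps/v1 and core/v1 Go types, not the provider's own builder code) of where restartPolicy sits in a Deployment's pod template; the API server only accepts "Always" there, which is exactly what the validation error above reports.

// Minimal sketch using upstream Kubernetes Go types; not the akash builder code.
package sketch

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// deploymentTemplate shows the field that would have to change. Kubernetes
// validation rejects anything other than "Always" for Deployments, so
// switching evicted workloads to "OnFailure" is not an option here.
func deploymentTemplate() *appsv1.Deployment {
	return &appsv1.Deployment{
		Spec: appsv1.DeploymentSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					// "OnFailure" or "Never" here triggers:
					//   spec.template.spec.restartPolicy: Unsupported value: "OnFailure"
					RestartPolicy: corev1.RestartPolicyAlways,
				},
			},
		},
	}
}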
@sacreman any suggestions here?
@hydrogen18 I've tested this again just now:
TL;DR: it looks like the issue is isolated to a single provider, Europlots. I would consider closing this issue, but since you've also reproduced it, maybe you want to check a few more things?
- [Lumen] can confirm: the pod moves to "Completed" and a new one gets created;
- [Lumen] cannot reproduce the issue -> I can lease-shell into the new one without issues;
- [Akash.Pro] cannot reproduce the issue on my provider
- [Europlots] can reproduce the issue:
$ akash provider lease-shell --tty --stdin -- ssh bash
Error: lease shell failed: remote command execute error: service with that name is not running: the service has failed
- the :8443/version reports are identical (1:1);
- it looks like the lease-events output isn't sorted by time; see the output below, where the "Scheduled" event comes before the "Killing" event (for Lumen);
Evidence (Lumen)
$ curl -s -k https://provider.mainnet-1.ca.aksh.pw:8443/version | jq
{
"akash": {
"version": "v0.16.4-rc0",
"commit": "38b82258c14e3d0a2ed3d15a8d4140ec8c826a84",
"buildTags": "\"osusergo,netgo,ledger,static_build\"",
"go": "go version go1.17.6 linux/amd64",
"cosmosSdkVersion": "v0.45.1"
},
"kube": {
"major": "1",
"minor": "23",
"gitVersion": "v1.23.5",
"gitCommit": "c285e781331a3785a7f436042c65c5641ce8a9e9",
"gitTreeState": "clean",
"buildDate": "2022-03-16T15:52:18Z",
"goVersion": "go1.17.8",
"compiler": "gc",
"platform": "linux/amd64"
}
}
$ akash provider lease-events > lease-events.1
$ cat lease-events.1 | jq -r '[(.lease_id | .dseq, .gseq, .oseq, .provider), (.object | .kind, .name), .type, .reason, .note] | @csv' | column -t -s","
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Ingress" "k9dq760v49f5t6l6v2hbqts7ac.ingress.mainnet-1.ca.aksh.pw" "Normal" "Sync" "Scheduled for sync"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Ingress" "k9dq760v49f5t6l6v2hbqts7ac.ingress.mainnet-1.ca.aksh.pw" "Normal" "Sync" "Scheduled for sync"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Ingress" "k9dq760v49f5t6l6v2hbqts7ac.ingress.mainnet-1.ca.aksh.pw" "Normal" "Sync" "Scheduled for sync"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Ingress" "k9dq760v49f5t6l6v2hbqts7ac.ingress.mainnet-1.ca.aksh.pw" "Normal" "Sync" "Scheduled for sync"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Ingress" "k9dq760v49f5t6l6v2hbqts7ac.ingress.mainnet-1.ca.aksh.pw" "Normal" "Sync" "Scheduled for sync"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Ingress" "k9dq760v49f5t6l6v2hbqts7ac.ingress.mainnet-1.ca.aksh.pw" "Normal" "Sync" "Scheduled for sync"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Ingress" "k9dq760v49f5t6l6v2hbqts7ac.ingress.mainnet-1.ca.aksh.pw" "Normal" "Sync" "Scheduled for sync"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Ingress" "k9dq760v49f5t6l6v2hbqts7ac.ingress.mainnet-1.ca.aksh.pw" "Normal" "Sync" "Scheduled for sync"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Ingress" "k9dq760v49f5t6l6v2hbqts7ac.ingress.mainnet-1.ca.aksh.pw" "Normal" "Sync" "Scheduled for sync"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Ingress" "k9dq760v49f5t6l6v2hbqts7ac.ingress.mainnet-1.ca.aksh.pw" "Normal" "Sync" "Scheduled for sync"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Pod" "ssh-7c9bb88b9f-54fvx" "Normal" "Scheduled" "Successfully assigned ujrprcbfd0sjljt11f1rbignp2b65knk76qjphearskt8/ssh-7c9bb88b9f-54fvx to k8s-node-9.mainnet-1.ca"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Pod" "ssh-7c9bb88b9f-54fvx" "Normal" "Pulling" "Pulling image ""ubuntu:21.10"""
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Pod" "ssh-7c9bb88b9f-54fvx" "Normal" "Pulled" "Successfully pulled image ""ubuntu:21.10"" in 3.38558492s"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Pod" "ssh-7c9bb88b9f-54fvx" "Normal" "Created" "Created container ssh"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Pod" "ssh-7c9bb88b9f-54fvx" "Normal" "Started" "Started container ssh"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Pod" "ssh-7c9bb88b9f-rzxbx" "Normal" "Scheduled" "Successfully assigned ujrprcbfd0sjljt11f1rbignp2b65knk76qjphearskt8/ssh-7c9bb88b9f-rzxbx to k8s-node-5.mainnet-1.ca"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Pod" "ssh-7c9bb88b9f-rzxbx" "Normal" "Pulling" "Pulling image ""ubuntu:21.10"""
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Pod" "ssh-7c9bb88b9f-rzxbx" "Normal" "Pulled" "Successfully pulled image ""ubuntu:21.10"" in 3.385080374s"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Pod" "ssh-7c9bb88b9f-rzxbx" "Normal" "Created" "Created container ssh"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Pod" "ssh-7c9bb88b9f-rzxbx" "Normal" "Started" "Started container ssh"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Pod" "ssh-7c9bb88b9f-rzxbx" "Warning" "Evicted" "Container ssh exceeded its local ephemeral storage limit ""1073741824"". "
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Pod" "ssh-7c9bb88b9f-rzxbx" "Normal" "Killing" "Stopping container ssh"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "ReplicaSet" "ssh-7c9bb88b9f" "Normal" "SuccessfulCreate" "Created pod: ssh-7c9bb88b9f-rzxbx"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "ReplicaSet" "ssh-7c9bb88b9f" "Normal" "SuccessfulCreate" "Created pod: ssh-7c9bb88b9f-54fvx"
5823330 1 1 "akash1q7spv2cw06yszgfp4f9ed59lkka6ytn8g4tkjf" "Deployment" "ssh" "Normal" "ScalingReplicaSet" "Scaled up replica set ssh-7c9bb88b9f to 1"
$ kubectl get pods -A -o wide | grep ssh
ujrprcbfd0sjljt11f1rbignp2b65knk76qjphearskt8 ssh-7c9bb88b9f-rzxbx 1/1 Running 0 2m1s 10.233.109.137 k8s-node-5.mainnet-1.ca <none> <none>
$ kubectl get pods -A -o wide | grep ssh
ujrprcbfd0sjljt11f1rbignp2b65knk76qjphearskt8 ssh-7c9bb88b9f-54fvx 0/1 ContainerCreating 0 9s <none> k8s-node-9.mainnet-1.ca <none> <none>
ujrprcbfd0sjljt11f1rbignp2b65knk76qjphearskt8 ssh-7c9bb88b9f-rzxbx 0/1 Completed 0 2m11s 10.233.109.137 k8s-node-5.mainnet-1.ca <none> <none>
$ kubectl get pods -A -o wide | grep ssh
ujrprcbfd0sjljt11f1rbignp2b65knk76qjphearskt8 ssh-7c9bb88b9f-54fvx 1/1 Running 0 47s 10.233.99.87 k8s-node-9.mainnet-1.ca <none> <none>
ujrprcbfd0sjljt11f1rbignp2b65knk76qjphearskt8 ssh-7c9bb88b9f-rzxbx 0/1 Completed 0 2m49s 10.233.109.137 k8s-node-5.mainnet-1.ca <none> <none>
Evidence (Europlots)
$ curl -s -k https://provider.europlots.com:8443/version | jq
{
"akash": {
"version": "v0.16.4-rc0",
"commit": "38b82258c14e3d0a2ed3d15a8d4140ec8c826a84",
"buildTags": "\"osusergo,netgo,ledger,static_build\"",
"go": "go version go1.17.6 linux/amd64",
"cosmosSdkVersion": "v0.45.1"
},
"kube": {
"major": "1",
"minor": "23",
"gitVersion": "v1.23.5",
"gitCommit": "c285e781331a3785a7f436042c65c5641ce8a9e9",
"gitTreeState": "clean",
"buildDate": "2022-03-16T15:52:18Z",
"goVersion": "go1.17.8",
"compiler": "gc",
"platform": "linux/amd64"
}
}
$ akash provider lease-events > lease-events.2
$ cat lease-events.2 | jq -r '[(.lease_id | .dseq, .gseq, .oseq, .provider), (.object | .kind, .name), .type, .reason, .note] | @csv' | column -t -s","
5823531 1 1 "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc" "Ingress" "gi52llqkrh8u98i6m3j0udd95c.ingress.europlots.com" "Normal" "Sync" "Scheduled for sync"
5823531 1 1 "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc" "Pod" "ssh-6ff4cf85f-gt6t2" "Normal" "Scheduled" "Successfully assigned e8eivkd2u9j2vcvp7jjjsgi3uc65on2sqro3td0bjpfro/ssh-6ff4cf85f-gt6t2 to node3"
5823531 1 1 "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc" "Pod" "ssh-6ff4cf85f-gt6t2" "Normal" "Pulled" "Container image ""ubuntu:21.10"" already present on machine"
5823531 1 1 "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc" "Pod" "ssh-6ff4cf85f-gt6t2" "Normal" "Created" "Created container ssh"
5823531 1 1 "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc" "Pod" "ssh-6ff4cf85f-gt6t2" "Normal" "Started" "Started container ssh"
5823531 1 1 "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc" "Pod" "ssh-6ff4cf85f-gt6t2" "Warning" "Evicted" "Container ssh exceeded its local ephemeral storage limit ""1073741824"". "
5823531 1 1 "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc" "Pod" "ssh-6ff4cf85f-gt6t2" "Normal" "Killing" "Stopping container ssh"
5823531 1 1 "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc" "Pod" "ssh-6ff4cf85f-zh4bk" "Normal" "Scheduled" "Successfully assigned e8eivkd2u9j2vcvp7jjjsgi3uc65on2sqro3td0bjpfro/ssh-6ff4cf85f-zh4bk to node3"
5823531 1 1 "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc" "Pod" "ssh-6ff4cf85f-zh4bk" "Normal" "Pulled" "Container image ""ubuntu:21.10"" already present on machine"
5823531 1 1 "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc" "Pod" "ssh-6ff4cf85f-zh4bk" "Normal" "Created" "Created container ssh"
5823531 1 1 "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc" "Pod" "ssh-6ff4cf85f-zh4bk" "Normal" "Started" "Started container ssh"
5823531 1 1 "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc" "ReplicaSet" "ssh-6ff4cf85f" "Normal" "SuccessfulCreate" "Created pod: ssh-6ff4cf85f-gt6t2"
5823531 1 1 "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc" "ReplicaSet" "ssh-6ff4cf85f" "Normal" "SuccessfulCreate" "Created pod: ssh-6ff4cf85f-zh4bk"
5823531 1 1 "akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc" "Deployment" "ssh" "Normal" "ScalingReplicaSet" "Scaled up replica set ssh-6ff4cf85f to 1"
I've asked the provider for kubectl get pods -A -o wide output, but he is away at the moment.
Shortly before I asked him, he said that he has a deployment that is still Terminating.
He was testing storage speed with a Chia deployment and closed the lease, but it is still running:
# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
...
...
rgvf9cu3vacspjp0o9hdn8q4hc1pno1k7g2u8mjgrh3ha chia-65b7fc4d96-62px7 1/1 Terminating 0 163m
rgvf9cu3vacspjp0o9hdn8q4hc1pno1k7g2u8mjgrh3ha chia-65b7fc4d96-v85qr 1/1 Running 0 39m
Given that the namespace is the same, there must be some issue on his side.
Evidence (Akash.Pro)
This is my provider
$ curl -s -k https://provider.akash.pro:8443/version | jq
{
"akash": {
"version": "v0.16.4-rc0",
"commit": "38b82258c14e3d0a2ed3d15a8d4140ec8c826a84",
"buildTags": "\"osusergo,netgo,ledger,static_build\"",
"go": "go version go1.17.6 linux/amd64",
"cosmosSdkVersion": "v0.45.1"
},
"kube": {
"major": "1",
"minor": "23",
"gitVersion": "v1.23.6",
"gitCommit": "ad3338546da947756e8a88aa6822e9c11e7eac22",
"gitTreeState": "clean",
"buildDate": "2022-04-14T08:43:11Z",
"goVersion": "go1.17.9",
"compiler": "gc",
"platform": "linux/amd64"
}
}
$ cat lease-events.3 | jq -r '[(.lease_id | .dseq, .gseq, .oseq, .provider), (.object | .kind, .name), .type, .reason, .note] | @csv' | column -t -s","
5823715 1 1 "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0" "Ingress" "7efc7i47i9euj7laotatgtpt7c.ingress.akash.pro" "Normal" "Sync" "Scheduled for sync"
5823715 1 1 "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0" "Pod" "ssh-79cc8d4674-2hqht" "Normal" "Scheduled" "Successfully assigned ddqp0svbeqjnkiicq5d53c3dfduo83cm03b14btomvgsc/ssh-79cc8d4674-2hqht to node1"
5823715 1 1 "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0" "Pod" "ssh-79cc8d4674-2hqht" "Normal" "Pulled" "Container image ""ubuntu:21.10"" already present on machine"
5823715 1 1 "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0" "Pod" "ssh-79cc8d4674-2hqht" "Normal" "Created" "Created container ssh"
5823715 1 1 "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0" "Pod" "ssh-79cc8d4674-2hqht" "Normal" "Started" "Started container ssh"
5823715 1 1 "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0" "Pod" "ssh-79cc8d4674-zszts" "Normal" "Scheduled" "Successfully assigned ddqp0svbeqjnkiicq5d53c3dfduo83cm03b14btomvgsc/ssh-79cc8d4674-zszts to node1"
5823715 1 1 "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0" "Pod" "ssh-79cc8d4674-zszts" "Normal" "Pulled" "Container image ""ubuntu:21.10"" already present on machine"
5823715 1 1 "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0" "Pod" "ssh-79cc8d4674-zszts" "Normal" "Created" "Created container ssh"
5823715 1 1 "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0" "Pod" "ssh-79cc8d4674-zszts" "Normal" "Started" "Started container ssh"
5823715 1 1 "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0" "Pod" "ssh-79cc8d4674-zszts" "Warning" "Evicted" "Container ssh exceeded its local ephemeral storage limit ""1073741824"". "
5823715 1 1 "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0" "Pod" "ssh-79cc8d4674-zszts" "Normal" "Killing" "Stopping container ssh"
5823715 1 1 "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0" "ReplicaSet" "ssh-79cc8d4674" "Normal" "SuccessfulCreate" "Created pod: ssh-79cc8d4674-zszts"
5823715 1 1 "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0" "ReplicaSet" "ssh-79cc8d4674" "Normal" "SuccessfulCreate" "Created pod: ssh-79cc8d4674-2hqht"
5823715 1 1 "akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0" "Deployment" "ssh" "Normal" "ScalingReplicaSet" "Scaled up replica set ssh-79cc8d4674 to 1"
root@node1:~# kubectl get pods -A -o wide | grep ssh
ddqp0svbeqjnkiicq5d53c3dfduo83cm03b14btomvgsc ssh-79cc8d4674-zszts 1/1 Running 0 27s 10.233.90.30 node1 <none> <none>
root@node1:~# kubectl get pods -A -o wide | grep ssh
ddqp0svbeqjnkiicq5d53c3dfduo83cm03b14btomvgsc ssh-79cc8d4674-2hqht 0/1 ContainerCreating 0 0s <none> node1 <none> <none>
ddqp0svbeqjnkiicq5d53c3dfduo83cm03b14btomvgsc ssh-79cc8d4674-zszts 0/1 Completed 0 33s 10.233.90.30 node1 <none> <none>
root@node1:~# kubectl get pods -A -o wide | grep ssh
ddqp0svbeqjnkiicq5d53c3dfduo83cm03b14btomvgsc ssh-79cc8d4674-2hqht 1/1 Running 0 2s 10.233.90.31 node1 <none> <none>
ddqp0svbeqjnkiicq5d53c3dfduo83cm03b14btomvgsc ssh-79cc8d4674-zszts 0/1 Completed 0 35s 10.233.90.30 node1 <none> <none>
I'm confused that we can't seem to reproduce this across all providers uniformly at this point. Do we know if there are any configuration differences between them?
There is a workaround per @boz; making it sev2.