Include container resource limits on Kubernetes
New feature
Include container resource limits when requesting pods from a Kubernetes cluster for a workflow.
Usage scenario
Allows Nextflow to work on managed Kubernetes clusters that require container resource limits to be included when requesting pods.
Suggested implementation
Optionally allow adding container resource limits to pod requests. Maybe here? I don't know the Nextflow codebase very well. This is something that Nextflow needs to submit when requesting the pod: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#example-1
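For reference, a minimal pod spec sketch in the style of the linked Kubernetes example; the container name, image, and values here are illustrative, not actual Nextflow output:

```yaml
# Illustrative sketch: the resources block each requested container would need
# to carry to satisfy a limits/requests admission policy. Names/values are examples.
apiVersion: v1
kind: Pod
metadata:
  name: example-task-pod
spec:
  containers:
  - name: task
    image: ubuntu:22.04
    resources:
      requests:
        cpu: "1"
        memory: 100Mi
      limits:
        cpu: "1"
        memory: 100Mi
```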
Discussion
Unless I'm misunderstanding, this is a feature that Nextflow does not have, and its absence is preventing me from using it on our shared cluster.
My toy workflow is here: https://github.com/DailyDreaming/k8-nextflow/tree/master
This workflow runs fine locally, but not on our Kubernetes cluster. I use this command for Kubernetes:
./nextflow kuberun https://github.com/DailyDreaming/k8-nextflow -v chaos-vol:/workspace --profile braingeneers
where chaos-vol is a shared network PVC.
I get this error:
ERROR ~ Request POST /apis/batch/v1/namespaces/braingeneers/jobs returned an error code=403
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "admission webhook \"validation.gatekeeper.sh\" denied the request: [container-must-have-limits-and-requests] container <drunk-meitner> does not have <{\"cpu\", \"memory\"}> requests defined\n[container-must-have-limits-and-requests] container <drunk-meitner> does not have <{\"memory\"}> limits defined",
"reason": "Forbidden",
"code": 403
}
I think this should be added to the Kubernetes pod request, as outlined above under Suggested implementation.
Thanks for your time!
Hey Lon, your process definition is incorrect. Just remove the resources: and it should work:
process pim {
container 'ubuntu:22.04'
cpus 1
memory '100 MB'
// disk '100 MB'
// ...
}
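For what it's worth, these per-process directives can also be set once for all processes in nextflow.config; a minimal sketch (the values are just examples):

```groovy
// nextflow.config sketch: default cpus/memory applied to every process,
// so each pod request carries resource values. Values are examples only.
process {
    cpus = 1
    memory = '100 MB'
}
```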
@bentsherman I've made the changes you suggested but it still gives the same error. Any suggestions?
Is it the exact same message? The original one was:
admission webhook "validation.gatekeeper.sh" denied the request: [container-must-have-limits-and-requests] container
does not have <{"cpu", "memory"}> requests defined\n[container-must-have-limits-and-requests] container does not have <{"memory"}> limits defined
Or is it just that the cpu limit is missing?
It does seem to be the exact same message. Copy-pasting:
ERROR ~ Request POST /apis/batch/v1/namespaces/braingeneers/jobs returned an error code=403
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "admission webhook \"validation.gatekeeper.sh\" denied the request: [container-must-have-limits-and-requests] container <disturbed-watson> does not have <{\"cpu\", \"memory\"}> requests defined\n[container-must-have-limits-and-requests] container <disturbed-watson> does not have <{\"memory\"}> limits defined",
"reason": "Forbidden",
"code": 403
}
-- Check '.nextflow.log' file for details
This error must be for the head job created by kuberun. You can specify the head job cpus and memory on the command line:
./nextflow kuberun https://github.com/DailyDreaming/k8-nextflow -v chaos-vol:/workspace -profile braingeneers -head-cpus 1 -head-memory 2Gi
Hmmm... I'm not sure if it worked. I'm not sure what pod has unbound immediate PersistentVolumeClaims means, but I'll look into it and see whether I set up the PVC improperly.
Thanks again for the help.
(venv) quokka@qcore ~/git/mission_control/nextflow $ ./nextflow kuberun https://github.com/DailyDreaming/k8-nextflow -v chaos-vol:/workspace -head-cpus 1 -head-memory 256Mi
Job submitted: marvelous-thompson .. waiting to start
WARN: K8s pod cannot be scheduled -- 0/324 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/324 nodes are available: 324 Preemption is not helpful for scheduling..
Not sure either; I haven't seen that error before. It could be useful to inspect the YAML that was generated. You should be able to see it with kubectl get pod <pod-name> -o yaml
There might be something wrong with your PVC. It should be bound to a persistent volume and also be ReadWriteMany.
@bentsherman
The pvc.yml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: whimvol
spec:
  storageClassName: rook-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
Created with kubectl apply -f pvc.yml, using the YAML template here: https://docs.nationalresearchplatform.org/userdocs/storage/intro/
Status:
(venv) quokka@qcore ~/$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
whimvol Bound pvc-955cf373-946b-4d0d-9c52-8a0d1bd2c37e 5Gi RWX rook-cephfs 3m39s
I tried running it without a pod specified, and it gave the same error. I'm currently attempting to mount it into a pod to see whether the next run will pick up on that pod existing and use it.
Creating the pod may have worked, as it scheduled/queued the workflow and attempted to launch. Running into some standard permission issues right now: cannot create resource \"jobs\" in API group \"batch\" in the namespace \"braingeneers\". Just posting an update; I'll follow up when I've addressed the permission issues.
@DailyDreaming @bentsherman I see a similar error. It works fine in 22.10.7 (applying limits on cpu correctly), but doesn't work in 23.04.1.
Yes, we need to merge #3027 to fix this issue.