Error: Unable to set Type=notify in systemd service file?
I compiled gpu-manager for arm64 and ran it on a Jetson Nano. However, when I run kubectl create -f gpu-manager.yaml, the log shows:
copy /usr/local/host/lib/aarch64-linux-gnu/libcuda.so to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra/libcuda.so to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra/libcuda.so.1 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.440.18 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.1 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra/libnvidia-fatbinaryloader.so.440.18 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGL.so.1.0.0 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGL.so to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGL.so.1 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGLX.so to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGLX.so.0 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGLX.so.0.0.0 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libOpenGL.so.0.0.0 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libOpenGL.so to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libOpenGL.so.0 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGLESv1_CM.so.1 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGLESv1_CM.so.1.0.0 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGLESv1_CM.so to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGLESv2.so to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGLESv2.so.2.0.0 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGLESv2.so.2 to /usr/local/nvidia/lib
copy /usr/local/host/lib/chromium-browser/swiftshader/libGLESv2.so to /usr/local/nvidia/lib
copy /usr/local/host/lib/chromium-browser/libGLESv2.so to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libEGL.so to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libEGL.so.1 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libEGL.so.1.0.0 to /usr/local/nvidia/lib
copy /usr/local/host/lib/chromium-browser/libEGL.so to /usr/local/nvidia/lib
copy /usr/local/host/lib/chromium-browser/swiftshader/libEGL.so to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGLdispatch.so to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGLdispatch.so.0 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/libGLdispatch.so.0.0.0 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra/libGLX_nvidia.so.0 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra-egl/libEGL_nvidia.so.0 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra-egl/libGLESv2_nvidia.so.2 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra-egl/libGLESv1_CM_nvidia.so.1 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra/libnvidia-eglcore.so.32.5.1 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra/libnvidia-egl-wayland.so to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra/libnvidia-egl-wayland.so.1 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra/libnvidia-glcore.so.32.5.1 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra/libnvidia-tls.so.32.5.1 to /usr/local/nvidia/lib
copy /usr/local/host/lib/aarch64-linux-gnu/tegra/libnvidia-glsi.so.32.5.1 to /usr/local/nvidia/lib
rebuild ldcache
launch gpu manager
E0412 01:51:13.374667 32218 server.go:133] Unable to set Type=notify in systemd service file?
Following issues #7 and #40, I modified the YAML file and made sure the Docker runtime is runc, not nvidia-container-runtime. This is my YAML file:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-manager-daemonset
  namespace: kube-system
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      name: gpu-manager-ds
  template:
    metadata:
      # This annotation is deprecated. Kept here for backward compatibility
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: gpu-manager-ds
    spec:
      serviceAccount: gpu-manager
      tolerations:
        # This toleration is deprecated. Kept here for backward compatibility
        # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
        - key: CriticalAddonsOnly
          operator: Exists
        - key: tencent.com/vcuda-core
          operator: Exists
          effect: NoSchedule
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      priorityClassName: "system-node-critical"
      # only run node has gpu device
      nodeSelector:
        nvidia-device-enable: enable
      hostPID: true
      containers:
        - image: myimage/gpu-manager:latest
          imagePullPolicy: Always
          name: gpu-manager
          securityContext:
            privileged: true
          ports:
            - containerPort: 5678
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: vdriver
              mountPath: /etc/gpu-manager/vdriver
            - name: vmdata
              mountPath: /etc/gpu-manager/vm
            - name: log
              mountPath: /var/log/gpu-manager
            - name: checkpoint
              mountPath: /etc/gpu-manager/checkpoint
            - name: run-dir
              mountPath: /var/run
            - name: cgroup
              mountPath: /sys/fs/cgroup
              readOnly: true
            - name: usr-directory
              mountPath: /usr/local/host
              readOnly: true
            - name: kube-root
              mountPath: /root/.kube
              readOnly: true
          env:
            - name: LOG_LEVEL
              value: "5"
            - name: EXTRA_FLAGS
              value: "--logtostderr=false --cgroup-driver=systemd"
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
      volumes:
        - name: device-plugin
          hostPath:
            type: Directory
            path: /var/lib/kubelet/device-plugins
        - name: vmdata
          hostPath:
            type: DirectoryOrCreate
            path: /etc/gpu-manager/vm
        - name: vdriver
          hostPath:
            type: DirectoryOrCreate
            path: /etc/gpu-manager/vdriver
        - name: log
          hostPath:
            type: DirectoryOrCreate
            path: /etc/gpu-manager/log
        - name: checkpoint
          hostPath:
            type: DirectoryOrCreate
            path: /etc/gpu-manager/checkpoint
        # We have to mount the whole /var/run directory into container, because of bind mount docker.sock
        # inode change after host docker is restarted
        - name: run-dir
          hostPath:
            type: Directory
            path: /var/run
        - name: cgroup
          hostPath:
            type: Directory
            path: /sys/fs/cgroup
        # We have to mount /usr directory instead of specified library path, because of non-existing
        # problem for different distro
        - name: usr-directory
          hostPath:
            type: Directory
            path: /usr
        - name: kube-root
          hostPath:
            type: Directory
            path: /root/.kube
I copied the .kube directory from the master node to each worker node. How can I deal with this error?
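The wording of the error suggests a systemd readiness notification that could not be delivered. Below is a minimal sketch of the sd_notify protocol, under the assumption (inferred from the message text, not confirmed anywhere in this thread) that gpu-manager prints this line when the notification was not sent. The sender needs the `NOTIFY_SOCKET` environment variable, which systemd sets for units declared with `Type=notify`; a process started inside a Kubernetes pod normally does not have it. The function name `sd_notify` is illustrative only.

```python
import os
import socket


def sd_notify(state: str = "READY=1") -> bool:
    """Send a state message to systemd; return False when no socket is configured."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        # Not started by systemd with Type=notify (for example, inside a container).
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True


if __name__ == "__main__":
    if not sd_notify():
        print("NOTIFY_SOCKET is not set; readiness notification skipped")
```

If that assumption holds, the line on its own would not stop the process, and any messages logged after it are the ones worth checking.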
Hey, has this problem been solved?
@phoenixwu0229 Not yet.
I ran into this problem on OpenShift 4 as well. Following the FAQ, I changed container-runtime-endpoint and set the cgroup driver to systemd:
- name: EXTRA_FLAGS
  #value: "--logtostderr=false"
  value: "--logtostderr=false --container-runtime-endpoint=/var/run/crio/crio.sock --cgroup-driver=systemd"
Then the container fails on startup with:
rebuild ldcache
launch gpu manager
E0516 02:59:32.771447 1270729 server.go:131] Unable to set Type=notify in systemd service file?
F0516 02:59:33.872799 1270729 tree.go:102] Can not initialize nvidia tree, err no input
goroutine 10 [running]:
k8s.io/klog.stacks(0xc000109c00, 0xc000016000, 0x58, 0x193)
    /go/pkg/mod/k8s.io/[email protected]/klog.go:875 +0xb8
k8s.io/klog.(*loggingT).output(0x27ae5a0, 0xc000000003, 0xc0001c0230, 0x250db7f, 0x7, 0x66, 0x0)
    /go/pkg/mod/k8s.io/[email protected]/klog.go:826 +0x330
k8s.io/klog.(*loggingT).printf(0x27ae5a0, 0x3, 0x17d4c8c, 0x26, 0xc0003ebe30, 0x1, 0x1)
    /go/pkg/mod/k8s.io/[email protected]/klog.go:707 +0x14b
k8s.io/klog.Fatalf(...)
    /go/pkg/mod/k8s.io/[email protected]/klog.go:1276
tkestack.io/gpu-manager/pkg/device/nvidia.(*NvidiaTree).Init(0xc0001c6140, 0x0, 0x0)
    /root/rpmbuild/BUILD/gpu-manager-1.1.5/pkg/device/nvidia/tree.go:102 +0x128
tkestack.io/gpu-manager/pkg/server.(*managerImpl).Run(0xc00004a7c0, 0xc000136dc0, 0x0)
    /root/rpmbuild/BUILD/gpu-manager-1.1.5/pkg/server/server.go:171 +0x66b
created by tkestack.io/gpu-manager/cmd/manager/app.Run
    /root/rpmbuild/BUILD/gpu-manager-1.1.5/cmd/manager/app/app.go:83 +0x3da
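The two klog lines in the log above do not carry the same severity. klog starts each line with a severity letter (`I`, `W`, `E` or `F`), followed by the date, time, thread ID and source location. A small sketch that splits that header shows that the `Type=notify` line is logged as an error while the `nvidia tree` line is fatal, which is the one followed by the stack dump. The helper name `parse_klog_line` is hypothetical.

```python
import re

# klog header layout: Lmmdd hh:mm:ss.uuuuuu threadid file:line] message
KLOG_HEADER = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:]+):(?P<line>\d+)\] (?P<message>.*)$"
)

SEVERITY_NAMES = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}


def parse_klog_line(line):
    """Return the header fields of a klog line as a dict, or None if it does not match."""
    match = KLOG_HEADER.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = SEVERITY_NAMES[fields["severity"]]
    fields["line"] = int(fields["line"])
    return fields


if __name__ == "__main__":
    log = [
        "E0516 02:59:32.771447 1270729 server.go:131] Unable to set Type=notify in systemd service file?",
        "F0516 02:59:33.872799 1270729 tree.go:102] Can not initialize nvidia tree, err no input",
    ]
    for entry in log:
        parsed = parse_klog_line(entry)
        print(parsed["severity"], parsed["file"], parsed["message"])
```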
Try to install the NVIDIA GPU driver first.
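One way to check this on the node is sketched below. It assumes that a usable driver exposes the NVIDIA management library `libnvidia-ml.so.1` (NVML) and the `/dev/nvidiactl` device node; whether gpu-manager depends on exactly these two things is an assumption, not something stated in this thread. On Jetson boards the Tegra driver stack differs from the desktop one, so a negative result there may not mean the GPU itself is unusable. The function names are illustrative.

```python
import ctypes
import os


def nvml_loadable(name: str = "libnvidia-ml.so.1") -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


def driver_device_present(device: str = "/dev/nvidiactl") -> bool:
    """Return True if the given device node (or any path) exists."""
    return os.path.exists(device)


if __name__ == "__main__":
    print("NVML library loadable:", nvml_loadable())
    print("/dev/nvidiactl present:", driver_device_present())
```

Running it both on the host and inside the gpu-manager container can show whether the libraries copied to /usr/local/nvidia/lib are actually visible to the process.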
Same problem, v1.9.0.
gpu-manager 1.0.9 & v1.1.5
Did you manage to get it running on the Jetson? I have run into this problem too.