magic-hya

49 comments by magic-hya

[root@kuscia-master-76c5b5bc7b-s84k8 kuscia]# kubectl get cdr -A
NAME                  SOURCE   DESTINATION     HOST                                             AUTHENTICATION   READY
alice-bob             alice    bob             kuscia-lite-bob.lite-bob.svc.cluster.local       Token            True
bob-alice             bob      alice           kuscia-lite-alice.lite-alice.svc.cluster.local   Token            True
alice-kuscia-system   alice    kuscia-system                                                    Token            True
bob-kuscia-system...

The directory is empty:
[root@kuscia-master-76c5b5bc7b-s84k8 stdout]# pwd
/home/kuscia/var/stdout
[root@kuscia-master-76c5b5bc7b-s84k8 stdout]# ls -l
total 0

I tried to get into the bob container to take a look, but that fails with this error too. Could the two errors be related?
[root@kuscia-master-76c5b5bc7b-s84k8 stdout]# kubectl exec -it secretflow-task-20240620111611-single-psi-0 -n bob -- bash
Error from server: error dialing backend: proxy error from 0.0.0.0:6443 while dialing 192.168.62.12:10250, code 502: 502 Bad Gateway

```
[root@kuscia-master-76c5b5bc7b-s84k8 stdout]# kubectl describe pod secretflow-task-20240620111611-single-psi-0 -n bob
Name:             secretflow-task-20240620111611-single-psi-0
Namespace:        bob
Priority:         0
Service Account:  default
Node:             kuscia-lite-bob-69bd6df646-k8krs/192.168.62.12
Start Time:       Thu, 20 Jun 2024 11:17:57 +0800
Labels:           kuscia.secretflow/communication-role-client=true...
```

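There is actually one more thing I am unsure about. When I set the lite node's runtime to runk, I used the default kubeconfigFile and did not configure RBAC. I don't know whether that has any effect here.
```
# Configure this section when runtime is runk
runk:
  # Schedule tasks into the specified organization k8s namespace
  namespace: lite-bob
  # Pod DNS configuration of the organization's k8s cluster, used to resolve the node's application domain names; the DNS used by pods that runk launches...
```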

```
[root@kuscia-master-76c5b5bc7b-s84k8 stdout]# kubectl logs secretflow-task-20240620111611-single-psi-0 -n bob
Error from server: Get "https://192.168.62.12:10250/containerLogs/bob/secretflow-task-20240620111611-single-psi-0/secretflow": proxy error from 0.0.0.0:6443 while dialing 192.168.62.12:10250, code 502: 502 Bad Gateway
[root@kuscia-master-76c5b5bc7b-s84k8 stdout]# kubectl get nodes...
```

After enabling RBAC, running the task now fails with an error:
```
kubectl describe pod secretflow-task-20240620172156-single-psi-0 -n alice
...
Events:
  Type     Reason            Age    From              Message
  ----     ------            ----   ----              -------
  Warning  FailedScheduling  4m29s  kuscia-scheduler  0/2 nodes are available: waiting for...
```

Earlier, the alice node and the task pod inside the container got stuck, so I deleted them with --force. That may be the cause.
```
[root@kuscia-master-76c5b5bc7b-s84k8 kuscia]# kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, taints: .spec.taints}'
{
  "name": "kuscia-lite-bob-69bd6df646-k8krs",
  "taints": [
    {
      "effect": "NoSchedule",
      "key": "kuscia.secretflow/agent",
      "value": "v1"
    }...
```
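For readers without jq at hand, the taint listing in the comment above can be approximated in Python. This is only a sketch: the `node_taints` helper and the second node name in `SAMPLE` are hypothetical, and `SAMPLE` is a reduced stand-in for real `kubectl get nodes -o json` output, not data taken from the thread.

```python
import json
from typing import Dict, List


def node_taints(nodes_json: str) -> Dict[str, List[str]]:
    """Map each node name to its taints, rendered as 'key=value:effect'."""
    result: Dict[str, List[str]] = {}
    for item in json.loads(nodes_json).get("items", []):
        # .spec.taints is absent (or null) on untainted nodes.
        taints = item.get("spec", {}).get("taints") or []
        result[item["metadata"]["name"]] = [
            f'{t["key"]}={t.get("value", "")}:{t["effect"]}' for t in taints
        ]
    return result


# Reduced sample shaped like `kubectl get nodes -o json`.
# "example-untainted-node" is a hypothetical name used only for illustration.
SAMPLE = json.dumps({
    "items": [
        {
            "metadata": {"name": "kuscia-lite-bob-69bd6df646-k8krs"},
            "spec": {"taints": [
                {"effect": "NoSchedule", "key": "kuscia.secretflow/agent", "value": "v1"}
            ]},
        },
        {"metadata": {"name": "example-untainted-node"}, "spec": {}},
    ]
})

if __name__ == "__main__":
    for name, taints in node_taints(SAMPLE).items():
        print(name, taints if taints else "no taints")
```

In practice the input would come from `kubectl get nodes -o json` (for example read from standard input) rather than from the embedded sample.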

After removing the taint and restarting the task, I found two problems. On the bob node:
```
[root@kuscia-master-76c5b5bc7b-s84k8 kuscia]# kubectl get pods -n bob
NAME                                          READY   STATUS             RESTARTS   AGE
secretflow-task-20240621113741-single-psi-0   0/1     ImagePullBackOff   0          4m39s
Events:
  Type  Reason  Age  From  Message
  ----  ------  ---- ----...
```

> alice has a permission problem: the FailedCreatePodSandBox event points to a permission issue when creating a ConfigMap. The service account lite-alice:default lacks the permissions required to create ConfigMaps in the lite-alice namespace.

```
curl -X POST 'http://127.0.0.1:8082/api/v1/domaindatagrant/create' \
  --cert /home/kuscia/var/certs/kusciaapi-server.crt \
  --key /home/kuscia/var/certs/kusciaapi-server.key \
  --cacert /home/kuscia/var/certs/ca.crt \
  --header "Token: $(cat /home/kuscia/var/certs/token)" \
  --header 'Content-Type: application/json'...
```
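For the permission problem quoted above (the service account lite-alice:default cannot create ConfigMaps in lite-alice), one possible remedy is a namespaced Role plus RoleBinding. The sketch below only generates such a manifest as JSON. The helper name `configmap_rbac` and the object name `runk-configmap-writer` are hypothetical, and every verb other than `create` is an assumption; the thread only confirms that `create` on ConfigMaps is missing, and a runk deployment may well need further permissions that are not covered here.

```python
import json


def configmap_rbac(namespace: str, service_account: str, name: str) -> dict:
    """Build a v1 List holding a Role and a RoleBinding for ConfigMap access."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [{
            "apiGroups": [""],  # ConfigMaps belong to the core API group
            "resources": ["configmaps"],
            # Only "create" is confirmed by the quoted event; the rest are assumptions.
            "verbs": ["create", "get", "list", "watch", "update", "delete"],
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "subjects": [{
            "kind": "ServiceAccount",
            "name": service_account,
            "namespace": namespace,
        }],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": name,
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    # "runk-configmap-writer" is a hypothetical object name.
    manifest = configmap_rbac("lite-alice", "default", "runk-configmap-writer")
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON manifests, the printed output could be piped into `kubectl apply -f -`. Afterwards, `kubectl auth can-i create configmaps --as=system:serviceaccount:lite-alice:default -n lite-alice` should report whether the grant took effect.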