UFO issues

Results: 11 issues by UFO

ImportError: Extension horovod.torch has not been built

I ran the following command to test Horovod with PyTorch, and the error below occurred: jovyan@560c5fd869da:~$ mpirun -np 1 -bind-to none -map-by slot -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH -mca pml...

bug

Open MPI not found in output of mpirun --version.

I ran the following command to test Horovod: horovodrun -np 4 -H localhost:4 python keras_mnist.py and the following error occurred: 2019-11-15 08:51:09.228813: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0 Open MPI not...

bug
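
The error title above suggests that the launcher inspects the text printed by `mpirun --version` and looks for the string "Open MPI". A minimal sketch of that kind of check follows, as a way to reproduce the diagnosis by hand. It is an assumption about the general approach, not Horovod's actual implementation, and the helper names `looks_like_open_mpi` and `mpirun_version_output` are hypothetical.

```python
import shutil
import subprocess


def looks_like_open_mpi(version_output: str) -> bool:
    """Return True if the text contains the "Open MPI" marker.

    Assumption: the check is a plain substring match, as the error
    message "Open MPI not found in output of mpirun --version" implies.
    """
    return "Open MPI" in version_output


def mpirun_version_output(executable: str = "mpirun"):
    """Run `<executable> --version` and return its combined output.

    Returns None when the executable is not on PATH.
    """
    if shutil.which(executable) is None:
        return None
    result = subprocess.run(
        [executable, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    output = mpirun_version_output()
    if output is None:
        print("mpirun was not found on PATH")
    elif looks_like_open_mpi(output):
        print("mpirun reports Open MPI")
    else:
        print("mpirun is present but does not report Open MPI:")
        print(output)
```

If the script reports that `mpirun` is present but is not Open MPI, the `mpirun` found first on PATH may belong to a different MPI distribution.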

Unable to logout when using keycloak with OIDC

Is the issue already present in https://github.com/cloudera/hue/issues or discussed in the forum https://discourse.gethue.com? Yes. https://github.com/cloudera/hue/issues/2590 Describe the bug: OIDC authentication using Keycloak for example does not work again when Hue...

roadmap

How can I use the "multi-user" feature in Arena?

According to the release notes of v0.2.0, https://github.com/kubeflow/arena/releases/tag/v0.2.0, Arena supports multiple users. How can I use this feature? Can it be integrated with LDAP/AP?

lifecycle/stale

Limit the number of GPUs a user can use

@cheyang Is there any way to limit the number of GPUs a user can use? For example, if a cluster has 8 GPUs, each time a user submits a...

lifecycle/stale

CreateSession still waiting for response from worker

When I submit a distributed TensorFlow job using the command below: # Set the Job Name %env JOB_NAME=tf-distributed-mnist # Submit a training job # using code and data from Data...

lifecycle/stale

HttpAdapter getResponse returns null, and HttpAdapter duplicates HttpAdapterByHeader

@wufan1991 ### Symptom In the getResponse function from https://github.com/FederatedAI/FATE-Serving/blob/master/fate-serving-common/src/main/java/com/webank/ai/fate/serving/common/utils/HttpAdapterClientPool.java: private static HttpAdapterResponse getResponse(HttpRequestBase request) { CloseableHttpResponse response = null; try { response = HttpClientPool.getConnection().execute(request, HttpClientContext.create()); HttpEntity entity = response.getEntity(); String...

kubefate serving proxy: unable to create new native thread

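With the traditional docker-compose deployment, this error did not occur during a load test at 200 QPS. After deploying through the Helm chart, however, the serving proxy reports "unable to create new native thread" after about 20 minutes. For details, see https://github.com/FederatedAI/FATE-Serving/issues/207 @LaynePeng @dylan-fan @owlet42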

unable to create new native thread

Using KubeFATE, I deployed two simulated online serving clusters, as follows: guest kubectl get pods -n fate-serving-10005 NAME READY STATUS RESTARTS AGE serving-admin-744f988bc-2mh2l 1/1 Running 0 16h serving-proxy-59957b497d-vztml 1/1 Running 0 16h serving-redis-7fbb959b6c-bxcqt 1/1 Running 0 16h serving-server-65bccf659b-bqd6t 1/1 Running...

Can one container request two GPUs?

I noticed that "Each container can request one or more GPUs" using "nvidia.com/gpu: " as a schedulable resource (https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/). Since the goal of gpushare-scheduler-extender is to allow users to express...
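
For reference, the whole-GPU request that the quoted Kubernetes documentation describes can be sketched as a Pod manifest built in Python. This is a minimal illustration of the standard `nvidia.com/gpu` resource only, not of how gpushare-scheduler-extender expresses shared GPUs. The function name `gpu_pod_manifest`, the pod name, and the image `example.com/cuda-app:latest` are hypothetical placeholders.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a Pod manifest whose single container asks for `gpus` whole GPUs.

    The GPU count goes under resources.limits with the key
    "nvidia.com/gpu", as in the Kubernetes GPU scheduling guide.
    """
    if gpus < 1:
        raise ValueError("gpus must be a positive whole number")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical image, replace with a real one
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest("two-gpu-demo", "example.com/cuda-app:latest", 2)
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to `kubectl apply -f`, since kubectl accepts JSON as well as YAML. The count must be a whole number because the `nvidia.com/gpu` resource cannot be requested in fractions.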