Results 11 issues of UFO

I ran the following command to test the Horovod PyTorch framework, and this error occurs: jovyan@560c5fd869da:~$ mpirun -np 1 -bind-to none -map-by slot -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH -mca pml...

bug

I ran the following command to test Horovod: horovodrun -np 4 -H localhost:4 python keras_mnist.py This error occurs: 2019-11-15 08:51:09.228813: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0 Open MPI not...

bug

Is the issue already present in https://github.com/cloudera/hue/issues or discussed in the forum https://discourse.gethue.com? Yes. https://github.com/cloudera/hue/issues/2590 Describe the bug: OIDC authentication using Keycloak for example does not work again when Hue...

roadmap

Per the release notes of v0.2.0, https://github.com/kubeflow/arena/releases/tag/v0.2.0, it supports multiple users. How can I use this feature? Can it be integrated with LDAP/AP?

lifecycle/stale

@cheyang Is there any way to limit the number of GPUs a user can use? For example, if a cluster has 8 GPUs, each time a user submits a...

lifecycle/stale

When I submit a distributed TensorFlow job using the command below: # Set the Job Name %env JOB_NAME=tf-distributed-mnist # Submit a training job # using code and data from Data...

lifecycle/stale

@wufan1991 ### Symptom In the getResponse function from https://github.com/FederatedAI/FATE-Serving/blob/master/fate-serving-common/src/main/java/com/webank/ai/fate/serving/common/utils/HttpAdapterClientPool.java, private static HttpAdapterResponse getResponse(HttpRequestBase request) { CloseableHttpResponse response = null; try { response = HttpClientPool.getConnection().execute(request, HttpClientContext.create()); HttpEntity entity = response.getEntity(); String...

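With the traditional docker-compose deployment, this error did not occur during a stress test at 200 QPS. After deploying via the Helm chart, however, the serving proxy reports "unable to create new native thread" after about 20 minutes. See https://github.com/FederatedAI/FATE-Serving/issues/207 for details. @LaynePeng @dylan-fan @owlet42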

I used kubefate to simulate the deployment of two online serving clusters, namely: guest kubectl get pods -n fate-serving-10005 NAME READY STATUS RESTARTS AGE serving-admin-744f988bc-2mh2l 1/1 Running 0 16h serving-proxy-59957b497d-vztml 1/1 Running 0 16h serving-redis-7fbb959b6c-bxcqt 1/1 Running 0 16h serving-server-65bccf659b-bqd6t 1/1 Running...

I noticed that "Each container can request one or more GPUs" using "nvidia.com/gpu:" as a schedulable resource (https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/). Since the goal of gpushare-scheduler-extender is to allow users to express...