Kevin Pouget
Hello, I am unable to run MPIJobs with the current `latest` container image: ``` $ oc logs pod/mpi-mesher-launcher-jx64z + POD_NAME=mpi-mesher-worker-0 + '[' m = - ']' + shift + /opt/kube/kubectl...
Hello, I am confused by the meaning of the `slotsPerWorker` [attribute](https://github.com/kubeflow/mpi-operator/blob/master/pkg/apis/kubeflow/v1alpha2/types.go): ``` type MPIJobSpec struct { // Specifies the number of slots per worker used in hostfile. // Defaults to...
**Describe the bug** - I transferred a (lengthy?) storypack to the Lunii (2.x firmware, brand new), - I think STUdio marked the transfer as finished in the logs -...
**Describe the bug** ``` Could not locate device partition ``` **To Reproduce** Plug in a recent Lunii (received today from after-sales service (SAV), brand new) **Expected behavior** A clear and concise description...
When I create an AppWrapper in one namespace, it works as expected. But when I try to create the same AppWrapper in another namespace (with all the resource namespaces properly...
### Name of Feature or Improvement Get the `helm` deployer in a decent state to ease the deployment of the `main` branch ### Description of Problem the Feature Should Solve...
As part of the MCAD load test, I created 1000 AppWrappers that do _not_ fit into the cluster (they request a large amount of CPU). Once all of these AppWrappers are in...
When looking at the MCAD logs, I see that it is constantly being throttled, and it seems to be requesting all the CRDs available in the cluster: ``` I0626 13:23:58.178716...
As part of the MCAD load test that we performed, we observed a significant difference between how the default scheduler and MCAD schedule workload on the Pods. [This plot](https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift-psap_ci-artifacts/846/pull-ci-openshift-psap-ci-artifacts-main-codeflare-e2e/1681197351064047616/artifacts/e2e/test/artifacts/000__test-case_cpu_light_all_schedulable/000__mcad_load_test_multiple_values/expe/aw.count=150_aw.job.job_mode=False_20230718_0836.2b2d/002__plots/report_00_report:_error_report.html) shows...
As part of my automated Codeflare testing, I'm hitting [this exception](https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift-psap_ci-artifacts/866/pull-ci-openshift-psap-ci-artifacts-main-codeflare-e2e/1684093966636552192/artifacts/e2e/test/artifacts/000__sdk_user_run_many/000__local_ci__run_multi/ci-pods_artifacts/ci-pod-1/run.log): ``` Traceback (most recent call last): File "/opt/ci-artifacts/src/testing/codeflare/test.py", line 180, in sys.exit(main()) File "/opt/ci-artifacts/src/testing/codeflare/test.py", line 175, in main fire.Fire(Entrypoint())...