Bernard Han
Additionally, for the multi-host setting, do we just run these benchmark scripts on the different hosts and aggregate the results before "reportgen"? Or does the script support this setting? I...
@YafeiWangAlice No I didn't resolve it since I just needed the summary and per-epoch stats so didn't need the aggregated "benchmark report" anyways. Since this repo wraps around https://github.com/argonne-lcf/dlio_benchmark/tree/main, I...
+1. One issue here also is that the local queue is a namespaced object and currently is only created under the `default` namespace. As we are aware that [it is...
Per https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/, I would think it's the CPU resources with that resource group so it should be roughly 1024*32. It seems that XPK assigns it [here](https://github.com/google/xpk/blob/main/xpk.py#L4107-L4133). It is probably true when...
Shouldn't "number of CPUs" be equal to the number of VMs * the CPUs of each VM? > Is there a problem that we are seeing or is the kueue accepting...
Also https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu.
yep, thanks @RoshaniN ! So it should be equal to number of VMs * the virtual CPUs of each CPU machine in the nominalquota. I guess I was also sure...
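For illustration, the arithmetic being discussed (number of VMs times the vCPUs of each VM) can be sketched as below. The function name `nominal_cpu_quota` is hypothetical and is not part of xpk or Kueue; the 1024 and 32 figures are taken from the "1024*32" estimate earlier in the thread.

```python
def nominal_cpu_quota(num_vms: int, vcpus_per_vm: int) -> int:
    """Return the total CPU count for a pool of identical VMs.

    Hypothetical helper: it only multiplies the two inputs and does not
    reflect how xpk actually computes the value.
    """
    if num_vms < 0 or vcpus_per_vm < 0:
        raise ValueError("counts must be non-negative")
    return num_vms * vcpus_per_vm


# Assumed example figures: 1024 VMs with 32 vCPUs each.
quota = nominal_cpu_quota(1024, 32)
print(quota)
```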
yeah 20 fails too. 20000m == 20, so it is evaluated as the equivalent resource.
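A minimal sketch of why `20000m` and `20` are the same amount of CPU, assuming only the two common forms of a Kubernetes CPU quantity (a plain number, or an integer with the `m` millicore suffix). `parse_cpu_quantity` is a hypothetical helper written for this example, not a Kubernetes or Kueue API.

```python
from fractions import Fraction


def parse_cpu_quantity(quantity: str) -> Fraction:
    """Convert a CPU quantity string such as "20", "0.5" or "20000m" to CPU units.

    Hypothetical helper: handles only plain numbers and the "m" (millicore)
    suffix, where 1000m equals one CPU.
    """
    q = quantity.strip()
    if q.endswith("m"):
        # Millicores: divide the integer part by 1000 to get whole CPUs.
        return Fraction(int(q[:-1]), 1000)
    return Fraction(q)


# Both spellings resolve to the same number of CPUs.
print(parse_cpu_quantity("20000m") == parse_cpu_quantity("20"))
```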
btw I'm not blocked on this -- I can `kubectl edit` the queue configuration. But this ticket is for future usage of the CPU-only cluster spun up by xpk ;)