ThomaswellY
I think the cudatoolkit that was installed with PyTorch is 8.0. You should check the CUDA version of the torch build in your env, not the whole CUDA folder (usually in a system folder) that you downloaded...
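As a minimal sketch of the check suggested above: `torch.version.cuda` reports the CUDA version the installed torch build was compiled against, independent of any system-wide CUDA folder. The helper name `torch_cuda_version` is hypothetical, and the snippet assumes it is run inside the env in question.

```python
def torch_cuda_version():
    """Return the CUDA version the installed torch build targets.

    Returns None if torch is not installed in this env, or if the
    installed build is CPU-only (torch.version.cuda is None then).
    """
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    print("CUDA version of the torch build:", torch_cuda_version())
```

If the printed version differs from the system CUDA install, the torch build in the env is the one that matters for running PyTorch.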
It seems static ONNX doesn't support a self-defined input size?
Same question here; the code doesn't seem to support distributed GPUs.
@alculquicondor @tenzen-y I was looking into how to modify the original script, which uses torch.distributed.launch to start training, so that it uses mpirun to start training in mpi-operator.
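One possible approach, sketched here under assumptions rather than taken from the thread: when the processes are started by Open MPI's mpirun, each one sees `OMPI_COMM_WORLD_RANK`, `OMPI_COMM_WORLD_SIZE`, and `OMPI_COMM_WORLD_LOCAL_RANK`, which can be copied into the `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` variables that torch.distributed reads with env-based initialization. The helper name `torch_env_from_mpi` is hypothetical, and this assumes the MPI implementation is Open MPI.

```python
import os

# Open MPI variable -> variable read by torch.distributed (env:// init).
OMPI_TO_TORCH = {
    "OMPI_COMM_WORLD_RANK": "RANK",
    "OMPI_COMM_WORLD_SIZE": "WORLD_SIZE",
    "OMPI_COMM_WORLD_LOCAL_RANK": "LOCAL_RANK",
}


def torch_env_from_mpi(env=None):
    """Translate Open MPI rank variables into torch.distributed ones.

    `env` defaults to os.environ; variables that are absent are skipped,
    so the result is empty when the process was not started by mpirun.
    """
    env = os.environ if env is None else env
    return {
        torch_name: env[ompi_name]
        for ompi_name, torch_name in OMPI_TO_TORCH.items()
        if ompi_name in env
    }


if __name__ == "__main__":
    # Apply the mapping before initializing the process group.
    os.environ.update(torch_env_from_mpi())
    print({k: os.environ.get(k) for k in ("RANK", "WORLD_SIZE", "LOCAL_RANK")})
```

`MASTER_ADDR` and `MASTER_PORT` are not provided by mpirun and would still need to be set separately before the process group is initialized.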
> @ThomaswellY Can you create an issue on https://github.com/kubeflow/training-operator since the mpi-operator doesn't support v1 API?

Thanks for your reply~ The api-resources of my k8s clusters are shown below: (base)...
Thanks for your reply~ I am a little confused about which type of API can support my resource (mpijob in my case). The command "kubectl api-resources" shows mpijobs in my...
@alculquicondor @tenzen-y Thanks for your kind help! Maybe I should use v2beta1 for DeepSpeed. Anyway, I have run #549 successfully even with v1; however, it seems only cifar10_deepspeed.py needs no...
Hi @kuaashish! Thank you for your response and for pointing out the transition to the Pose Landmarker Task API. I have migrated my implementation to the upgraded API, following the [Python...
@kuaashish If any further details are needed, please let me know. Thanks~
Looking forward to your reply~ @HuangYG123 @tyshiwo @Luxuff