Hung Nguyen

Results 8 comments of Hung Nguyen

Hi, I am also looking for this feature, especially for PyTorch. The PR for it seems to have been paused for some time: https://github.com/kubeflow/pipelines/pull/5170 I could run the distributed training using PytorchJob (created by...

@wangli1426 A simple ResourceOp example: https://github.com/kubeflow/pipelines/blob/master/samples/core/resource_ops/resource_ops.py For the PytorchJob manifest: https://github.com/kubeflow/pytorch-operator/blob/master/examples/mnist/v1/pytorch_job_mnist_nccl.yaml You can convert the YAML to JSON (or a Python dict) and pass it to ResourceOp. Remember to set `success_condition`, for example: `success_condition='status.replicaStatuses.Worker.succeeded==3,status.replicaStatuses.Chief.succeeded==1'` https://github.com/kubeflow/pipelines/blob/master/samples/contrib/e2e-mnist/mnist-pipeline.ipynb
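A minimal sketch of the idea above: build the PyTorchJob manifest as a Python dict and compose the `success_condition` string. The helper names, the job name, and the image are hypothetical. I am assuming the replica types are `Master` and `Worker`, as in the linked PyTorchJob YAML, so adjust the names and counts to match your own job.

```python
import json


def build_success_condition(replica_counts):
    """Join one 'succeeded' check per replica type into a single
    comma-separated condition string."""
    return ",".join(
        f"status.replicaStatuses.{replica}.succeeded=={count}"
        for replica, count in replica_counts.items()
    )


def build_pytorchjob_manifest(name, image, num_workers):
    """Return a PyTorchJob manifest as a plain dict (JSON-serializable).
    The image and name passed in are placeholders, not real artifacts."""

    def replica(count):
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {"containers": [{"name": "pytorch", "image": image}]}
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name},
        "spec": {
            "pytorchReplicaSpecs": {
                "Master": replica(1),
                "Worker": replica(num_workers),
            }
        },
    }


# Hypothetical values, for illustration only.
manifest = build_pytorchjob_manifest(
    "mnist-nccl-demo", "example.com/pytorch-mnist:latest", num_workers=3
)
condition = build_success_condition({"Worker": 3, "Master": 1})

print(json.dumps(manifest, indent=2))
print(condition)

# Inside a pipeline function (assuming the KFP v1 SDK), these could be passed as:
#   import kfp.dsl as dsl
#   dsl.ResourceOp(name="pytorchjob", k8s_resource=manifest,
#                  action="create", success_condition=condition)
```

The condition is only satisfied once every listed replica type reports the expected number of succeeded pods, so the counts should match the `replicas` values in the manifest.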

I only know that they have a client SDK to get logs. Example: https://github.com/kubeflow/pytorch-operator/blob/4aeb6503162465766476519339d3285f75ffe03e/sdk/python/examples/kubeflow-pytorchjob-sdk.ipynb API: https://github.com/kubeflow/pytorch-operator/blob/master/sdk/python/docs/PyTorchJobClient.md#get_logs But I don't know how to show the logs in a pipeline component.

> > But I don't know how to show the logs to a component of pipeline.
>
> You could just print them.

I am using k8s_client API (Watch and...
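A rough sketch of the "just print them" approach, assuming the component runs inside the cluster and the `kubernetes` Python client is installed. The function names, the prefix, and the pod name and namespace in the usage note are hypothetical. Whatever a component writes to stdout ends up in that step's log, so relaying each line is enough.

```python
def relay_logs(lines, prefix="[pytorchjob] "):
    """Print each log line to stdout so it appears in the component's own log.
    Returns the number of lines relayed."""
    count = 0
    for line in lines:
        print(f"{prefix}{line.rstrip()}", flush=True)
        count += 1
    return count


def stream_pod_logs(pod_name, namespace):
    """Yield log lines from a pod using the Kubernetes Watch API.
    Assumes in-cluster credentials; the import is local so that relay_logs
    stays usable where the kubernetes package is not installed."""
    from kubernetes import client, config, watch

    config.load_incluster_config()
    v1 = client.CoreV1Api()
    return watch.Watch().stream(
        v1.read_namespaced_pod_log, name=pod_name, namespace=namespace
    )


# Offline demonstration with canned lines instead of a live pod.
relayed = relay_logs(["epoch 1 loss 0.52\n", "epoch 2 loss 0.31\n"])
```

In a real component this would be called as `relay_logs(stream_pod_logs("my-job-master-0", "kubeflow"))`, where the pod name and namespace are placeholders for your own job.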

I also need this ONNX preprocessing support: Discretization --> IntegerLookup. As far as I can tell, ONNX has only enabled string lookup, not integer lookup. If we have a way...

Hi, I did try cast_int_to_str + StringLookup to replace IntegerLookup, and I was able to convert the model to ONNX. However, when I check the performance, the TF model is...

After each iteration (one batch), all of the replicas send out their gradients (size = network size). If the model size is 100 MB: 1 node: no need to send...
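To put rough numbers on the per-iteration traffic, here is a small sketch. It assumes gradients are synchronized with a ring all-reduce, in which each node sends about 2(N-1)/N times the gradient size per iteration. That scheme is my assumption for illustration and is not necessarily the one the setup above uses; the function name is hypothetical.

```python
def ring_allreduce_mb_sent_per_node(model_mb, num_nodes):
    """Approximate data (in MB) each node sends per iteration under a
    ring all-reduce: 2 * (N - 1) / N * gradient size. Assumes the gradient
    is the same size as the model and ignores compression and overlap."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return 2 * (num_nodes - 1) / num_nodes * model_mb


# Per-iteration traffic for a 100 MB model at a few cluster sizes.
for nodes in (1, 2, 4, 8):
    sent = ring_allreduce_mb_sent_per_node(100, nodes)
    print(f"{nodes} node(s): {sent:.1f} MB sent per node per iteration")
```

With a single node nothing is sent, which matches the first case above. Under this assumption, the per-node volume approaches twice the model size as the number of nodes grows.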