anj-s
**Describe the bug** I am unable to get distributed training running with the PyTorch backend. I consistently run into RDMA_CM_EVENT_ADDR_ERROR. Can someone take a look and let me know...
This error is thrown when using torch.nn.CrossEntropyLoss() with the SPMD API.
Currently the MNIST benchmark fails due to unsupported convolution ops in the DTensor registry. Error: NotImplementedError: Operator aten.convolution.default does not have a DistributedTensor rule registered.
tanh is part of the dtensor_lagging_op_db but is also in xfail(). When I added support for tanh, I don't remember making any modifications other than tests. Figure out what is...
Add support for backward() in test_dtensor_ops.py since that will cover FW + BW.
### What happened? A wall of thought output is received but there is no part.text after . All I see is: ### What did you expect to happen? Thought completes...
The edit tool is one of the most critical pieces, and invariably there are issues with editing files (especially large files). Evaluate the quality of the current edit tool, track...
## What problem does this solve? This feature will evaluate model reasoning to identify areas of improvement. ## How will it work? By implementing a systematic evaluation process, we'll measure...
## What problem does this solve? This issue tracks all the efforts we are embarking on to help improve the AI-related quality of the gemini-cli product. Our goal is to...