Rohan Varma


Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* __->__ #83055
* #83035
* #82892

NamedTuple support is blocking MultiModal adoption. TODO: add test.

Labels: oncall: distributed, cla signed
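For context, a minimal sketch of the kind of output container involved; `Output` and `Model` are hypothetical names for illustration, not the MultiModal code itself:

```
from typing import NamedTuple

import torch
import torch.nn as nn


class Output(NamedTuple):
    # Hypothetical output container: distributed wrappers must be able
    # to traverse a NamedTuple like this to reach the tensors inside.
    logits: torch.Tensor
    features: torch.Tensor


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(10, 10)

    def forward(self, x):
        return Output(logits=self.proj(x), features=x)
```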

### 🐛 Describe the bug

Sometimes, `_post_backward_hook` will not fire if gradients were not accumulated on the FSDP-managed parameter, such as if all parameters in an FSDP module were...

Labels: high priority, triage review, oncall: distributed, module: fsdp
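A hedged repro sketch of the failure mode described above (assumes a process group is already initialized; not taken verbatim from the issue):

```
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# The first submodule is fully frozen, so no gradient ever accumulates
# on its FSDP-managed flat parameter and its post-backward hook may
# never fire.
frozen = nn.Linear(10, 10)
for p in frozen.parameters():
    p.requires_grad = False

model = nn.Sequential(FSDP(frozen), FSDP(nn.Linear(10, 10)))
model(torch.randn(2, 10)).sum().backward()
# Post-backward fires only where gradients actually accumulated.
```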

i.e., different clients can train different models

### 🐛 Describe the bug

```
class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Linear(10, 10)
        self.b = nn.Linear(10, 10)

    def forward(self, x):
        a = self.a(x)
        b = self.b(x)
        return (a,...
```

Labels: oncall: distributed, module: ddp
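A hedged sketch of how a repro like the truncated one above is usually driven (assumes a process group is initialized, and guesses that the truncated forward ends with `return (a, b)`):

```
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Linear(10, 10)
        self.b = nn.Linear(10, 10)

    def forward(self, x):
        a = self.a(x)
        b = self.b(x)
        return (a, b)  # assumed completion of the truncated snippet

# A loss that uses only one of the two outputs leaves self.b's
# parameters without gradients, which exercises DDP's
# unused-parameter handling.
model = DDP(M(), find_unused_parameters=True)
a, b = model(torch.randn(2, 10))
a.sum().backward()  # b is unused; without the flag DDP raises an error
```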

### 🚀 The feature

Should add some tests to ensure the right sharded grad scaler, `no_sync` context manager, etc. are picked when using composable FSDP.

### Motivation, pitch

....
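A hedged sketch of the kind of check being proposed (assumes a process group is initialized; the real tests would live in the distributed test suite). The point is that a model sharded via the composable `fully_shard` API should train through `ShardedGradScaler` the same way a wrapper-FSDP model does:

```
import torch
import torch.nn as nn
from torch.distributed._composable import fully_shard
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler

model = nn.Linear(10, 10)
fully_shard(model)  # composable API: shards in place, no wrapper class
scaler = ShardedGradScaler()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(2, 10)).sum()
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```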

The API enforces that the wrapping policy be just a set of modules, which is sufficient for a few use cases, but the underlying API offers more generality in terms...
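To illustrate the gap, a hedged sketch contrasting the set-of-module-classes form with the more general callable contract that the underlying wrapper FSDP `auto_wrap_policy` already accepts (`custom_policy` is a hypothetical example, not the proposed API change itself):

```
import torch.nn as nn
from torch.distributed.fsdp.wrap import ModuleWrapPolicy

# The set-of-module-classes form the API currently enforces:
policy = ModuleWrapPolicy({nn.Linear})

# The more general underlying form: an arbitrary callable that can key
# off recursion state and parameter counts, not just module type.
def custom_policy(module: nn.Module, recurse: bool, nonwrapped_numel: int) -> bool:
    if recurse:
        return True  # keep descending into children
    return nonwrapped_numel >= 1_000_000  # wrap only large leaves
```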

https://github.com/pytorch/torchtune/pull/779 is adding QLoRA-13B, but we need to add CI for this as well.

This will save memory for GQA / MQA, but will require a bit of refactoring of the attention forward pass.
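A hedged illustration of where the memory goes, with made-up shapes: in GQA/MQA the K/V projections produce `n_kv_heads < n_heads`, so a forward pass that expands K/V to `n_heads` before attention pays `n_heads / n_kv_heads` times the K/V activation memory. The saving comes from keeping K/V at `n_kv_heads` and broadcasting inside attention instead.

```
import torch
import torch.nn.functional as F

bsz, seq, n_heads, n_kv_heads, head_dim = 2, 128, 32, 4, 64
q = torch.randn(bsz, n_heads, seq, head_dim)
k = torch.randn(bsz, n_kv_heads, seq, head_dim)
v = torch.randn(bsz, n_kv_heads, seq, head_dim)

# The expansion the refactor would remove: materializes K/V at full
# head count before calling attention.
k_full = k.repeat_interleave(n_heads // n_kv_heads, dim=1)
v_full = v.repeat_interleave(n_heads // n_kv_heads, dim=1)
out = F.scaled_dot_product_attention(q, k_full, v_full)
```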

#### Context

- In this PR, we introduce `TunePerfMonitor`, a utility class for tracking metrics across training. This class is meant to be flexible in the actual metrics that users...

Labels: CLA Signed
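A hedged sketch of what a utility like `TunePerfMonitor` could look like; the PR's actual interface may differ. It accumulates arbitrary named metrics per training step and reports simple averages:

```
import time
from collections import defaultdict

class TunePerfMonitor:
    def __init__(self):
        self._metrics = defaultdict(list)
        self._step_start = None

    def start_step(self):
        self._step_start = time.perf_counter()

    def end_step(self, **metrics):
        # Record wall-clock step time plus any user-supplied metrics.
        self._metrics["step_time_s"].append(time.perf_counter() - self._step_start)
        for name, value in metrics.items():
            self._metrics[name].append(value)

    def summary(self):
        # Mean of each metric across all recorded steps.
        return {name: sum(vals) / len(vals) for name, vals in self._metrics.items()}
```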