Rohan Varma
Thanks for raising this issue! I responded in PT: https://github.com/pytorch/pytorch/issues/82963. That said, I'm not sure whether HF uses PyTorch nightlies/latest or a stable version. If we can't get PyTorch updated in HF...
@pytorchbot merge
You can think of the generator and the discriminator as playing a game against each other in which they seek to get "better" (i.e., minimize their respective losses) at the...
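That opposing-objectives game can be sketched numerically with the standard non-saturating GAN losses. This is a minimal pure-Python illustration of the idea, not any particular library's implementation, and the logit values below are made up for the example:

```python
import math

def sigmoid(x):
    # Map a raw logit to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def d_loss(d_real_logit, d_fake_logit):
    # Discriminator wants real samples scored near 1 and fakes near 0
    # (binary cross-entropy on both).
    return -(math.log(sigmoid(d_real_logit))
             + math.log(1.0 - sigmoid(d_fake_logit)))

def g_loss(d_fake_logit):
    # Generator wants its fakes scored near 1 (non-saturating loss).
    return -math.log(sigmoid(d_fake_logit))

# As the discriminator's score on fakes rises (the generator is
# "winning"), the generator's loss falls while the discriminator's
# loss on those same fakes rises -- the two objectives pull in
# opposite directions.
better_fake, worse_fake = 2.0, -2.0
```

So each player's gradient step lowers its own loss at the other's expense, which is exactly the game described above.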
Is there any progress on this issue? Happy to help in any way.
@edenlightning Sounds good, I also pinged the Slack channel for any feedback/discussions.
The PR https://github.com/PyTorchLightning/pytorch-lightning/pull/5141 is ready for review, in case anyone wants to take a look.
This should be fixed in PyTorch nightly now: https://github.com/pytorch/pytorch/pull/83309
Is this with the default configs @kartikayk, or are you setting a higher batch size, which could contribute to activation memory?
@BedirT Thanks for filing this issue! As @RdoubleA mentioned, please run the `tune download` command with the `--ignore-patterns` flag added (this is mentioned in the [config](https://github.com/pytorch/torchtune/blob/main/recipes/configs/llama3/70B_lora.yaml#L6) as well...
@kartikayk To clarify, are we considering removing kv cache entirely or refactoring the implementation to be less intrusive? We would want to keep some implementation of kv cache for efficient...
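For context on what even a minimal KV cache buys during generation: cached keys/values let each decode step attend over all past positions while computing projections only for the newest token. A toy sketch (a hypothetical `KVCache` class with plain Python lists, not torchtune's actual implementation):

```python
import math

class KVCache:
    """Toy per-layer key/value cache for autoregressive decoding.

    Real implementations store tensors; plain float vectors keep
    the sketch self-contained.
    """
    def __init__(self):
        self.keys = []    # one entry per past position
        self.values = []

    def update(self, k, v):
        # Append this step's key/value and return the full history,
        # so attention for the new token sees every past position.
        self.keys.append(k)
        self.values.append(v)
        return self.keys, self.values

def attend(query, keys, values):
    # Softmax over dot-product scores, then a weighted sum of values.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]

# Decode loop: each step computes k/v only for the newest token,
# while attention reuses everything already in the cache.
cache = KVCache()
out = None
steps = [
    ([1.0, 0.0], [0.5, 0.5], [1.0, 0.0]),  # (key, value, query)
    ([0.0, 1.0], [0.2, 0.8], [0.0, 1.0]),
]
for step_k, step_v, q in steps:
    keys, values = cache.update(step_k, step_v)
    out = attend(q, keys, values)
```

Without the cache, every step would recompute keys/values for the entire prefix, so keeping some form of this around matters for generation throughput even if the current implementation gets refactored.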