Aznix07

Results: 10 comments by Aznix07

Hi @shahelaojieraozhi, thank you for reporting this issue! How was this LoRA adapter trained?
- Training script or framework used?
- Were there any warnings during training?
- Also can...

I see a few potential workarounds:
1. Flatten the tuple before passing to checkpointed blocks and reconstruct it inside
2. Create a custom tensor-like wrapper class that checkpoint can recognize
3....
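Workaround 1 can be sketched as follows. The block and tensor names here are hypothetical; the sketch assumes PyTorch's `torch.utils.checkpoint.checkpoint` with `use_reentrant=False`, which inspects its positional tensor arguments for `requires_grad`:

```python
import torch
from torch.utils.checkpoint import checkpoint

# Hypothetical block whose real signature takes a (hidden, mask) tuple.
def block(inputs):
    hidden, mask = inputs
    return hidden * mask, mask

# Wrapper: checkpoint receives plain tensors (which it can inspect for
# requires_grad), and the tuple is reconstructed inside.
def block_flat(hidden, mask):
    out, mask = block((hidden, mask))
    return out, mask

hidden = torch.randn(2, 4, requires_grad=True)
mask = torch.ones(2, 4)

# Pass flattened tensors, not the tuple itself.
out, _ = checkpoint(block_flat, hidden, mask, use_reentrant=False)
out.sum().backward()  # gradients flow back through the checkpointed block
```

The key point is that only the wrapper's signature changes; the original tuple-based block is untouched.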

Nice solution! Moving the tensor splitting inside the checkpointed blocks is elegant - it keeps the tuple structure while avoiding the checkpoint detection issue. I noticed you mentioned the splitting...

Hi @AamodThakur, I have a few questions:
1. What are the model sizes you're using?
2. Does the issue occur if both models are on the same device from the...

Hi @AamodThakur,

Thank you for the trace files. I've started analyzing them, and the pattern is very revealing.

### Initial Trace Analysis

I can see a clear performance degradation pattern:...

Hi @AamodThakur,

Excellent work, this clarifies a lot.

**Re: Why is backward pass in transformers slow when the issue is in TRL**

Great question. Here's what's happening: TRL GKDTrainer computes...

Hi @AamodThakur,

You found the root cause!

### Summary

The performance regression is caused by **`gradient_checkpointing=True`** (new default in v0.22.0+) combined with your multi-device setup.

### Why this happens

**Gradient...

### Answering your question

> Do you have any idea why the time increased only when models are on different GPU? Was the time taken due to gradient_checkpointing=True when models...

Yeah, you are absolutely correct. Your logic is sound - the teacher model should:
- Run once with `torch.no_grad()` during the forward pass
- Have logits detached from the computation graph...
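That expected behavior can be illustrated with tiny stand-in models (the linear layers below are placeholders for the real teacher and student):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(8, 4)  # stand-in for the frozen teacher model
student = nn.Linear(8, 4)  # stand-in for the trainable student model
x = torch.randn(16, 8)

# Teacher runs once under no_grad: no graph is built, so its logits
# are already detached from any computation graph.
with torch.no_grad():
    teacher_logits = teacher(x)

student_logits = student(x)

# Distillation-style loss: gradients should reach only the student.
loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.softmax(teacher_logits, dim=-1),
    reduction="batchmean",
)
loss.backward()
```

After `backward()`, only the student's parameters have `.grad` populated, which is exactly the property described above.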

### Solutions

**Option 1: Hide the second GPU**

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # only make cuda:0 visible

# then initialize normally
t_model = AutoModelForCausalLM.from_pretrained(...)
s_model = AutoModelForCausalLM.from_pretrained(...)
```
...