vikram71198
Gotcha. I explicitly pip install `torch == 2.2.1+cu118` (`torch == 2.2.2+cu121` is the default torch which I attempt to override), so another part of `ds_report` that I find confounding is...
Okay, I fixed this myself. Nvm.
I'm facing the same issue as @JakobLS. After the first epoch, I get the message `Invalidate trace cache @ step 0: expected module 0, but got module 456` and then...
Hi, has there been any progress on this issue so far? I'm really hoping this feature is released soon.
Yeah, it would be nice to get support for the DeBERTa architecture as well. @OlivierDehaene is this model architecture on the roadmap? Can we maybe get an idea of the...
Yep, can confirm I also see the same issue with LLaMA-3-8b-Instruct with FSDP + Gradient Checkpointing. The Yi series of models also has this issue, I just checked. And it...
@tomaarsen any updates on this? I see you haven't responded yet.
@mlabonne I would strongly suggest SFT-ing [this](https://huggingface.co/Nexusflow/NexusRaven-V2-13B) newly released model by Nexusflow, which surpasses GPT-4 according to their evals. There is a discussion thread on the page [here](https://huggingface.co/Nexusflow/NexusRaven-V2-13B/discussions/3), where they...
@ArthurZucker are there any updates on this? I don't see a PR for this yet.
@ArthurZucker there's a slightly different error this time around with `transformers==4.41.0`:
```python
LlamaDecoderLayer.forward() got an unexpected keyword argument 'offload_to_cpu'
```
There's something going on here.