20 comments by Benjamin Marie

Yes, it works. The requirements must be updated; I expect new installations of Axolotl to fail until this is fixed.

Here is my model (Llama 3.1 8B):
```
PeftModelForCausalLM(
  (base_model): XLoraModel(
    (lora_model): LoraModel(
      (model): LlamaForCausalLM(
        (model): LlamaModel(
          (embed_tokens): Embedding(128256, 4096)
          (layers): ModuleList(
            (0-31): 32 x LlamaDecoderLayer(
              (self_attn): LlamaFlashAttention2(
                (q_proj): lora.Linear(...
```
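For context, this is roughly how a model with that structure can be assembled, assuming PEFT's X-LoRA support (XLoraConfig + get_peft_model). It is only a sketch: the exact base checkpoint, adapter paths, and xlora_depth are placeholders, and parameter names may differ slightly across PEFT versions.
```
import torch
from transformers import AutoConfig, AutoModelForCausalLM
from peft import XLoraConfig, get_peft_model

model_id = "meta-llama/Llama-3.1-8B"  # assumption: the exact base checkpoint may differ
base_config = AutoConfig.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # matches the LlamaFlashAttention2 modules above
)

xlora_config = XLoraConfig(
    task_type="CAUSAL_LM",
    hidden_size=base_config.hidden_size,  # 4096 for Llama 3.1 8B
    xlora_depth=8,  # placeholder
    adapters={
        "adapter_1": "./path/to/adapter_1",  # placeholder adapter checkpoints
        "adapter_2": "./path/to/adapter_2",
    },
)

xlora_model = get_peft_model(base_model, xlora_config)
print(xlora_model)  # prints a module tree like the one quoted above
```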

I added this code:
```
print(xlora_model.print_trainable_parameters())
print("--- Require grad? ----")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)
print("----------------------")
```
It prints:
```
trainable params: 118,372,800 || all params: 8,148,634,048...
```
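The same numbers can be cross-checked without the PEFT helper, with a plain PyTorch count (a sketch; `model` here is the same X-LoRA-wrapped model as above):
```
# Recompute the trainable/total parameter counts directly from the module's parameters.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} || all params: {total:,}")
```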

Same issue with the GPTQ versions of Qwen3-30B-A3B.

The "import from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM" also fails with the same error message. But my installation command is much simpler: pip install intel-extension-for-transformers

That's very interesting and very good news! Thank you for digging into this. Is this with the HF backend? I usually run vLLM since it is much faster. Maybe it...

I wonder whether the problem is with vLLM rather than with lm_eval. I'll do some more tests.
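A sketch of the kind of test I have in mind, assuming lm-evaluation-harness's Python API (lm_eval.simple_evaluate); the model id and task below are placeholders, not the exact setup discussed:
```
import lm_eval

# Run the same task once with the HF backend and once with vLLM
# to see whether the scores diverge. Model id and task are placeholders.
for backend in ("hf", "vllm"):
    results = lm_eval.simple_evaluate(
        model=backend,
        model_args="pretrained=Qwen/Qwen3-30B-A3B",  # placeholder model id
        tasks=["mmlu"],                              # placeholder task
    )
    print(backend, results["results"])
```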

Interesting, I didn't know that. But I don't think it matters; I would be surprised if TRL used FSDP's reduce-scatter for single-GPU training.
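One quick way to sanity-check this: FSDP's reduce-scatter is a collective op, so it can only come into play if a torch.distributed process group has been initialized. A minimal sketch, to be run from inside the training script (not verified on TRL here):
```
import torch.distributed as dist

# On a plain single-GPU run with no process group, this should print False,
# which would rule out any reduce-scatter collective (assumption, not verified).
print(dist.is_available() and dist.is_initialized())
```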

Sure, it's all in the notebook I linked to in my first post. I ran this notebook on Colab with the A100.

Yes: ![image](https://github.com/user-attachments/assets/d25e7b20-7551-47fe-b758-01f750636738) This configuration uses fp32 and adamw_torch.
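For reference, a minimal sketch of a TrainingArguments setup matching what the screenshot describes (full-precision fp32 training with the adamw_torch optimizer); everything else here is a placeholder, not taken from the screenshot:
```
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./output",           # placeholder
    per_device_train_batch_size=8,   # placeholder
    optim="adamw_torch",             # the optimizer mentioned above
    fp16=False,                      # both disabled -> training runs in full fp32
    bf16=False,
)
```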