Results 9 comments of choyakawa

I think you've missed the `rotate_half` part, while the tokenizer is the same as LLaMA:

```
def rotate_half(x):
    # Split and rotate
    x1 = x[..., ::2]
    x2 = x[..., 1::2]
    ...
```
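For context, here is a NumPy sketch of the interleaved (GPT-NeoX/GPT-J-style) rotation that the truncated snippet appears to implement. The function name `rotate_half_interleaved` and everything past the two slicing lines are my assumptions, not the original code:

```python
import numpy as np

def rotate_half_interleaved(x):
    # Interleaved rotation: each adjacent pair (x0, x1) becomes (-x1, x0).
    # Note: LLaMA's reference rotate_half instead splits the last dim into
    # two contiguous halves; the slicing below matches the interleaved
    # layout shown in the snippet above.
    x1 = x[..., ::2]   # even-indexed components
    x2 = x[..., 1::2]  # odd-indexed components
    return np.stack((-x2, x1), axis=-1).reshape(x.shape)

# e.g. [1, 2, 3, 4] -> [-2, 1, -4, 3]
```

The difference between the interleaved and the split-in-half layout is exactly why weights converted between the two conventions produce garbage unless the rotation is matched.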

Not working with zero3: https://github.com/InternLM/xtuner/issues/432#issuecomment-2002443611

> > Not working with zero3: [#432 (comment)](https://github.com/InternLM/xtuner/issues/432#issuecomment-2002443611)
>
> qlora does not currently support zero3.

It is not an issue with 4-bit. I used full fine-tuning with no LoRA, however...

Failed on `llava_internlm2_chat_7b_clip_vit_large_p14_anyshape_e1_gpu8_pretrain` with DeepSpeed zero3, is there anything wrong?

```
NCCL_IB_TIMEOUT=120 XTUNER_DATASET_TIMEOUT=120 NCCL_DEBUG=INFO NPROC_PER_NODE=8 NNODES=4 PORT=12345 ADDR=server0 NODE_RANK=0 xtuner train llava_internlm2_chat_7b_clip_vit_large_p14_anyshape_e1_gpu8_pretrain --deepspeed deepspeed_zero3
```

```
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/user/.local/lib/python3.11/site-packages/transformers/modeling_utils.py",...
```

zero2 is OK, but replicating LLaVA-1.6 with a 34B model is challenging without zero3.
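For reference, a minimal hand-written DeepSpeed ZeRO-3 configuration typically looks like the fragment below. This is a sketch of the standard DeepSpeed JSON schema, not a reproduction of xtuner's bundled `deepspeed_zero3` preset:

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

The practical difference is that stage 3 shards the parameters themselves across ranks (not just optimizer states and gradients as in stage 2), which is what makes a 34B model fit.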

@LZHgrla Do you have any idea about the zero3 failure? I have no idea why the image features from CLIP have shape `torch.Size([0])` here. It seems that batch size > 1...

I am not using quantization; the above failure was with bf16. I have also tried open_clip instead of the OpenAI ViT-L, and it did not work either.

> Moreover, the model can be efficiently trained in academic settings, within 23 hours on 8 A100 GPUs (vs. 26 hours of LLaVA-1.5).

Will the vision model of GLM-4 also be considered?