Errors during LLaVA pretrain for phi3_mini_4k_instruct_clip_vit_large_p14_336
I strictly followed the documentation for phi3_mini_4k_instruct_clip_vit_large_p14_336 and ran the command:
NPROC_PER_NODE=4 xtuner train llava_phi3_mini_4k_instruct_clip_vit_large_p14_336_e1_gpu8_pretrain --deepspeed deepspeed_zero2 --seed 1024
Environment: conda, python==3.10, transformers==4.41.1, torch==2.3.0, CUDA 12.1, 4x 3090
05/23 08:10:20 - mmengine - INFO - before_train in EvaluateChatHook.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore gradient_checkpointing_kwargs in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method _set_gradient_checkpointing in your model.
[rank3]: Traceback (most recent call last):
[rank3]:   File "/media/ljm/anaconda3/envs/xtuner/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1271, in call_hook
[rank3]:     getattr(hook, fn_name)(self, **kwargs)
[rank3]:   File "/home/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 230, in before_train
[rank3]:     self._generate_samples(runner, max_new_tokens=50)
[rank3]:   File "/home/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 216, in _generate_samples
[rank3]:     self._eval_images(runner, model, device, max_new_tokens,
[rank3]:   File "/home/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 148, in _eval_images
[rank3]:     generation_output = model.generate(
[rank3]:   File "/media/ljm/anaconda3/envs/xtuner/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank3]:     return func(*args, **kwargs)
[rank3]:   File "/media/ljm/anaconda3/envs/xtuner/lib/python3.10/site-packages/transformers/generation/utils.py", line 1758, in generate
[rank3]:     result = self._sample(
[rank3]:   File "/media/ljm/anaconda3/envs/xtuner/lib/python3.10/site-packages/transformers/generation/utils.py", line 2390, in _sample
[rank3]:     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
[rank3]:   File "/media/ljm/anaconda3/envs/xtuner/lib/python3.10/site-packages/transformers/generation/utils.py", line 1321, in _get_initial_cache_position
[rank3]:     past_length = model_kwargs["past_key_values"][0][0].shape[2]
[rank3]: TypeError: 'NoneType' object is not subscriptable
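The failing line in the traceback indexes `model_kwargs["past_key_values"]` as a nested tuple, but during this evaluation hook the value is `None`, so the subscript itself raises. A minimal sketch of the failure mode (standalone, not the actual transformers code path):

```python
# Sketch of the pattern behind the traceback's last frame:
# transformers 4.41's _get_initial_cache_position indexes
# past_key_values, which is None here, so None[0] raises TypeError.
model_kwargs = {"past_key_values": None}
try:
    past_length = model_kwargs["past_key_values"][0][0].shape[2]
except TypeError as err:
    print(err)  # 'NoneType' object is not subscriptable
```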
Downgrading transformers can solve this.
> downgrade transformers can solve this

@acdart Hi, my transformers version is currently 4.41.1. Which version should it be downgraded to?
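The thread does not name an exact target version; since 4.41.1 is the version that fails, one option is simply to pin anything below 4.41 and let pip pick the newest matching release (the `<4.41` bound is an inference from this thread, not a confirmed compatibility range):

```shell
# Assumption: a release from before the 4.41 generation/cache changes
# works; the thread does not confirm a specific known-good version.
pip install "transformers<4.41"
python -c "import transformers; print(transformers.__version__)"
```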
How were you even able to run that command? It tells me that it doesn't recognize NPROC_PER_NODE=4, and if I run it without that bit (i.e. just running xtuner train llava_phi3_mini_4k_instruct_clip_vit_large_p14_336_e1_gpu8_pretrain --deepspeed deepspeed_zero2 --seed 1024), it says it doesn't know what llava_phi3_mini_4k_instruct_clip_vit_large_p14_336_e1_gpu8_pretrain is.
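One possible explanation for the "doesn't recognize NPROC_PER_NODE=4" symptom: the `VAR=value command` prefix is POSIX sh/bash/zsh syntax, and shells like fish, Windows cmd, or PowerShell reject it as an unknown command. A minimal sketch of how the prefix behaves in a POSIX shell:

```shell
# A VAR=value prefix sets the variable only for that one command;
# the child process sees it, the current shell does not keep it.
NPROC_PER_NODE=4 sh -c 'echo "$NPROC_PER_NODE"'
# prints: 4
```

For the unrecognized config name, xtuner resolves built-in config aliases itself: `xtuner list-cfg` prints the available names and `xtuner copy-cfg <name> <dest-dir>` copies one locally (assuming a recent xtuner install); if the name is not listed, training from a local config file path also works.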