
LLaVA training stops in the middle

ThugJudy opened this issue 1 year ago · 1 comment

Hi, I was trying to fine-tune LLaVA using the official dataset.

I used the following command for finetuning:

 xtuner train llava_v15_13b_finetune_lora --deepspeed deepspeed_zero2

Unfortunately, the process exited abruptly, and when I checked whether a .pth checkpoint had been generated, it hadn't.

I have attached the output below.

Map (num_proc=32): 100%|██████████████████████████████████████████████████████████| 118/118 [00:00<00:00, 218.82 examples/s]
Map (num_proc=32): 100%|██████████████████████████████████████████████████████████| 118/118 [00:00<00:00, 281.95 examples/s]
Filter (num_proc=32): 100%|███████████████████████████████████████████████████████| 118/118 [00:00<00:00, 385.54 examples/s]
Map (num_proc=32): 100%|███████████████████████████████████████████████████████████| 118/118 [00:03<00:00, 29.54 examples/s]
Filter (num_proc=32): 100%|███████████████████████████████████████████████████████| 118/118 [00:00<00:00, 341.35 examples/s]
Map (num_proc=32): 100%|███████████████████████████████████████████████████████████| 118/118 [00:04<00:00, 26.89 examples/s]
07/13 23:19:09 - mmengine - WARNING - Dataset LLaVADataset has no metainfo. ``dataset_meta`` in visualizer will be None.
Downloading shards: 100%|█████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 85.75it/s]
Loading checkpoint shards:  67%|█████████████████████████████████████████▎                    | 2/3 [00:38<00:20, 20.14s/it](xtuner-env) [psg4@gpua046 LLaVA]$ ls
app.py    data  id_ed25519.pub  LICENSE  llava.egg-info       mpi_hello.c  predict.py      README.md  work_dirs
cog.yaml  docs  images          llava    LLaVA-Instruct-150K  playground   pyproject.toml  scripts

Kindly let me know how to fix this issue.

ThugJudy · Jul 14 '24

Increase the virtual memory (swap space). Your log shows the process was killed while loading checkpoint shards, which typically means the system ran out of RAM.
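A minimal sketch of how to check for an OOM kill and add swap on a Linux host, assuming you have root access; the 64G size is just an example, not a recommendation:

 # Confirm the trainer was killed by the kernel OOM killer
 sudo dmesg | grep -i -E "oom|killed process"

 # Create and enable a 64 GB swap file (size is an assumption)
 sudo fallocate -l 64G /swapfile
 sudo chmod 600 /swapfile
 sudo mkswap /swapfile
 sudo swapon /swapfile

 # Verify the new swap is active
 free -h

Note that on a shared cluster node (your prompt suggests one) you may not have root; in that case, ask your administrator, or reduce memory pressure instead, e.g. by lowering the dataset map workers (num_proc) or using a smaller model.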

chalesguo · Jul 15 '24