Muhammad Yasir
After following the instructions from your comment above on Colab, I get this error after the first epoch: Epoch 1/70 74/74 [==============================] - 273s 4s/step - loss: 0.4780 - dice_coef: 0.5583...
I applied the patch to 2.9.15 and it failed on only one chunk, in the snort.c file. That chunk has now been moved into the reload.c file. Update it manually by...
When running the API server, try `python3 -m fastchat.serve.api --host 0.0.0.0 --port 8000`, using `0.0.0.0` instead of `localhost` as the host.
Yeah, I read that. I ran train.py but then got a CUDA out-of-memory error. Is there any other way? I also tried lowering per_device_batch_size to 1 for evaluation, but I still hit the out-of-memory issue.
OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 11.17 GiB total capacity; 1.87 GiB already allocated; 43.81 MiB free; 1.97 GiB reserved in total by PyTorch)...
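The figures in the message above can be read as simple arithmetic. This is a rough sketch, assuming the numbers are in the units shown, of how much of the GPU's capacity is neither free nor reserved by this process's PyTorch allocator. The function name `unaccounted_gib` is hypothetical and only for illustration.

```python
def unaccounted_gib(total_gib: float, reserved_gib: float, free_mib: float) -> float:
    """Capacity that is neither free nor reserved by this process's PyTorch allocator."""
    return total_gib - reserved_gib - free_mib / 1024.0

# Numbers copied from the error message above.
gap = unaccounted_gib(total_gib=11.17, reserved_gib=1.97, free_mib=43.81)
print(f"{gap:.2f} GiB")
```

A gap this large suggests the memory is held by something other than this process's allocator, for example other processes sharing the same GPU, although the message alone does not say which.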
My command is: `torchrun --nproc_per_node=16 --master_port=20001 fastchat/train/train.py --model_name_or_path Vicuna_Weights --data_path dummy_data.json --fp16 True --output_dir Output_Weights --num_train_epochs 3 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 16 --evaluation_strategy "no" --save_strategy "steps" --save_steps 1200 --save_total_limit...
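As a quick check on the flags visible in the command above, the effective global batch size per optimizer step can be computed as below. This is a sketch that assumes plain data parallelism, where each of the `--nproc_per_node` processes handles its own per-device batch; the helper name `effective_batch_size` is hypothetical.

```python
def effective_batch_size(num_processes: int, per_device_batch: int, grad_accum_steps: int) -> int:
    """Samples contributing to one optimizer step under plain data parallelism."""
    return num_processes * per_device_batch * grad_accum_steps

# Values taken from the command: --nproc_per_node=16,
# --per_device_train_batch_size 1, --gradient_accumulation_steps 16.
print(effective_batch_size(16, 1, 16))  # 256
```

Gradient accumulation changes the effective batch size but not the per-step activation memory, which is governed by the per-device batch size.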
I switched to an instance with an A100 and it is working now. Thank you.
I just checked. Vulkan drivers are not installed. Let me install and then come back.
@cryscan I have an NVIDIA A10G card with driver version 515.65.01 and CUDA 11.7. How can I install the Vulkan drivers? I don't seem to find...
Python 3.10.5