Jansen

Results: 7 comments by Jansen

Same. Due to the nature of Docker-in-Docker (dind), it is not worth dwelling on solving this problem; here is a way to avoid similar problems: 1. Build an actions-runner/_work directory...

There is a problem with using resume_from_checkpoint: this library's ModifiedTrainer changes the file name used by torch.save to adapter_model.bin, but resume_from_checkpoint requires pytorch_model.bin, which results in: ``` Traceback (most recent call last): File "/workspaces/src/ai/luotuo-qa-tuning/train.py", line 120, in fire.Fire(main) File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire...
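
A quick way to confirm this kind of file-name mismatch before resuming is to look at which weight files a checkpoint directory actually contains. The sketch below is only an illustration: the two file names come from the comment above, while `diagnose_checkpoint` and its return labels are hypothetical helpers, not part of this library or of any trainer API. It uses only the Python standard library and does not load any weights.

```python
from pathlib import Path

# File names as reported in the comment above.
FULL_WEIGHTS = "pytorch_model.bin"
ADAPTER_WEIGHTS = "adapter_model.bin"


def diagnose_checkpoint(checkpoint_dir):
    """Report which weight file a checkpoint directory contains.

    Hypothetical helper: it only checks for file presence and
    never opens or loads the files.
    """
    ckpt = Path(checkpoint_dir)
    if (ckpt / FULL_WEIGHTS).is_file():
        # The file that resume_from_checkpoint is reported to look for.
        return "resumable"
    if (ckpt / ADAPTER_WEIGHTS).is_file():
        # Only the adapter weights were saved: the situation described above.
        return "adapter_only"
    return "no_weights"
```

If the result is `"adapter_only"`, the checkpoint holds only the file written under the changed name, which matches the failure described in the comment.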

> @liasece > Memory error indicates that the process has consumed all of the ram memory. You may want to reduce the batch size/image size and try again. This is...

@Saduf2019 This problem does not seem to be reproducible in your colab environment, so perhaps you need Windows and a GPU, properly tuned for frozen_batch_size, to reproduce the problem.

> > This is what is not normal, I have enough memory in my system > > When it says "Unable to allocate 184. MiB", it means that it was...

Same. Using the afterAllFileWrite hook does solve the problem, but it is very unreliable, especially if `sh` does not exist on the target platform. We need a solution like https://github.com/dotansimha/graphql-code-generator/pull/7322...