vip
Same error: `IndexError: Invalid key: 22330 is out of bounds for size 0`
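Read literally, the message says something tried to fetch row 22330 from a collection with zero rows, so the dataset being indexed was empty at that point. Below is a small guard that fails earlier with a clearer message. It assumes the dataset object supports `len()`. `require_non_empty` is a hypothetical helper and is not part of any library.

```python
from collections.abc import Sized


def require_non_empty(dataset: Sized, name: str = "train") -> Sized:
    """Raise a readable error if a dataset split has no rows.

    Hypothetical helper: the name and message are illustrative only.
    """
    if len(dataset) == 0:
        raise ValueError(
            f"The {name} dataset is empty (size 0); check preprocessing, "
            "filtering, or the dataset path before starting training."
        )
    return dataset
```

For example, call `require_non_empty(train_dataset, "train")` right after preprocessing and before handing the dataset to the trainer.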
> Or could you try this? It's an alternative we can try, since I agree: I believe the problem only exists when we don't have a shared filesystem. `pip install git+https://github.com/huggingface/transformers@muellerzr-multinode-save`

After updating the code, DeepSpeed launches across the cluster. The worker node saves a checkpoint named tmp checkpoint-10, while the host node's checkpoint is checkpoint-10. After saving the...
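The two names suggest a stage-then-rename save. Files are written into a temporary directory, which is then renamed to `checkpoint-<step>` once writing finishes. If only the main process performs that rename, a node with its own local disk keeps the temporary name. That would be consistent with the problem appearing only without a shared filesystem. Below is a minimal sketch of the pattern in plain Python, assuming every node finalizes its own local copy. `save_checkpoint_atomically` and `write_fn` are hypothetical names, not the transformers implementation.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def save_checkpoint_atomically(
    output_dir: str, step: int, write_fn: Callable[[Path], None]
) -> Path:
    """Write a checkpoint into a temporary directory, then rename it into place.

    Hypothetical helper. Without a shared filesystem, every node must run this
    against its own local disk; otherwise non-main nodes keep the tmp directory.
    """
    root = Path(output_dir)
    root.mkdir(parents=True, exist_ok=True)
    final_dir = root / f"checkpoint-{step}"
    # Stage inside the same parent so the rename stays on one filesystem.
    tmp_dir = Path(tempfile.mkdtemp(prefix=f"tmp-checkpoint-{step}-", dir=root))
    try:
        write_fn(tmp_dir)  # caller writes weights/optimizer state here
        if final_dir.exists():
            # Removing an old target is not atomic; only the rename below is.
            shutil.rmtree(final_dir)
        os.replace(tmp_dir, final_dir)
    except BaseException:
        shutil.rmtree(tmp_dir, ignore_errors=True)  # never leave a half-written tmp dir
        raise
    return final_dir
```

The rename is the commit point: a reader that only trusts `checkpoint-<step>` directories never sees a partially written save.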
> It shows as resolved, with the correct flag.

Has the problem been resolved?
To add: there were no errors when using ZeRO-2, but a new error appeared after training. Is Mixtral not supported?
`RuntimeError: PytorchStreamReader failed reading file data/0: invalid header or archive is corrupted`

Even with `save_safetensors` specified, a `pytorch_model.bin` is produced:

`RuntimeError: PytorchStreamReader failed reading file data/0: invalid header or archive is corrupted`
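This error comes from PyTorch's reader for the zip-based format that `torch.save` has used by default since PyTorch 1.6. One cheap diagnostic is to check whether the saved files are intact before resuming. The stdlib-only sketch below assumes the weights are either a zip-format `pytorch_model.bin` or a `.safetensors` file. The safetensors layout is an 8-byte little-endian header length followed by a JSON header. Both function names are hypothetical, and a passing check does not prove the tensor data itself is valid.

```python
import json
import os
import struct
import zipfile


def check_torch_zip(path: str) -> bool:
    """Return True if a zip-format torch checkpoint opens and all member CRCs match."""
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as zf:
            return zf.testzip() is None  # None means no corrupt member was found
    except (zipfile.BadZipFile, OSError):
        return False


def check_safetensors_header(path: str) -> bool:
    """Return True if a .safetensors file has a complete, parseable JSON header.

    Only the header is validated, not the tensor payload after it.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            return False
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > size - 8:
            return False  # header claims more bytes than the file holds
        header = f.read(header_len)
    try:
        return isinstance(json.loads(header), dict)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
```

For example, run the matching check on each weight file in the checkpoint directory before passing that directory to `resume_from_checkpoint`.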
I am now resuming SFT training from a checkpoint, and this error is reported again. I have configured these parameters: `use_reentrant: true` and `resume_from_checkpoint: /workspace/axolotl-main/checkpoint-5865`
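Resuming from `/workspace/axolotl-main/checkpoint-5865` assumes that directory is complete. The sketch below picks the highest-numbered `checkpoint-<step>` directory and ignores anything else, such as temporary save directories. `latest_checkpoint` is a hypothetical helper. The `checkpoint-<step>` naming is taken from the path above.

```python
import re
from pathlib import Path
from typing import Optional

# Only exact "checkpoint-<digits>" names count; "tmp-..." staging dirs are skipped.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> directory with the highest step, or None."""
    best: Optional[tuple[int, Path]] = None
    for entry in Path(output_dir).iterdir():
        match = _CHECKPOINT_RE.match(entry.name)
        if match and entry.is_dir():
            step = int(match.group(1))  # compare numerically, not as strings
            if best is None or step > best[0]:
                best = (step, entry)
    return None if best is None else best[1]
```

Comparing steps as integers matters: as strings, `checkpoint-999` would sort after `checkpoint-5865`.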
Does Mixtral support AWQ 4-bit quantization?
> Do you encounter the same issue on LLaMA 2-70B?

So far I have only tested Llama 3; LLaMA 2-70B has not been tested. Could this be related to INT4/AWQ? FP16 works normally.
> I have the same problem, but I still don't know how to solve it.

Have you updated to the latest code? Updating to the latest code should solve the problem. If it has been updated, please check if the GRAPHRAG index is...
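> > > > > > I'll test it over the next few days to see whether there is an underfitting problem; MoE models may be relatively stable. > > > > > > > > > > > > > > > After you downgraded DeepSpeed, didn't you get `ImportError: cannot import name 'log' from 'torch.distributed.elastic.agent.server.api'`? Only 14.4 supports this. >...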