Yu comments

Results: 22 comments by Yu

Hello, memory problem

> > > Hi! We strongly recommend you to run our code in a GPU server with a larger CPU memory (at least 32g). If not, you can split the...

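Hello, memory problem

But I tried a computer with 32 GB of memory (without splitting the dataset) and still got a similar error. Does it only run on Linux?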

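Hello, model loading problem

> > > Did you write the model-loading code yourself, or did the error occur while running this source code? > Tell me your environment. It occurred while running the source code; Windows, tf1.7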

location problem

So why did you submit this issue in English when you can understand the README written in Chinese? Because of the place you asked, please reply to me 'I support 'one-China policy ''...

[BUG] Deepspeed Zero 3 Inference InFlight Params with new HuggingFace Mixtral Model

> > https://github.com/ftgreat/llmkit/blob/main/huggingface/mixtral/mixtral_dense_moe_monkey_patch.py > > Add some unittest cases. https://github.com/ftgreat/llmkit/blob/main/huggingface/mixtral/mixtral_dense_moe_monkey_patch.py#L69 I tried the code and it ran without error, but the loss is all 0 and grad is 1? ``` {'TFlops':...

[Bug] The program frequently stops running during execution

Thanks, but what are the results of "Initially, all 8 GPUs are operational, but as time progresses, they gradually cease to function."? This hinders the automated execution of the program,...

[Bug] The program frequently stops running during execution

Yes, I understand that the GPU goes into a dormant state when there are no tasks. However, as mentioned earlier, the tasks are far from complete. In fact, there are...

[Bug] The program frequently stops running during execution

1. It doesn't seem to be a monitoring issue because using the command line tool `nvidia-smi` also reveals that the GPU is not functioning properly. 2. Volcano Engine containers are perhaps similar...

[Bug] The program frequently stops running during execution

It seems that there is an issue with task initiation. For example, this task output 'launch OpenICLInfer' in the command line, but it appears that no new task has been...

[Bug] The program frequently stops running during execution

Similar to the situation below, a task started at 10 o'clock, and from 10 to 11 o'clock, 4 GPUs had a utilization rate of 0, while the remaining 4 GPUs...