`-u 128`?? This should be `-u 7`.
> Please remove this thread; it can be considered SPAM.
> @zhangxxyz do you suppose everyone can read your Chinese?
> Please delete this junk message.

@justwjx Are you sure that you...
> > Question: running this command in a single-machine, multi-GPU environment fails. How should I modify the launch command? RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for...
> I would like to ask: how is ChatGPT used in the first step? How is the input text turned into a task?

Buddy, you're Chinese, right? What do you mean?
> You can also do it like this: `CUDA_VISIBLE_DEVICES=0,1 python pretraining.py`. Using `device_map="auto"` automatically distributes the model across multiple GPUs when loading.

I tried that for glm6b2 and it still fails with the same error, but glm6b works.
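A minimal sketch of the loading approach suggested above, assuming a ChatGLM-style checkpoint name (the exact model ID used in `pretraining.py` is not shown in this thread). Launched with `CUDA_VISIBLE_DEVICES=0,1`, `device_map="auto"` lets Accelerate place layers across both visible GPUs:

```python
# Sketch only: requires `pip install transformers accelerate` and CUDA GPUs.
# Run as: CUDA_VISIBLE_DEVICES=0,1 python pretraining.py
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "THUDM/chatglm2-6b",   # assumed checkpoint; substitute your own
    trust_remote_code=True,
    device_map="auto",     # shard layers automatically across visible GPUs
)
```

Note that if training code later moves inputs with an explicit `.cuda()` or `.to("cuda:0")` while the model is sharded, the `cuda:0`/`cuda:1` mismatch above can still occur; inputs should be moved to the device of the layer that consumes them.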
> Hi, will vLLM support 8-bit quantization, like https://github.com/TimDettmers/bitsandbytes?
>
> In HF, we can run a 13B LLM on a 24G GPU with `load_in_8bit=True`.
>
> Although PagedAttention can...
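The HF usage referenced in the quote looks roughly like the following sketch; the model name is illustrative, and it needs a CUDA GPU plus the `bitsandbytes` package:

```python
# Sketch only: requires `pip install transformers accelerate bitsandbytes`
# and a CUDA GPU with ~24 GB of memory for a 13B model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"  # illustrative 13B causal LM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place weights on available GPUs
    load_in_8bit=True,   # int8 weight quantization via bitsandbytes
)
```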
```sql
CREATE MODEL ollama_model
PREDICT completion
USING
  engine = 'ollama_engine',
  model_name = 'gemma2:27b',
  ollama_serve_url = 'http://172.16.8.231:11434';
```

> > @boxter007 Can you please post your `CREATE MODEL` statement here?
>
> @MinuraPunchihewa...
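When a `CREATE MODEL` like the one above fails, one useful first check is whether the Ollama server is reachable at the configured URL at all. A minimal sketch, assuming the same host and port as in the statement (Ollama's `/api/tags` endpoint lists locally pulled models):

```python
# Sketch only: verify the Ollama server answers before pointing MindsDB at it.
# Host/port taken from the CREATE MODEL statement above; adjust as needed.
import json
from urllib.request import urlopen

with urlopen("http://172.16.8.231:11434/api/tags", timeout=5) as resp:
    models = json.load(resp)["models"]
print([m["name"] for m in models])  # 'gemma2:27b' should appear if pulled
```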