Yi-sir
I'm building TEI with the Python backend on an Ubuntu 20.04 machine without an NVIDIA device and hit a similar problem:

```
--- stderr
thread 'main' panicked at /home/xyz/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/aws-lc-sys-0.28.0/builder/cc_builder.rs:492:13:
### COMPILER BUG...
```
Nice, I'll just do it manually then. Thanks!
One more question: why does the framework split and load the weights manually (instead of just using from_pretrained)? I noticed that in ByteMLPerf's modeling_llama.py the shapes of the mlp/attn weights take mp_size into account, while the transformers library has none of these operations. transformers also supports multi-GPU inference, and its model-loading path already uses some acceleration libraries to handle the distributed case. What advantage does ByteMLPerf's way of splitting and loading weights have over the transformers approach?
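For reference, here is a minimal sketch of the kind of per-rank slicing I understand the question to be about; it is not ByteMLPerf's actual loader, and the names `shard_column_parallel`, `mp_size`, and `mp_rank` are just illustrative.

```python
# Illustrative sketch only: slice a Linear weight along the output
# dimension so each model-parallel rank keeps out_features // mp_size rows.
# This is why the weight shapes in modeling_llama.py depend on mp_size.
import torch

def shard_column_parallel(weight: torch.Tensor, mp_size: int, mp_rank: int) -> torch.Tensor:
    out_features = weight.shape[0]
    assert out_features % mp_size == 0, "out_features must divide evenly across ranks"
    shard = out_features // mp_size
    return weight[mp_rank * shard:(mp_rank + 1) * shard].contiguous()

# Example: a 4096x4096 q_proj weight split across 2 ranks -> 2048x4096 per rank.
full = torch.randn(4096, 4096)
print(shard_column_parallel(full, mp_size=2, mp_rank=0).shape)  # torch.Size([2048, 4096])
```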
3. Added int8 quantization for resnet50-torch-fp32; the quantization parameters need to be set interactively. Also added support for dual-chip asynchronous perf.
[75eb268](https://github.com/bytedance/ByteMLPerf/pull/119/commits/75eb268da25c4501b5d42cec94d32221666177db) This commit adds a KV cache, but it requires modifying some code in transformers before it can run.
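For context, this is a rough sketch of the standard transformers KV-cache decode loop that such a change builds on; it is not the code from the commit above, and the model id is a placeholder.

```python
# Sketch of incremental decoding with transformers' past_key_values cache
# (not the code from 75eb268, just the general pattern).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model id
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

input_ids = tok("Hello", return_tensors="pt").input_ids
past_key_values = None
with torch.no_grad():
    for _ in range(8):
        out = model(input_ids=input_ids, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values        # cached K/V for every layer
        next_id = out.logits[:, -1:].argmax(dim=-1)  # greedy next-token pick
        input_ids = next_id                          # feed only the new token next step
print(tok.decode(next_id[0]))
```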
Env: lmsysorg/sglang:v0.5.3-cu129
But sglang PD disaggregation works well with the Mooncake backend; I wonder what the difference is.
> It seems that the server side doesn't correctly register the memory. Remove line 59 and have a try?

Sorry, do you mean these lines?

```
if PROTOCOL == "rdma":
    ret_value...
```