Yi-sir

Results 8 comments of Yi-sir

I'm building tei with the Python backend on an Ubuntu 20.04 machine without an NVIDIA device and hit a similar problem: ``` --- stderr thread 'main' panicked at /home/xyz/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/aws-lc-sys-0.28.0/builder/cc_builder.rs:492:13: ### COMPILER BUG...

Nice, manually. Thanks.

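One more question: why does the framework split and load the weights manually (instead of simply using from_pretrained)? I see that in ByteMLPerf's modeling_llama.py the weight shapes for mlp/attn and similar modules take mp_size into account, but the transformers library does not do this. transformers also supports Multi-GPU Inference, and its model-loading code uses some acceleration libraries to handle the distributed case. Does ByteMLPerf's way of splitting and loading weights have any advantage over the transformers approach?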
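To make concrete what "weight shapes take mp_size into account" means, here is a minimal sketch of sharding two MLP weights across model-parallel ranks. It is not ByteMLPerf's actual code: the helper names (`shard_output_dim`, `shard_input_dim`, `shape`) and the toy sizes are hypothetical, and plain nested lists stand in for tensors. It only assumes the usual `[out_features, in_features]` layout of a linear layer's weight.

```python
# Hypothetical illustration of mp_size-aware weight shapes; not ByteMLPerf code.

def shard_output_dim(weight, mp_size, rank):
    """Keep this rank's slice of the output dimension (rows of [out, in])."""
    out_features = len(weight)
    assert out_features % mp_size == 0, "output dim must divide evenly"
    step = out_features // mp_size
    return weight[rank * step:(rank + 1) * step]

def shard_input_dim(weight, mp_size, rank):
    """Keep this rank's slice of the input dimension (columns of [out, in])."""
    in_features = len(weight[0])
    assert in_features % mp_size == 0, "input dim must divide evenly"
    step = in_features // mp_size
    return [row[rank * step:(rank + 1) * step] for row in weight]

def shape(weight):
    return (len(weight), len(weight[0]))

# Toy sizes, chosen only for the example.
hidden, intermediate, mp_size = 8, 16, 4

# up_proj: [intermediate, hidden]; down_proj: [hidden, intermediate]
up_proj = [[float(r * hidden + c) for c in range(hidden)]
           for r in range(intermediate)]
down_proj = [[float(r * intermediate + c) for c in range(intermediate)]
             for r in range(hidden)]

# Each rank holds only 1/mp_size of every sharded weight.
up_shards = [shard_output_dim(up_proj, mp_size, r) for r in range(mp_size)]
down_shards = [shard_input_dim(down_proj, mp_size, r) for r in range(mp_size)]

print(shape(up_shards[0]))    # per-rank shape of up_proj
print(shape(down_shards[0]))  # per-rank shape of down_proj
```

With this layout each rank allocates only its own slice, so the module on a given rank is built with the reduced shape from the start rather than loading the full weight and moving it afterwards.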

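3. Added support for int8 quantization of resnet50-torch-fp32; the quantization parameters must be set interactively. Added support for dual-chip asynchronous perf.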

[75eb268](https://github.com/bytedance/ByteMLPerf/pull/119/commits/75eb268da25c4501b5d42cec94d32221666177db) This commit adds a KV cache, but some code in transformers has to be modified before it will run. ![image](https://github.com/user-attachments/assets/5aa2d2b2-b86f-433c-b8a7-90cc8e611780) ![image](https://github.com/user-attachments/assets/d542e794-4dcf-458d-8ff7-75b3ce4f5ea1)

But sglang PD disaggregation works well with the mooncake backend, so I wonder what the difference is.

> It seems that the server side doesn't correctly register the memory. Remove line 59 and have a try?

Sorry, do you mean these lines? if PROTOCOL == "rdma": ret_value...