高凌霄
### Is there an existing issue for this?

- [X] I have searched the existing issues

### Current Behavior

By reading the GLM-related papers, I have summarized the differences between GLM and GLM-130B:

| Model | PE | Normalization |
| ------------- |-------------|...
From this [article](https://www.anyscale.com/blog/continuous-batching-llm-inference), I learned that continuous batching and PagedAttention greatly improve the inference performance of large models. I would like to know whether FasterTransformer has plans to support these...
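For intuition only, here is a toy Python sketch of the scheduling idea behind continuous batching (nothing here is FasterTransformer code; the request names, decode-step counts, and the `max_batch` slot limit are all made up): instead of waiting for an entire static batch to finish, a waiting request is admitted the moment any running request completes.

```python
from collections import deque

def continuous_batching(requests, max_batch=3):
    """Toy scheduler: each request is (name, decode_steps).

    A new request is admitted as soon as a batch slot frees up,
    rather than waiting for the whole batch to drain (static batching).
    Returns (name, step_at_which_it_finished) in completion order.
    """
    pending = deque(requests)
    running = {}            # name -> remaining decode steps
    finished, step = [], 0
    while pending or running:
        # admit new requests into any free batch slots
        while pending and len(running) < max_batch:
            name, steps = pending.popleft()
            running[name] = steps
        # run one decode step for every in-flight request
        for name in list(running):
            running[name] -= 1
            if running[name] == 0:
                del running[name]
                finished.append((name, step))
        step += 1
    return finished

reqs = [("a", 2), ("b", 5), ("c", 1), ("d", 3)]
print(continuous_batching(reqs))
```

With static batching, "d" could not start until the slowest request in the first batch ("b") finished; here it starts as soon as "c" frees a slot.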
### Description

```shell
I start the Triton server with '--model-control-mode poll'. A segmentation fault occurs when the model directory is modified.
```

### Reproduced Steps

```shell
1. CUDA_VISIBLE_DEVICES=3,4,5,6 /opt/tritonserver/bin/tritonserver --model-repository=/ft_workspace/all_models/t5/ --http-port 8008 --model-control-mode poll...
```
Issue reproduction steps: when quantizing the Qwen2.5-VL model, setting the n_parallel_calib_samples parameter causes a shape-mismatch error in the transformers library while computing RoPE. The cause is that Qwen2.5-VL uses mrope, a 3-dimensional RoPE variant: the position_embedding argument it passes carries three different frequency streams, so the shape of position_embedding changes, and the original method cannot accommodate this variation. Note: this sample is only a simple implementation; a maintainer should integrate it more cleanly...
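To make the shape mismatch concrete, here is a minimal NumPy sketch (the function names and the `[1, 1, 2]` section split are illustrative, not the model's actual configuration): standard RoPE produces a cos table of shape `(seq_len, head_dim)`, while an mrope-style table stacks three position streams on a new leading axis, giving `(3, seq_len, head_dim)`, which code written for the 2-D layout cannot consume directly.

```python
import numpy as np

def rope_cos(position_ids, head_dim):
    """cos table for standard RoPE: (seq_len,) positions -> (seq_len, head_dim)."""
    inv_freq = 1.0 / (10000.0 ** (np.arange(0, head_dim, 2) / head_dim))
    freqs = position_ids[:, None] * inv_freq[None, :]
    emb = np.concatenate([freqs, freqs], axis=-1)  # duplicate halves, RoPE-style
    return np.cos(emb)

def mrope_cos(position_ids_3d, head_dim):
    """mrope-style table: three position streams (e.g. temporal/height/width)
    stacked on a new leading axis -> (3, seq_len, head_dim)."""
    return np.stack([rope_cos(p, head_dim) for p in position_ids_3d], axis=0)

def flatten_mrope(cos, mrope_section):
    """Collapse (3, seq_len, head_dim) back to (seq_len, head_dim) by giving
    each stream its own slice of rotary channels. Sections are in half-dim
    units, so each slice is section*2 wide after the cos duplication above."""
    widths = [s * 2 for s in mrope_section]
    splits = np.split(cos, np.cumsum(widths)[:-1], axis=-1)
    return np.concatenate([chunk[i % 3] for i, chunk in enumerate(splits)], axis=-1)

seq_len, head_dim = 4, 8
pos = np.arange(seq_len, dtype=float)
cos2d = rope_cos(pos, head_dim)                         # (4, 8): standard layout
cos3d = mrope_cos(np.stack([pos, pos, pos]), head_dim)  # (3, 4, 8): mrope layout
merged = flatten_mrope(cos3d, mrope_section=[1, 1, 2])  # (4, 8) again
```

When all three position streams are identical (as in the sketch), the merged table equals the standard one, so the shim only changes behavior where the streams actually diverge.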