hezeli123
The logic of repetition_penalty in FT is not the same as in the OpenAI description. How should it be used? OpenAI: https://platform.openai.com/docs/guides/gpt/managing-tokens

mu[j] -> mu[j] - c[j] * alpha_frequency - float(c[j] > 0) * alpha_presence
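For illustration, here is a minimal sketch (hypothetical helper names, not FT's actual implementation) contrasting the two behaviors: OpenAI applies additive penalties that scale with the occurrence count c[j], while FT's repetition_penalty is a multiplicative, CTRL-style penalty applied once per token that has already appeared.

```python
from collections import Counter

def openai_style(logits, generated, alpha_frequency=0.5, alpha_presence=0.5):
    # Additive: mu[j] -> mu[j] - c[j]*alpha_frequency - float(c[j] > 0)*alpha_presence
    counts = Counter(generated)
    out = list(logits)
    for tok, c in counts.items():
        out[tok] -= c * alpha_frequency + alpha_presence
    return out

def ft_style(logits, generated, repetition_penalty=1.2):
    # Multiplicative (CTRL-style): divide positive logits by the penalty,
    # multiply negative ones, once per distinct token seen so far.
    out = list(logits)
    for tok in set(generated):
        out[tok] = out[tok] / repetition_penalty if out[tok] > 0 else out[tok] * repetition_penalty
    return out

print(openai_style([2.0, -1.0, 0.5], [0, 0, 2]))  # token 0 penalized twice plus presence
print(ft_style([2.0, -1.0, 0.5], [0, 0, 2]))      # token 0 penalized once, multiplicatively
```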
> Hi @calico-niko @bnuzhanyu The ViT is offloaded to TRT, and its FP32 accuracy on TRT 9.3 is aligned with PyTorch. And you can also change the version of...
The current ViT differences have a large impact and lead to many bad cases, so I now run ViT at FP32 precision.
> Hi @hezeli123, you said that this works for you when not using pipeline parallelism. I assume you just omitted `--pp_size` or set it to `1` when you built...
> Could you share the content of your `/tensorrtllm_backend/all_models/bls/` folder?

This issue may be the same problem as https://github.com/triton-inference-server/tensorrtllm_backend/issues/354. The pre/post-processing model files were taken from: https://github.com/triton-inference-server/tensorrtllm_backend/tree/main/all_models/inflight_batcher_llm
| Concurrency | norm tokens/s | awq tokens/s |
| -- | -- | -- |
| 1 | 40.93 | 42.06 |
| 2 | 62 | 60.52 |
| 4 | 79.08 | 73.32 |
| 8 | 94.4... | |
> Could you share the benchmark scripts?

The script is simple: take a batch of external image URLs (e.g. http://img1.baidu.com/it/u=3682444617,1983875605&fm=253&app=138&f=JPEG?w=1067&h=800), call the OpenAI-compatible interface with synchronous round-robin requests, and record the throughput at the bottleneck. @lvhan028 you could also check performance with your internal tooling; for smaller models, the speedup from quantization seems poor.
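For reference, a rough sketch of that benchmark loop, assuming an OpenAI-compatible endpoint such as lmdeploy's api_server; the base URL, model name, worker count, and prompt are placeholders:

```python
import time
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")
urls = ["http://img1.baidu.com/it/u=3682444617,1983875605&fm=253&app=138&f=JPEG?w=1067&h=800"]

def one_request(url):
    # One synchronous chat-completion call with an image URL attached.
    resp = client.chat.completions.create(
        model="placeholder-model",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": url}},
        ]}],
    )
    return resp.usage.completion_tokens

start = time.time()
with ThreadPoolExecutor(max_workers=8) as pool:  # 8 = concurrency level under test
    total_tokens = sum(pool.map(one_request, urls * 16))
print(f"{total_tokens / (time.time() - start):.2f} tokens/s")
```

Varying `max_workers` reproduces the concurrency column in the table above.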
No stop request was sent during this period.
The logs are as follows; there is image-download information after the request was received, but no subsequent logs related to LLM inference.

2024-07-11 19:59:48,123 - lmdeploy - INFO - async_collect_pil_images latency: 98.4154 ms
2024-07-11 19:59:48,123 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-07-11 19:59:48,123 - lmdeploy...
After compiling with oneflow_compile, the generated images are all black; with torch.compile(unet) there is no problem and the generated images are normal.
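A minimal repro sketch, assuming a diffusers Stable Diffusion pipeline and onediff's oneflow_compile entry point; the model id and prompt are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline
from onediff.infer_compiler import oneflow_compile

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Path 1: compile the UNet with oneflow_compile -- images come out all black.
pipe.unet = oneflow_compile(pipe.unet)

# Path 2 (workaround): torch.compile produces normal images.
# pipe.unet = torch.compile(pipe.unet)

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("out.png")
```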