tingjun-cs
When trying to use trtllm-serve to deploy DeepSeek-V3 for serving, like this: ```shell trtllm-serve --host 0.0.0.0 --port 8100 --max_batch_size 32 --max_num_tokens 4096 --max_seq_len 4096 --tp_size 8 --trust_remote_code /models/deepseek-ai/deepseek-v3/...
### 🚀 The feature, motivation and pitch I saw in the official documentation (https://docs.vllm.ai/en/latest/features/tool_calling.html) that sglang supports tool calls, but I can't seem to find a tool parser for DeepSeek-V3/R1....
### Issues Policy acknowledgement - [x] I have read and agree to submit bug reports in accordance with the [issues policy](https://www.github.com/mlflow/mlflow/blob/master/ISSUE_POLICY.md) ### Where did you encounter this bug? Local machine...
### Willingness to contribute Yes. I can contribute this feature independently. ### Proposal Summary I tried to register a prompt using the following code, but encountered an...
### Proposal Summary I tried to register a prompt using the following code, but encountered an error: `INVALID_PARAMETER_VALUE: Prompt text exceeds max length of 5000 characters.` It appears MLflow has...
**Description** I've observed a progressive increase in CPU memory usage during the fine-tuning process. Specifically, after each model checkpoint is saved, a portion of the CPU RAM is not released....
How to automatically run `ray serve start` with custom host/port after RayCluster head pod starts?
Hi team, I'd like to configure Ray Serve in a KubeRay cluster to listen on `0.0.0.0:8100` instead of the default `127.0.0.1:8000`. The desired command is: ```bash serve start --http-host 0.0.0.0...
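One possible approach for the question above (a sketch only, not an official KubeRay recommendation — the cluster name, image tag, and container name here are placeholders): attach a Kubernetes `postStart` lifecycle hook to the head container so `serve start` runs automatically once the head pod is up, passing the host/port flags from the desired command. For managed Serve applications, the `RayService` CRD with `serveConfigV2` may be a better fit; this hook-based variant applies to a plain `RayCluster`.

```yaml
# Hypothetical RayCluster manifest fragment: run `serve start` with a custom
# HTTP host/port right after the head container starts.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: example-cluster        # placeholder name
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:latest   # placeholder tag
            lifecycle:
              postStart:
                exec:
                  command:
                    - /bin/sh
                    - -c
                    # The flags mirror the command quoted in the issue.
                    - serve start --http-host 0.0.0.0 --http-port 8100
```

Note that a `postStart` hook runs concurrently with the container's entrypoint, so in practice the command may need a short wait/retry loop until the Ray head process is ready to accept `serve start`.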