Amit Timalsina

> hey @amit-timalsina, the picture above describes the serving case for a two-tower retrieval model. How? Can you please elaborate?

You have to use the OpenAI-compatible Triton frontend, but note that it is in beta: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/client_guide/openai_readme.html I have already done this, so let me know if you need further help.
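
Once the frontend is up, you can talk to it with the standard `openai` client. A minimal sketch, assuming the frontend is listening on its default port 9000 and serving a hypothetical model named `llama-3.1-8b-instruct`:

```
# Minimal sketch: query Triton's OpenAI-compatible frontend with the
# standard `openai` client. The port and model name are assumptions;
# use whatever your deployment actually exposes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9000/v1",
    api_key="EMPTY",  # the beta frontend does not validate API keys
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # hypothetical model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```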

This should help: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/tutorials/Quick_Deploy/vLLM/README.html Triton is a wrapper around multiple inference backends like vLLM, TensorRT-LLM, Python, etc. You have to set up a model repository, provide it to Triton, and you can...
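
For reference, the repository layout from that tutorial looks roughly like this; the model name `vllm_model` and the engine arguments in `model.json` are the tutorial's example values, not requirements:

```
model_repository/
└── vllm_model/
    ├── 1/
    │   └── model.json   # vLLM engine args, e.g.:
    │                    # {"model": "facebook/opt-125m",
    │                    #  "disable_log_requests": true,
    │                    #  "gpu_memory_utilization": 0.5}
    └── config.pbtxt     # Triton model config (backend: "vllm")
```

Then point the server at it, e.g. `tritonserver --model-repository ./model_repository`.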

The Triton server is running, right? Then it should work.
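
If you want to verify, Triton exposes a KServe readiness endpoint over its HTTP port (8000 by default):

```
# Minimal sketch: poll Triton's readiness endpoint.
# Assumes the default HTTP port 8000 on localhost.
import requests

resp = requests.get("http://localhost:8000/v2/health/ready")
print("Triton ready:", resp.status_code == 200)
```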

@dinhanhx Did you find a fix? I need to use vllm > 0.10.0 because I need to use transformers>=4.54.0, as it has support for gpt-oss and doesn't have the...

```
import json

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        self.args = args
        self.logger = pb_utils.Logger
        self.model_config = json.loads(args["model_config"])
        # Look up the dtype Triton expects for the "text_output" tensor.
        output_config = pb_utils.get_output_config_by_name(
            self.model_config, "text_output"
        )
        self.output_dtype = pb_utils.triton_string_to_numpy(output_config["data_type"])
        # Setup vLLM engine health check...
```
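
For context, in Triton's Python backend this class lives in `model.py` under the model's version directory (e.g. `model_repository/<model_name>/1/model.py`).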