
[question] triton inference server Dockerfile

Open geraldstanje opened this issue 1 year ago • 5 comments

hi,

where can I find documentation on how to build the Triton Inference Server TRT-LLM 24.06 container for SageMaker myself, so I can run it on SageMaker?

NVIDIA image I want to use: nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3

Can you please post the Dockerfile you use to modify the NVIDIA container? I need the Dockerfile you use for creating https://github.com/aws/deep-learning-containers/blob/master/available_images.md#nvidia-triton-inference-containers-sm-support-only - I will then modify it and use the NVIDIA image above.

Info: I have created a TRT-LLM model myself and will copy it into the container.

cc @ohadkatz @nskool @sirutBuasai

geraldstanje avatar Jul 10 '24 03:07 geraldstanje

Hi @geraldstanje, we don't support the TRT-LLM container for Triton on SageMaker yet. Most changes needed to support SageMaker are already upstreamed, and the container above should work with SageMaker directly. Functionally, you can just run the NVIDIA image on SageMaker; you shouldn't need the Dockerfile in order to modify it.

nikhil-sk avatar Jul 10 '24 23:07 nikhil-sk
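
For context, a minimal sketch of what "run the NVIDIA image on SageMaker directly" can look like with the sagemaker Python SDK. SageMaker endpoints pull container images from ECR, so the NGC image first has to be re-tagged and pushed to an ECR repository you own; the role ARN, ECR URI, S3 path, default model name, and instance type below are all placeholders.

```python
# Minimal sketch: hosting the NGC Triton TRT-LLM image on SageMaker.
# Assumes the image has been pushed to your own ECR repo and the
# Triton model repository (compiled TRT-LLM engines + config.pbtxt)
# has been tarred and uploaded to S3.
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder

model = Model(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/tritonserver:24.06-trtllm-python-py3",  # placeholder ECR URI
    model_data="s3://my-bucket/triton-model-repo.tar.gz",  # placeholder
    role=role,
    env={
        # Read by Triton's SageMaker entrypoint to pick the default model;
        # see docker/sagemaker/serve in the triton-inference-server repo.
        "SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "ensemble",
    },
    sagemaker_session=session,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # example GPU instance; size to your engines
)
```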

@nskool can you please paste here the Dockerfile of the latest Triton TRT container you released (24.3)? I will modify the Dockerfile and add TRT-LLM myself.

geraldstanje avatar Jul 10 '24 23:07 geraldstanje

@geraldstanje Based on your initial comment, you want to run TRT-LLM on SageMaker, is that correct? I'm trying to say that the NVIDIA TRT-LLM image will work just fine on SageMaker. Is there a specific reason you are looking for a SageMaker-provided Triton image (which doesn't include TRT-LLM) and building TRT-LLM yourself on top of it?

nikhil-sk avatar Jul 11 '24 17:07 nikhil-sk

@nskool it works - what I was looking for was the entrypoint: https://github.com/triton-inference-server/server/blob/main/docker/sagemaker/serve. I just used the image nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3 - I didn't build it myself.

Also, it seems the metrics are not forwarded to SageMaker (the entrypoint only exposes port 8080) - is there a solution for that?

Does that mean you can run any Docker container on SageMaker? Could I also run the vllm/vllm-openai:latest Docker container on SageMaker? https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html

geraldstanje avatar Jul 23 '24 14:07 geraldstanje
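
On the two follow-up questions: SageMaker routes traffic only to port 8080 (/ping and /invocations), so Triton's Prometheus metrics endpoint (port 8002 by default) is not reachable from outside the container. Any image that satisfies that HTTP contract can in principle be hosted, though the stock vllm/vllm-openai image serves an OpenAI-style API on port 8000 and would need an adapter or serving shim to meet it. Below is a minimal sketch of one metrics workaround - scraping the local metrics endpoint and pushing values to CloudWatch from inside the container - assuming metrics are enabled on the default port and the endpoint's execution role allows cloudwatch:PutMetricData.

```python
# Minimal sketch (assumptions: Triton metrics enabled on localhost:8002,
# execution role allows cloudwatch:PutMetricData). Runs inside the
# container, e.g. launched in the background from a customized entrypoint.
import time
import boto3
import requests

cloudwatch = boto3.client("cloudwatch")
METRICS_URL = "http://localhost:8002/metrics"  # Triton's Prometheus endpoint
NAMESPACE = "Triton/SageMaker"  # arbitrary namespace for this sketch

def scrape_once():
    """Parse the Prometheus text format and push numeric samples."""
    lines = requests.get(METRICS_URL, timeout=5).text.splitlines()
    metric_data = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments/help text
        name, _, value = line.rpartition(" ")
        try:
            metric_data.append({
                "MetricName": name.split("{")[0],  # drop Prometheus labels
                "Value": float(value),
            })
        except ValueError:
            continue  # skip non-numeric samples
    # Push in small batches to stay well under CloudWatch request limits.
    for i in range(0, len(metric_data), 20):
        cloudwatch.put_metric_data(Namespace=NAMESPACE,
                                   MetricData=metric_data[i:i + 20])

while True:
    scrape_once()
    time.sleep(60)
```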

@nikhil-sk did you see ^^ by chance?

geraldstanje avatar Jul 29 '24 09:07 geraldstanje

hi @nikhil-sk, can we please update the tritonserver containers - whom should I ping? Triton Inference Server is already at version 25.02.

@sirutBuasai or @ohadkatz can you help?

geraldstanje avatar Mar 24 '25 17:03 geraldstanje

Pinging the Triton model server team on-call, @maaquib. Feel free to triage as needed.

sirutBuasai avatar Mar 26 '25 04:03 sirutBuasai

@geraldstanje We're waiting on a version with https://github.com/triton-inference-server/server/pull/7993 to be released. We'll publish a new container once that is available. In the meantime, if this is urgent, you can consider using https://docs.djl.ai/master/docs/serving/serving/docs/lmi/user_guides/trt_llm_user_guide.html

maaquib avatar Mar 26 '25 16:03 maaquib
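
For anyone following the LMI suggestion, a minimal sketch of that interim route with generic SageMaker SDK calls is below. The image URI tag is a placeholder (look up the current DJL tensorrtllm image for your region in the linked guide), and the environment variable names are illustrative and should be verified against the LMI docs.

```python
# Minimal sketch of the interim LMI (DJL Serving) route for TRT-LLM.
# Placeholders: role ARN, image tag, model id; verify env var names
# against the LMI user guide linked above.
from sagemaker.model import Model

role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder

model = Model(
    image_uri="763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.33.0-tensorrtllm",  # placeholder tag
    role=role,
    env={
        "HF_MODEL_ID": "my-org/my-model",       # placeholder model id
        "OPTION_TENSOR_PARALLEL_DEGREE": "4",   # illustrative option
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # example GPU instance
)
```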

This issue has been automatically marked as stale due to 60 days of inactivity. Please comment or remove the stale label to keep it open. It will be closed in 7 days if no further activity occurs.

github-actions[bot] avatar Nov 02 '25 19:11 github-actions[bot]

Closing this issue after 7 additional days of inactivity since being marked stale.

github-actions[bot] avatar Nov 12 '25 01:11 github-actions[bot]