Nikhil Kulkarni
Hi @joostwestra, thank you for creating this issue. Could you please share your config.pbtxt file for context?
@david-waterworth Right, to load the BLS model explicitly, the other models it refers to also have to be loaded at the same time using the `--load-model` argument. The above hack...
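As a rough sketch (not the exact setup from this issue), the same explicit loads can also be issued at runtime through the model-repository API, assuming the server was started with `--model-control-mode=explicit`; the model names below are hypothetical placeholders:

```python
# Hedged sketch: explicitly load the models a BLS model composes, then the BLS model itself.
# Assumes a server running with --model-control-mode=explicit on localhost:8000.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Load every model the BLS model calls into, along with the BLS model itself.
for name in ["preprocess_model", "composed_model", "bls_model"]:  # placeholder names
    client.load_model(name)
    assert client.is_model_ready(name), f"{name} failed to load"
```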
@david-waterworth I will look into this; it may be an ordering issue between `additional_args` and the `log_info` workaround. Please continue using the workaround. I will update the thread once I have more...
@jadhosn Could you share more about the objective you are trying to achieve, and the exact failure you are seeing? Note that in MME mode, SageMaker will handle model...
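For illustration only, this is roughly what invoking a multi-model endpoint looks like from the caller's side; the endpoint name, model archive name, and payload are placeholders, and the actual payload format depends on the serving stack behind the endpoint:

```python
# Hedged sketch of an MME invocation: the caller selects the model per request via
# TargetModel, and SageMaker takes care of loading/unloading models on the instance.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-mme-endpoint",     # placeholder endpoint name
    TargetModel="model-a.tar.gz",       # which model archive under the S3 prefix to route to
    ContentType="application/json",
    Body=json.dumps({"inputs": [1, 2, 3]}),  # placeholder payload
)
print(response["Body"].read())
```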
cc: @mufaddal-rohawala for review
All changes for SageMaker are upstreamed to Triton's GitHub repo (https://github.com/triton-inference-server/server), so the SM-Triton image is essentially the same image as the NGC container, with the following backends...
Closing this since the issue has been addressed. For an overview of the Triton image build, please refer to the comment above: https://github.com/aws/deep-learning-containers/issues/1557#issuecomment-1551088683
Hi @geraldstanje, we don't support the TRT-LLM container for Triton on SageMaker yet. Most changes to support SageMaker are already upstreamed, and the above container should work with SageMaker directly...
@geraldstanje Based on your initial comment, you want to run TRT-LLM on SageMaker, is that correct? I'm trying to say that the NVIDIA TRT-LLM image will work just fine on...
@chen3933 It does not seem common to implement a predict_fn, input_fn, or output_fn that handles only len(data) == 1, but if a customer has implemented them to process only one request, e.g., with an assert check...
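As a minimal sketch of what a batch-tolerant handler could look like (the model object and record schema are placeholders, not the customer's actual code):

```python
# Hedged sketch of a predict_fn that does not assume a single record.
# Instead of asserting len(data) == 1, iterate over whatever input_fn returned.
def predict_fn(data, model):
    # input_fn may hand over either one record or a list of records.
    records = data if isinstance(data, list) else [data]
    return [model.predict(record) for record in records]
```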