
I got the same problem on 21.11 and 21.12: it works with a single model or a couple of models, but Triton never releases them.

wangzz313 opened this issue 2 years ago · 4 comments


Ensemble model: Python backend (CPU) + ONNX model (GPU)

Python model config:

```
instance_group [ { kind: KIND_CPU } ]
model_warmup [ {} ]
response_cache { enable: true }
```

ONNX model config:

```
instance_group [ { kind: KIND_GPU } ]
model_warmup [ {} ]
response_cache { enable: true }
```
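For reference, one way to check whether Triton releases a model is to unload it explicitly through the model-repository HTTP API (available when the server is started with `--model-control-mode=explicit`) and then watch GPU memory, e.g. with `nvidia-smi`. Below is a minimal stdlib-only sketch; the endpoint paths follow Triton's repository extension (`POST /v2/repository/models/{name}/unload`), while the base URL `localhost:8000` and the model names `python_model` / `onnx_model` are placeholder assumptions, not names from this issue:

```python
import urllib.error
import urllib.request

# Assumed server address; Triton's HTTP endpoint defaults to port 8000.
TRITON_URL = "http://localhost:8000"


def repository_action_url(base: str, model_name: str, action: str) -> str:
    """Build the Triton model-repository endpoint for 'load' or 'unload'."""
    return f"{base}/v2/repository/models/{model_name}/{action}"


def unload_model(base: str, model_name: str) -> int:
    """POST to the unload endpoint; returns the HTTP status code.

    After a successful unload, GPU memory held by the model's instances
    should drop (the behavior this issue reports as broken).
    """
    req = urllib.request.Request(
        repository_action_url(base, model_name, "unload"),
        data=b"{}",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Placeholder model names for the ensemble's two steps.
    for name in ("python_model", "onnx_model"):
        try:
            print(name, "unload status:", unload_model(TRITON_URL, name))
        except urllib.error.URLError as exc:
            # No server running; the sketch degrades gracefully.
            print(name, "server not reachable:", exc)
```

Comparing `nvidia-smi` output before and after the unload calls shows whether the memory is actually returned.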

Originally posted by @alicimertcan in https://github.com/triton-inference-server/server/issues/3761#issuecomment-1018443038

wangzz313 · Mar 18 '24

cc @GuanLuo @rmccorm4 @jbkyang-nvi

lkomali · Mar 18 '24

Can you provide more information? Is this the latest version of Triton? If not, can you try with the latest version, 24.02?

indrajit96 · Mar 18 '24

Hi @wangzz313, as @indrajit96 suggested, have you tried a newer version of Triton? 21.11 and 21.12 are quite old. Unfortunately, the 24.02 release does not come with the onnxruntime backend, so please try 24.01.

oandreeva-nv · Mar 20 '24

@oandreeva-nv we're facing the same issue with 24.01 and 24.08 (cc: @susnato)

rishabhmehrotra · Sep 11 '24