Harshini Komali
Related PR: https://github.com/triton-inference-server/core/pull/338 Changes: Updated the DetermineStatsModelVersion() and MergeStatistics() functions to handle the cache-hit scenario in which the top-level ensemble request is served from the cache and the composing models are therefore never executed. Added tests for DetermineStatsModelVersion().
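The stats handling itself lives in Triton core (C++), but the intended behavior can be observed from a client: after a top-level cache hit on the ensemble, the ensemble's cache-hit count grows while the composing models gain no new executions. A minimal sketch using the Python tritonclient statistics API; the model names ensemble_model and composing_model are placeholders, not part of the PR:

```python
# Hedged sketch, not part of the PR: observe the stats behavior from a client.
# "ensemble_model" and "composing_model" are placeholder model names.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient("localhost:8000")

ensemble = client.get_inference_statistics("ensemble_model")["model_stats"][0]
composing = client.get_inference_statistics("composing_model")["model_stats"][0]

# On a top-level cache hit the ensemble records the hit, while the
# composing model's execution count stays unchanged.
print("ensemble cache hits:", ensemble["inference_stats"]["cache_hit"]["count"])
print("composing executions:", composing["execution_count"])
```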
ref slack thread: https://nvidia.slack.com/archives/CAZKCU4UV/p1677717244222069 Previously, caching of the top-level request sent to the ensemble scheduler was not supported. Implemented caching of top-level requests for ensemble models. In case of a cache hit,...
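A minimal sketch of the resulting behavior, assuming a server started with caching enabled (e.g. --cache-config local,size=1048576) and an ensemble whose config.pbtxt sets response_cache { enable: true }; the model and tensor names below are placeholders:

```python
# Hedged sketch: two identical requests to a cache-enabled ensemble.
# "ensemble_model", "INPUT0", and "OUTPUT0" are placeholder names.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient("localhost:8000")

data = np.ones((1, 16), dtype=np.float32)
inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

# First request misses the cache and runs the composing models.
first = client.infer("ensemble_model", [inp])
# An identical second request is served from the response cache, so the
# composing models are not executed at all.
second = client.infer("ensemble_model", [inp])

assert np.array_equal(first.as_numpy("OUTPUT0"), second.as_numpy("OUTPUT0"))
```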
Related PR: https://github.com/triton-inference-server/core/pull/338 Added 4 new tests in L0_response_cache to test top-level request caching for ensemble models. Test 1: When the cache and decoupled mode are both enabled in the ensemble model config: Error...
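A sketch of the kind of check Test 1 performs, assuming a server running with --model-control-mode=explicit; the model name is a placeholder:

```python
# Hedged sketch: loading a decoupled model with response caching enabled
# should be rejected. "decoupled_cache_ensemble" is a placeholder name.
import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException

client = httpclient.InferenceServerClient("localhost:8000")

try:
    client.load_model("decoupled_cache_ensemble")
    raise AssertionError(
        "expected load to fail for decoupled model with cache enabled")
except InferenceServerException as e:
    # The server refuses this config combination with an error message.
    print("load rejected as expected:", e)
```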
Top-level response caching for ensemble models
This check is needed so that the test_inference_profiler unit test doesn't fail.
Added a version folder for ensemble_model inside the model repository. Changed shm-size from 256m to 1G. These changes are required to run the ensemble model example on Triton 23.12.