Neelay Shah
To support big-endian architectures and align the python backend's handling of BYTES tensors with other backends, this change removes the explicit little-endian flag used to serialize and deserialize the length...
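As a rough sketch of what is at stake: Triton serializes each element of a BYTES tensor as a 4-byte length prefix followed by the raw bytes. The helper names below are illustrative (not the actual backend code); the point is that packing the length with native byte order (`"=I"`) instead of a hard-coded little-endian flag (`"<I"`) lets the prefix match the host architecture.

```python
import struct

def serialize_bytes_elements(elements):
    """Serialize byte strings as BYTES tensor data: each element is a
    4-byte length prefix followed by its raw bytes. "=I" uses native
    byte order, so big-endian hosts produce big-endian prefixes."""
    out = bytearray()
    for item in elements:
        out += struct.pack("=I", len(item))
        out += item
    return bytes(out)

def deserialize_bytes_elements(buf):
    """Walk the buffer, reading each native-order length prefix and
    slicing out that many bytes for the element."""
    elements = []
    offset = 0
    while offset < len(buf):
        (length,) = struct.unpack_from("=I", buf, offset)
        offset += 4
        elements.append(buf[offset:offset + length])
        offset += length
    return elements
```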
Related to: https://github.com/triton-inference-server/server/issues/7066. Looking for feedback on the goal for the API - whether models loaded in this way should be included in the model index - or if wanted...
Based on this issue https://github.com/triton-inference-server/server/issues/6359, and on the debugging by [iyLester](https://github.com/iyLester), a quick change to the logic that identifies a duplicate unload request. Previously the logic would identify every request...
1) Ignore parameters that are None (a convenience for OpenAPI parameters with None as the default). 2) Add support for double. 3) Add support for mapping the numpy.bool_ type to tritonserver data...
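A minimal sketch of items 1 and 3, assuming a hypothetical conversion helper (the function name and mapping below are illustrative, not the actual tritonserver API): None-valued parameters are dropped before conversion, and numpy scalar types such as `numpy.bool_` and `numpy.float64` are coerced to plain Python values.

```python
import numpy as np

def to_parameter_dict(**params):
    """Drop parameters that are None (convenient for OpenAPI-style
    defaults) and coerce numpy scalars to plain Python values so
    downstream serialization sees bool/float, not numpy types."""
    result = {}
    for key, value in params.items():
        if value is None:
            continue  # item 1: ignore unset parameters
        if isinstance(value, np.bool_):
            value = bool(value)   # item 3: numpy.bool_ -> bool
        elif isinstance(value, np.floating):
            value = float(value)  # item 2: double (float64) support
        result[key] = value
    return result
```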
Adding convenience utilities for json logging with new log formats
This PR does two main things: 1) Add support for Triton's generate endpoint. This reuses the PA implementation for the OpenAI HTTP client - as it supports text in /...