shixianc
When can NAV support creating Triton Repo for this new backend? Is it on your roadmap? https://github.com/triton-inference-server/tensorrtllm_backend
I have 2 separate questions which I could not find an answer to yet, so I'm posting them here in the hope someone can answer: 1. When doing TRT conversion from TorchScript to TRT....
There's a new cache technique mentioned in the paper https://arxiv.org/abs/2312.17238 (github: https://github.com/dvmazur/mixtral-offloading). They introduced an LRU cache to cache experts based on activation patterns they found, and also made speculative guesses to...
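The core of the expert-caching idea can be sketched as a small LRU cache keyed by expert id. This is a minimal, hypothetical illustration (not the paper's implementation): `load_expert` stands in for whatever routine moves an expert's weights from CPU/disk to GPU, and `capacity` is the number of experts that fit in device memory.

```python
from collections import OrderedDict

class ExpertLRUCache:
    """Minimal sketch of an LRU cache for MoE expert weights.

    `load_expert` is a hypothetical callable that fetches an expert's
    weights on a cache miss (e.g. CPU -> GPU transfer).
    """

    def __init__(self, capacity, load_expert):
        self.capacity = capacity
        self.load_expert = load_expert
        self.cache = OrderedDict()  # expert_id -> weights, oldest first

    def get(self, expert_id):
        if expert_id in self.cache:
            # Cache hit: mark this expert as most recently used.
            self.cache.move_to_end(expert_id)
            return self.cache[expert_id]
        # Cache miss: evict the least recently used expert if full.
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)
        weights = self.load_expert(expert_id)
        self.cache[expert_id] = weights
        return weights
```

A speculative-prefetch layer could sit on top of this by calling `get` for experts the router is predicted to pick next, so the transfer overlaps with compute.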
Add AWS Inf2 instance support for the aws_batch scheduler. There are use cases for using TorchX to launch data-parallel inference jobs on Inf2 instances on AWS Batch. Configurations reference https://aws.amazon.com/ec2/instance-types/inf2/ Test...
Hey folks, is there a way to achieve a deterministic atomic_add reduction in Triton? CUDA has something like a "turnstile reduction", where a CTA waits for the previous K blocks to finish first through...
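For readers unfamiliar with the turnstile pattern: the point is that floating-point addition is not associative, so unordered atomic adds give run-to-run different results, while forcing a fixed accumulation order makes the reduction deterministic. Below is a plain-Python thread sketch of that idea (not Triton or CUDA code): each worker waits until all lower-indexed workers have accumulated before adding its own partial sum. On a GPU the "turnstile" would instead be a global counter bumped with an atomic, with each block spinning until the counter reaches its block index.

```python
import threading

def deterministic_sum(values, num_workers=4):
    """Turnstile-style reduction sketch: partial sums are accumulated
    into `result` in worker-index order, so the sequence of additions
    is identical on every run."""
    result = [0.0]
    turn = [0]                      # index of the worker allowed to add next
    cond = threading.Condition()
    chunks = [values[i::num_workers] for i in range(num_workers)]

    def worker(idx):
        partial = sum(chunks[idx])  # local reduction over this worker's chunk
        with cond:
            # Turnstile: wait until workers 0..idx-1 have all added.
            while turn[0] != idx:
                cond.wait()
            result[0] += partial    # deterministic accumulation order
            turn[0] += 1
            cond.notify_all()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

The trade-off is that the final accumulation is serialized across blocks; only the per-block partial reductions stay parallel.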