shixianc
When can NAV support creating Triton Repo for this new backend? Is it on your roadmap? https://github.com/triton-inference-server/tensorrtllm_backend
I have 2 separate questions which I could not find an answer to yet, so I'm posting them here in the hope someone can answer: 1. When doing TRT conversion from TorchScript to TRT....
There's a new cache technique mentioned in the paper https://arxiv.org/abs/2312.17238 (github: https://github.com/dvmazur/mixtral-offloading). They introduced an LRU cache to cache experts based on activation patterns they found, and also made speculative guesses to...
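The core of the expert-caching idea can be sketched as a small LRU cache keyed by expert id. This is a minimal, hypothetical illustration (not the paper's implementation): `load_expert` stands in for whatever routine moves an expert's weights from CPU/disk to GPU, and `capacity` is the number of experts that fit in device memory.

```python
from collections import OrderedDict

class ExpertLRUCache:
    """Minimal sketch of an LRU cache for MoE expert weights.

    `load_expert` is a hypothetical callable that fetches an expert's
    weights on a cache miss (e.g. CPU -> GPU transfer).
    """

    def __init__(self, capacity, load_expert):
        self.capacity = capacity
        self.load_expert = load_expert
        self.cache = OrderedDict()  # expert_id -> weights, oldest first

    def get(self, expert_id):
        if expert_id in self.cache:
            # Cache hit: mark this expert as most recently used.
            self.cache.move_to_end(expert_id)
            return self.cache[expert_id]
        # Cache miss: evict the least recently used expert if full.
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)
        weights = self.load_expert(expert_id)
        self.cache[expert_id] = weights
        return weights
```

A speculative-prefetch layer could sit on top of this by calling `get` for experts the router is predicted to pick next, so the transfer overlaps with compute.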
Add AWS Inf2 instance support for the aws_batch scheduler. There are use cases for using TorchX to launch data-parallel inference jobs on Inf2 instances on AWS Batch. Configurations reference https://aws.amazon.com/ec2/instance-types/inf2/ Test...
Hey folks, is there a way to achieve a deterministic atomic_add reduction in Triton? CUDA has something like a "turnstile reduction", where a CTA waits for the previous K blocks to finish first through...
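For readers unfamiliar with the turnstile pattern: the point is that floating-point addition is not associative, so unordered atomic adds give run-to-run different results, while forcing a fixed accumulation order makes the reduction deterministic. Below is a plain-Python thread sketch of that idea (not Triton or CUDA code): each worker waits until all lower-indexed workers have accumulated before adding its own partial sum. On a GPU the "turnstile" would instead be a global counter bumped with an atomic, with each block spinning until the counter reaches its block index.

```python
import threading

def deterministic_sum(values, num_workers=4):
    """Turnstile-style reduction sketch: partial sums are accumulated
    into `result` in worker-index order, so the sequence of additions
    is identical on every run."""
    result = [0.0]
    turn = [0]                      # index of the worker allowed to add next
    cond = threading.Condition()
    chunks = [values[i::num_workers] for i in range(num_workers)]

    def worker(idx):
        partial = sum(chunks[idx])  # local reduction over this worker's chunk
        with cond:
            # Turnstile: wait until workers 0..idx-1 have all added.
            while turn[0] != idx:
                cond.wait()
            result[0] += partial    # deterministic accumulation order
            turn[0] += 1
            cond.notify_all()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

The trade-off is that the final accumulation is serialized across blocks; only the per-block partial reductions stay parallel.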