andyluo7

Results 12 comments of andyluo7

I am facing the same issue.

@nvpohanh, is there any way to downgrade from TRT7 to TRT6 with CUDA 10.2 on Ubuntu 18.04? Is there an inference_result_v0.5 Docker container that can run on Xavier?

@nvpohanh, please let me know if I can install TRT6 separately.

@nvpohanh, no, I did not run it in the container. I am trying to run the inference on AGX Xavier. Should I run it in a container?

@nvpohanh, I used the latest JetPack 4.4 DP. Is there any difference? There are multiple cuda.h files on the system. I wonder if I should change the Makefile to add...

I had to hard-code the #include in the .h and .c files in /inference_results_v0.5/closed/NVIDIA/code/harness/lwis to make it work.

@Jeffwan, I got the same issue. How can I fix it?
aluo@tw020:~/aibrix$ kubectl apply -f https://github.com/vllm-project/aibrix/releases/download/v0.2.0/aibrix-core-v0.2.0.yaml
namespace/aibrix-system created
customresourcedefinition.apiextensions.k8s.io/kvcaches.orchestration.aibrix.ai created
customresourcedefinition.apiextensions.k8s.io/modeladapters.model.aibrix.ai created
customresourcedefinition.apiextensions.k8s.io/podautoscalers.autoscaling.aibrix.ai created
customresourcedefinition.apiextensions.k8s.io/rayclusterfleets.orchestration.aibrix.ai created
customresourcedefinition.apiextensions.k8s.io/rayclusterreplicasets.orchestration.aibrix.ai created
customresourcedefinition.apiextensions.k8s.io/rayclusters.ray.io created...

Nice. Will try.

@alexsin368, I have the same version of transformers (4.37.0) in the Docker container.

@alexsin368, I want to get the time to the first token, so I used --token-latency.