Jason Ng issues

Results: 7 issues of Jason Ng

OpenAI Agent Not Function Calling

### Checked other resources - [X] I added a very descriptive title to this issue. - [X] I searched the LangChain documentation with the integrated search. - [X] I used...

Experiments with Alphanumeric Entities

Hi there! Thank you for the wonderful work done as this greatly reduced the memory overhead and inference time for my use case. I noticed that the prompt compression...

question

Model 'tensorrt_llm' loading failed with error: key 'use_context_fmha_for_generation' not found

**Description** Unable to run triton inference server with tensorrt-llm for Llama3-ChatQA-1.5-8B **Triton Information** v2.46.0 Are you using the Triton container or did you build it yourself? Using Triton container image...

TensorRT backend gives no output

Hi, I have built a TensorRT engine and tried running the command: ``` python3 run_server.py -p 9090 -b tensorrt -trt {path_to_engine} ``` but the only output that I have received...

Stark Difference in GPU Usage of Triton Servers with Llama3 and Llama3.1 models

**Description** I have noticed that there was a huge difference in memory usage for runtime buffers and decoder for llama3 and llama3.1. **Triton Information** What version of Triton are you...

Stark Difference in GPU Usage of Triton Servers with Llama3 and Llama3.1 models

**Description** I have noticed that there was a huge difference in memory usage for runtime buffers and decoder for llama3 and llama3.1. **Triton Information** What version of Triton are you...

[Question] Latency Benchmarks between GPU & CPU

Hi, I wonder if there are any benchmarks comparing the retrieval latency between GPU and CPU? It would be great to understand the tradeoff in using LEANN on...