3 comments of landandan
@void-main Hello, I found a bug: after many (thousands of) batched inferences (batch size 20), some batches produce random output. Restarting the Triton service restores normal inference. When...
Another problem: when batch inference is used, the same prompt produces different results. Parameters: top_k=1, random_seed=1, output_len=500. Device: T4 / A100, 4 GPUs, via Triton server. ``` prompt...
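For context on why this is surprising: with top_k=1, decoding is plain greedy argmax, so there is no sampling involved and the same prompt should yield identical tokens whether it runs alone or inside a batch. A minimal sketch of that expectation (a toy stand-in model, not the FasterTransformer or Triton code; `fake_logits` and `greedy_generate` are hypothetical names):

```python
import hashlib

def fake_logits(tokens, step):
    # Deterministic stand-in for a model's logits (hypothetical toy model).
    h = hashlib.sha256(bytes(tokens + [step])).digest()
    return list(h)  # 32 "vocabulary" scores

def greedy_generate(prompt_tokens, output_len):
    # top_k=1 decoding: at every step, pick the single highest-scoring token.
    tokens = list(prompt_tokens)
    for step in range(output_len):
        logits = fake_logits(tokens, step)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens

single = greedy_generate([1, 2, 3], 8)
# A "batch" of four copies of the same prompt: every row should decode
# to exactly the same token sequence as the single request.
batch = [greedy_generate([1, 2, 3], 8) for _ in range(4)]
assert all(row == single for row in batch)
```

If the service diverges here, a likely culprit is non-deterministic batched kernels or state leaking between requests, rather than the sampling parameters themselves.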
Same problem.