tricky
Hi, I changed the model config, such as the sizes of gru_a and gru_b. After training finished, I replaced nnet_data.h and nnet_data.c. When I rebuild, the md5sum...
The stable v2.2.0 has no lightseq.training.ops.pytorch.quantization, so I should just build from the newest master code, right? I use cuda-11.5 and build from source. Since cmake reports the error "missing CUDA::cublas_static", I...
Hi, thanks for your great work. I have a question about QuantTanh; maybe it's a PyTorch problem. When I use QuantTanh, after I saved the model, I cannot...
Suppose I have model-repository1 and model-repository2, which share the same structure but have different parameters. Does Triton support assembling requests to model-repository1 and model-repository2 into a single batch?
I can find no example of using AsyncStreamInfer and StopStream. When I need to send multiple requests, each request calls StartStream and AsyncStreamInfer, but on the second request, it...
I have converted my ONNX model to TensorRT, but the result is quite strange. My model was trained in mixed precision. When I add the following line, will convert to fp16...
https://github.com/FunAudioLLM/CosyVoice/blob/2d6bb9bd80aad0f46c4324dd0b5ee23a30b090a3/runtime/triton_trtllm/model_repo/token2wav_dit/1/token2wav_dit.py#L421