liuwei-git
Make phi3 an explicitly supported model in llama.
The only difference between the phi3 4k and 128k models is in the rotary embedding. The 128k model adds long/short rope scaling factors (freq_factors) and an attn factor to each hidden dimension....
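To make the role of the two additions concrete, here is a minimal sketch of a rotary embedding that takes per-dimension frequency factors and a single attention factor. It is an illustration under stated assumptions, not the code from this change: the function name `rope_with_freq_factors`, the adjacent-pair layout, the default base of 10000, and the choice to divide each rotation angle by its factor and multiply the rotated values by the attention factor are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rotate one head vector `x` (even length d) in place for token position `pos`.
// ASSUMPTIONS (hypothetical, for illustration only):
//   - pair i is (x[2i], x[2i+1]); real implementations may pair dimensions differently
//   - the angle for pair i is pos * base^(-2i/d) / freq_factors[i]
//   - attn_factor uniformly scales the rotated values
void rope_with_freq_factors(std::vector<float>& x, int pos,
                            const std::vector<float>& freq_factors,
                            float attn_factor, float base = 10000.0f) {
    const std::size_t d = x.size();
    assert(d % 2 == 0);
    assert(freq_factors.size() == d / 2);

    for (std::size_t i = 0; i < d / 2; ++i) {
        // Standard rotary frequency for this pair.
        const float inv_freq =
            std::pow(base, -2.0f * static_cast<float>(i) / static_cast<float>(d));
        // The long/short factor stretches or shrinks the rotation angle.
        const float theta = static_cast<float>(pos) * inv_freq / freq_factors[i];
        // The attention factor scales the magnitude of the rotated pair.
        const float c = std::cos(theta) * attn_factor;
        const float s = std::sin(theta) * attn_factor;

        const float x0 = x[2 * i];
        const float x1 = x[2 * i + 1];
        x[2 * i]     = x0 * c - x1 * s;
        x[2 * i + 1] = x0 * s + x1 * c;
    }
}
```

With every factor set to 1 and `attn_factor` set to 1, the sketch reduces to a plain rotary embedding, which matches the statement that the two variants differ only in this step.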
### Issue to fix

The static OnnxLoader::loader object may be destroyed by the system before the customer's cleanup code (e.g., TRITONSERVER_ServerDelete) runs. This premature destruction causes an assertion failure inside...
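The problem described above appears to be an instance of the general static-destruction-order hazard in C++: an object with static storage duration is torn down at process exit, and any code that runs later can no longer rely on it. As a hedged illustration only, the sketch below shows one common idiom for avoiding that hazard, a heap-allocated singleton that is intentionally never deleted. `Loader` and `GetLoader` are hypothetical names invented for the example, and this is not necessarily the fix applied in this change.

```cpp
#include <cassert>

// Hypothetical stand-in for an object that late cleanup code still needs.
class Loader {
public:
    Loader() { ++live_; }
    ~Loader() { --live_; }

    // Number of Loader instances currently alive.
    static int LiveCount() { return live_; }

private:
    static int live_;
};

int Loader::live_ = 0;

// Construct on first use and never delete. Because the instance is not a
// static object itself (only the pointer is), no destructor runs for it
// during static destruction at process exit.
Loader& GetLoader() {
    static Loader* instance = new Loader();
    assert(instance != nullptr);
    return *instance;
}
```

The trade-off of this idiom is that the destructor never runs, so it only suits objects whose resources can safely be reclaimed by the operating system at exit. When orderly teardown matters, an explicit shutdown call made by the owner before exit is the usual alternative.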