hl0929


> If you want an interactive or data-streaming way to do inference with a trained model, unfortunately there is no boilerplate code yet. A simple idea to implement it is...

When I load the LLaMA model, some GPUs show this behavior while others work fine.