Amrita Panesar
Having the same issue
Thanks for sending this through - apologies, I didn't explain this very well. What I was actually asking is whether there is a way to reduce loading time...
Support for DoLa would be great!
Is there a way to pass a custom decoding config per prompt in offline inference mode, e.g. use outlines to generate custom JSON output for each prompt? It seems that...