Does Paddler have something like a filter that can modify the prompt en route?
Say I want to add something to the prompt/query as it passes through Paddler. Can that be achieved?
@skanga Hi
Thanks for being interested in Paddler! That could be a useful feature. Do you have an idea of how you would like to change the prompt, for example? Happy to discuss it if you want.
My idea is as follows. I would like to have a template file that sits on the Paddler node. When my client makes a call with a prompt, a template variable called something like %prompt% (or whatever) would be replaced with the incoming prompt, and the resulting text would be sent to llama.cpp.
First of all, thanks for sharing your idea about the prompt. Pingora works with filters at the proxy level, so sure, we can insert some content before forwarding the request. For example, with a template coming from a file:

Answer the question: %prompt%

and an incoming prompt:

What's 1 + 1?

applying the template against the prompt yields:

Answer the question: What's 1 + 1?
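The substitution described above could be sketched roughly like this. Note this is just an illustration of the idea, not Paddler's actual API; the function name and the %prompt% placeholder are the ones proposed in this thread:

```rust
// Minimal sketch of the proposed proxy-level template filter:
// the %prompt% placeholder in a template (e.g. loaded from a file
// on the Paddler node) is replaced with the incoming client prompt
// before the request is forwarded to llama.cpp.
fn apply_template(template: &str, prompt: &str) -> String {
    template.replace("%prompt%", prompt)
}

fn main() {
    let template = "Answer the question: %prompt%";
    let prompt = "What's 1 + 1?";

    let rendered = apply_template(template, prompt);
    println!("{rendered}");
    // prints: Answer the question: What's 1 + 1?
}
```

In a real deployment the template would be read from disk on the node, so updating the file updates the prompt for every client without touching the clients themselves.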
The question is whether this is necessary: a prompt template is already inherent to llama.cpp's default inference template, and it is applied to all requests along with the user prompt. If you want this prompt template to be applied conditionally, only for some requests, then we would need some mechanics for that. I tried to work out the advantage of using a prompt template; can you tell me what it would add over just hardcoding the prompt? I hope I helped you.
I have many clients, and they are all sending prompts. Sometimes I need to update the prompt but do not want to update the clients. With this template I could do centralized prompt management.
@skanga Yeah, we will be adding stateful features like this one once we release 2.0 (probably) next week (preview is on supervisor2 branch).
Also, as a bonus, we are compiling llama.cpp into Paddler (so it is no longer necessary to deploy both).
If you want to try it out, please let me know; you can also drop me an email. I will be happy to help, because we are looking for production use cases. :)
@skanga In version 2.0 it does: you can use your own custom chat templates: https://paddler.intentee.com/docs/best-practices/how-to-control-response-quality/