Guardrail mechanism for `UserContent`

Open · opened by johnsonr · 2 comments

`UserContent` represents text coming into the system from users. This text may be malicious or toxic, so we should be able to apply consistent guardrails to it.

Oct 25 '25 16:10 johnsonr

Hi! I would like to work on this, if that's okay. :) I think the best approach would be a chain of pluggable guardrails, ranging from simple static checks to more complex model-driven ones, that can be applied to user input.
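A minimal sketch of the proposed chain, written in Python for brevity. Every name here is hypothetical: `UserContent` is a stand-in dataclass, not the project's real type, and `Guardrail`, `GuardrailResult`, `MaxLengthGuardrail`, `BlocklistGuardrail`, `ModelGuardrail` and `GuardrailChain` are illustrative. The model-driven check takes a plain `classify` callable so that a real LLM or moderation call can be plugged in; the stub used below only looks for a keyword.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Protocol, Sequence


@dataclass(frozen=True)
class UserContent:
    """Hypothetical stand-in for the framework's UserContent type."""
    text: str


@dataclass(frozen=True)
class GuardrailResult:
    allowed: bool
    reason: Optional[str] = None


class Guardrail(Protocol):
    """Any object with a name and a check() method can be plugged into the chain."""
    name: str

    def check(self, content: UserContent) -> GuardrailResult: ...


class MaxLengthGuardrail:
    """Static check: reject input longer than a fixed number of characters."""
    name = "max-length"

    def __init__(self, max_chars: int) -> None:
        self.max_chars = max_chars

    def check(self, content: UserContent) -> GuardrailResult:
        if len(content.text) > self.max_chars:
            return GuardrailResult(False, f"input exceeds {self.max_chars} characters")
        return GuardrailResult(True)


class BlocklistGuardrail:
    """Static check: reject input matching any case-insensitive regex pattern."""
    name = "blocklist"

    def __init__(self, patterns: Sequence[str]) -> None:
        self._patterns = [re.compile(p, re.IGNORECASE) for p in patterns]

    def check(self, content: UserContent) -> GuardrailResult:
        for pattern in self._patterns:
            if pattern.search(content.text):
                return GuardrailResult(False, f"matched blocked pattern {pattern.pattern!r}")
        return GuardrailResult(True)


class ModelGuardrail:
    """Model-driven check; `classify` is a placeholder for an LLM or moderation call."""
    name = "model"

    def __init__(self, classify: Callable[[str], bool]) -> None:
        self._classify = classify  # returns True when the text should be blocked

    def check(self, content: UserContent) -> GuardrailResult:
        if self._classify(content.text):
            return GuardrailResult(False, "flagged by model")
        return GuardrailResult(True)


class GuardrailChain:
    """Runs guardrails in order and stops at the first rejection."""

    def __init__(self, guardrails: Sequence[Guardrail]) -> None:
        self._guardrails = list(guardrails)

    def check(self, content: UserContent) -> GuardrailResult:
        for guardrail in self._guardrails:
            result = guardrail.check(content)
            if not result.allowed:
                return GuardrailResult(False, f"{guardrail.name}: {result.reason}")
        return GuardrailResult(True)


# Cheap static checks first, so the expensive model call only sees input that passed them.
chain = GuardrailChain([
    MaxLengthGuardrail(max_chars=2000),
    BlocklistGuardrail([r"ignore (all )?previous instructions"]),
    ModelGuardrail(classify=lambda text: "toxic" in text.lower()),  # stub classifier
])
```

Ordering the chain from cheapest to most expensive and short-circuiting on the first rejection keeps the common case fast; whether a real implementation should instead collect every violation is an open design choice.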

Also, I think this would introduce some latency, especially if we decide to incorporate LLM-based guardrails, so I would suggest a caching mechanism as well.
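One possible shape for that cache, as a sketch under stated assumptions: it wraps a single expensive check in a bounded LRU cache keyed by the exact input text, so only verbatim repeats avoid the model call. `CachingGuardrail`, `GuardrailResult` and `slow_model_check` are hypothetical names, and the stub check stands in for a real LLM call.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class GuardrailResult:
    allowed: bool
    reason: Optional[str] = None


class CachingGuardrail:
    """Wraps an expensive check with a bounded LRU cache keyed by the input text."""

    def __init__(self, check: Callable[[str], GuardrailResult], max_entries: int = 1024) -> None:
        self._check = check
        self._max_entries = max_entries
        self._cache: "OrderedDict[str, GuardrailResult]" = OrderedDict()

    def check(self, text: str) -> GuardrailResult:
        if text in self._cache:
            self._cache.move_to_end(text)  # mark as most recently used
            return self._cache[text]
        result = self._check(text)  # cache miss: run the expensive check
        self._cache[text] = result
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return result


calls: list = []  # records every call that reaches the "model"


def slow_model_check(text: str) -> GuardrailResult:
    """Stub standing in for an LLM-based guardrail."""
    calls.append(text)
    flagged = "toxic" in text.lower()
    return GuardrailResult(not flagged, "flagged by model" if flagged else None)


cached = CachingGuardrail(slow_model_check, max_entries=2)
first = cached.check("hello")
second = cached.check("hello")  # served from the cache; no second model call
```

A real cache would probably also need a time-to-live and a key that includes the guardrail's configuration or model version, so that cached verdicts are invalidated when the policy changes.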

Nov 18 '25 12:11 harinda05