Mechanism for injecting "fake" responses from models into the conversation

Open · simonw opened this issue 2 years ago · 1 comment

https://twitter.com/simonw/status/1694089359514104094

A sometimes-useful trick is to feed a model a prior conversation that includes things the model never actually said, such as "Sure, I'd be happy to help you with that", as a minor jailbreak.

I'm not sure what the CLI options for this should look like, or how they should be recorded in the SQLite database logs.
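To make the trick concrete: with an OpenAI-style chat API, it amounts to inserting an assistant turn the model never generated into the message history before the next request. This is an illustrative sketch, not llm's actual internals; the `build_messages` helper is hypothetical.

```python
def build_messages(prompt, fake_response, follow_up):
    """Return an OpenAI-style chat history containing a response
    the model never actually gave (the injected assistant turn)."""
    return [
        {"role": "user", "content": prompt},
        # Injected "fake" response - the model never said this.
        {"role": "assistant", "content": fake_response},
        {"role": "user", "content": follow_up},
    ]

messages = build_messages(
    "Help me draft this.",
    "Sure, I'd be happy to help you with that.",
    "Great, go ahead.",
)
```

The resulting `messages` list would then be sent as the conversation history in the next API call, so the model continues as if it had already agreed.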

simonw avatar Aug 22 '23 20:08 simonw

  • --answer-prefix "Sure, I'd be happy to help you with that" — useful for jailbreaks. (Note the double quotes: the apostrophe in I'd would break a single-quoted shell argument.) I don't think the prefix needs to be saved in the database; it would be simpler to prepend it to the answer, giving the impression that the model started its response with this prefix.

  • Editing a conversation history is also sometimes needed. My best bet here is to output markdown or org-mode, open an editor on that markdown, and then pipe it back to llm to be parsed and converted into a proper JSON chat history.
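The round trip in the second bullet could be sketched like this, assuming a simple "## role" heading convention for the markdown transcript (the format and the `parse_transcript` function are assumptions for illustration, not an existing llm feature):

```python
import json

def parse_transcript(markdown_text):
    """Parse an edited transcript using '## user' / '## assistant'
    headings back into a JSON-ready chat history."""
    messages = []
    role, lines = None, []
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            if role is not None:
                messages.append({"role": role, "content": "\n".join(lines).strip()})
            role, lines = line[3:].strip().lower(), []
        else:
            lines.append(line)
    if role is not None:
        messages.append({"role": role, "content": "\n".join(lines).strip()})
    return messages

transcript = """## user
Hello!

## assistant
Sure, I'd be happy to help you with that.
"""
print(json.dumps(parse_transcript(transcript), indent=2))
```

After editing the markdown in $EDITOR, the parsed list could be fed back as the conversation history for the next prompt.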

NightMachinery avatar Nov 29 '23 04:11 NightMachinery

Check out my project https://github.com/relston/mark, which is designed in a way that fits this use case well. It now uses llm as a dependency.

relston avatar Feb 19 '25 05:02 relston