R-2-Guard
Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
Evaluation on standard safety benchmarks
bash scripts.sh
Evaluations against jailbreaks
For the GCG attack:
bash scripts.sh
Before running, specify in scripts.sh the adv_string that corresponds to the GCG variant you want to evaluate (adv_string_1: GCG-U1; adv_string_2: GCG-U2; adv_string_3: GCG-V; adv_string_4: GCG-L; adv_string_5: GCG-R).
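As a quick reference, the name-to-variant mapping above can be written as a small shell lookup. This is only a sketch: the function name gcg_variant is hypothetical and is not part of the repository, and how scripts.sh actually consumes adv_string is not shown here, so check the script itself before editing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): returns the GCG variant
# for one of the adv_string names listed in this README.
gcg_variant() {
  case "$1" in
    adv_string_1) echo "GCG-U1" ;;
    adv_string_2) echo "GCG-U2" ;;
    adv_string_3) echo "GCG-V" ;;
    adv_string_4) echo "GCG-L" ;;
    adv_string_5) echo "GCG-R" ;;
    *) echo "unknown adv_string: $1" >&2; return 1 ;;
  esac
}

# Print the full mapping so the right name can be picked before editing scripts.sh.
for i in 1 2 3 4 5; do
  echo "adv_string_$i -> $(gcg_variant "adv_string_$i")"
done
```

For example, gcg_variant adv_string_3 prints GCG-V, and an unrecognized name returns a non-zero status with a message on stderr.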
For the AutoDAN, TAP, and PAIR attacks:
bash jailbreak.sh