R-2-Guard

kangmintong


Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning


Evaluations on standard safety benchmark

bash scripts.sh

Evaluations against jailbreaks

For GCG attack:

bash scripts.sh

In scripts.sh, please specify the corresponding adv_string (adv_string_1: GCG-U1; adv_string_2: GCG-U2; adv_string_3: GCG-V; adv_string_4: GCG-L; adv_string_5: GCG-R)
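The mapping above can be kept at hand as a small lookup. This is only an illustrative sketch: the function name gcg_variant is hypothetical and is not part of the repository, and how scripts.sh actually consumes adv_string is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): print the GCG variant
# that a given adv_string name refers to, per the list in this README.
gcg_variant() {
  case "$1" in
    adv_string_1) echo "GCG-U1" ;;
    adv_string_2) echo "GCG-U2" ;;
    adv_string_3) echo "GCG-V" ;;
    adv_string_4) echo "GCG-L" ;;
    adv_string_5) echo "GCG-R" ;;
    *) echo "unknown adv_string: $1" >&2; return 1 ;;
  esac
}

gcg_variant adv_string_3   # prints GCG-V
```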

For AutoDAN, TAP, PAIR attack:

bash jailbreak.sh

Note: Because the evaluations are extensive, the scripts list many runs. You may need to keep only the evaluation you want to run active and comment out the rest. The scripts will be reorganized after the paper review.

