lucywang720
We are currently reproducing this paper, but we are confused about r_A. May I ask how to calculate r_A from the per-step rewards produced by the PRM? I would...
About your paper, "Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning": which dataset did you use to generate the pruned model?
Hi, thank you for your great work! However, I used the three policies you provided in https://diffusion-policy.cs.columbia.edu/data/experiments/low_dim/square_mh/diffusion_policy_cnn/ for testing on the square task (not the last checkpoint), but the results...