scalable-oversight topic
scalable-oversight repositories
ALaRM
25 stars, 3 forks, 25 watchers
[ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling"
MathCritique
55 stars, 1 fork, 55 watchers
Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".