scalable-oversight topic

ALaRM

Stars: 25 | Forks: 3 | Watchers: 25

[ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling"

MathCritique

Stars: 55 | Forks: 1 | Watchers: 55

Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".