post-training topic
CPT
[EMNLP 2022] Continual Training of Language Models for Few-Shot Learning
FastVideo
A unified inference and post-training framework for accelerated video generation.
Science-T2I
[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis
learning-from-rewards-llm-papers
A comprehensive collection of papers on learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward models and learning strategies across training, inference, and post-inference stages.
VisualThinker-R1-Zero
Explore the Multimodal “Aha Moment” on a 2B Model
24-Game-Reasoning
A clean, minimal, super-simple reproduction of DeepSeek-R1-Zero and DeepSeek-R1, using the 24 Game as the running example. Applies zero-RL, SFT, and SFT+RL to elicit the LLM's autonomous verification and reflection abilities (a reward-function sketch follows below).
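As a concrete illustration of the zero-RL recipe this entry describes, here is a minimal sketch of a rule-based verifier reward for the 24 Game. The function `reward_24` and its interface are assumptions for illustration, not the repo's actual code: the model's answer is an arithmetic expression, and the reward checks that it uses exactly the four given numbers and evaluates to 24.

```python
# Hypothetical rule-based reward for 24-Game zero-RL (illustrative sketch,
# not the repo's actual implementation).
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def _eval(node):
    # Safely evaluate a parsed expression: numeric literals and + - * / only.
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("disallowed expression")

def _leaves(node):
    # Collect the numeric literals appearing in the expression.
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return [node.value]
    if isinstance(node, ast.BinOp):
        return _leaves(node.left) + _leaves(node.right)
    raise ValueError("disallowed expression")

def reward_24(answer: str, numbers: list) -> float:
    """Return 1.0 if `answer` is a valid 24-Game solution over `numbers`, else 0.0."""
    try:
        tree = ast.parse(answer, mode="eval").body
        if sorted(_leaves(tree)) != sorted(numbers):
            return 0.0  # must use exactly the given numbers, each once
        return 1.0 if abs(_eval(tree) - 24) < 1e-6 else 0.0
    except (ValueError, SyntaxError, ZeroDivisionError):
        return 0.0

# Example: 8 / (3 - 8 / 3) = 24
assert reward_24("8 / (3 - 8 / 3)", [3, 3, 8, 8]) == 1.0
assert reward_24("8 * 3", [3, 3, 8, 8]) == 0.0
```

A binary, programmatically checkable reward like this is what makes the zero-RL setting possible: no learned reward model is needed, since the 24-Game solution can be verified exactly.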
meow-tea-taro
A Practitioner's Guide to M(eow)ti Turn Agentic ReinfOrcement learning
Re-Align
[EMNLP'25] A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
Logic-RL-Lite
Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".
DeepEnlighten
Pure RL to post-train base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with Social IQa dataset.