DeepSeek-R1-FineTuning
Fine-Tuning of DeepSeek-Style Reasoning Models | RL + Quantization Implementation
Fine-tune DeepSeek-R1 8B on a Quantum Mechanics Dataset
- Base Model: unsloth/DeepSeek-R1-Distill-Llama-8B
- Training Dataset: 0xZee/dataset-CoT-Quantum-Mechanics-1224
- HF Fine-tuned Model: 0xZee/DeepSeek-R1-8b-ft-QuantumMechanics-CoT
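As a rough illustration of how a chain-of-thought record might be turned into a training string for this kind of fine-tune, here is a minimal sketch. The column names (`question`, `reasoning`, `answer`), the prompt layout, and the `format_example` helper are all assumptions made for illustration; the dataset's actual schema and the template used in the notebook are not stated here. The `<think>` tags follow the convention DeepSeek-R1 models use to delimit their reasoning.

```python
# Minimal sketch: format one chain-of-thought record into a training string.
# ASSUMPTION: the field names "question", "reasoning", and "answer" are
# hypothetical; check the dataset card for the real column names.

PROMPT_TEMPLATE = (
    "### Question:\n{question}\n\n"
    "### Response:\n<think>\n{reasoning}\n</think>\n{answer}"
)


def format_example(example: dict, eos_token: str = "") -> str:
    """Build a single training string from one dataset record.

    eos_token is appended so the model learns where a response ends;
    in a real run it would typically come from tokenizer.eos_token.
    """
    text = PROMPT_TEMPLATE.format(
        question=example["question"].strip(),
        reasoning=example["reasoning"].strip(),
        answer=example["answer"].strip(),
    )
    return text + eos_token


# Hand-written sample record (not taken from the dataset).
sample = {
    "question": "What is the ground-state energy of a 1D quantum harmonic oscillator?",
    "reasoning": "The energy levels are E_n = (n + 1/2) * hbar * omega. Set n = 0.",
    "answer": "E_0 = hbar * omega / 2",
}

# "<EOS>" is a placeholder string, not the model's real end-of-sequence token.
text = format_example(sample, eos_token="<EOS>")
print(text)
```

In practice a function like this would be applied to each row of the dataset (for example with `datasets.Dataset.map`) before the rows are passed to a supervised fine-tuning trainer.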
Demo Notebook