
[feat] use HF state_dict in RL update weights

Open CyCle1024 opened this issue 2 months ago • 1 comments

We use the load_spec generator in BaseModel to produce several HF-format state_dicts (device tensors) for RL weight updates, replacing the old layer-by-layer method. This is a more general approach that makes it easier to support more models.
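For reviewers unfamiliar with the idea, here is a minimal sketch of yielding weights as several partial state_dicts rather than one layer at a time. It is not the code in this PR: `iter_state_dict_chunks`, `max_bytes`, and the parameter names are hypothetical, and `bytearray` objects stand in for device tensors so the snippet runs without torch.

```python
from typing import Dict, Iterable, Iterator, Tuple


def iter_state_dict_chunks(
    named_tensors: Iterable[Tuple[str, bytearray]],
    max_bytes: int,
) -> Iterator[Dict[str, bytearray]]:
    """Yield partial state_dicts, each holding at most max_bytes of data.

    Hypothetical helper: a single tensor larger than max_bytes still gets
    a chunk of its own, so every tensor is yielded exactly once.
    """
    chunk: Dict[str, bytearray] = {}
    chunk_bytes = 0
    for name, tensor in named_tensors:
        size = len(tensor)  # stand-in for tensor.numel() * tensor.element_size()
        # Flush the current chunk if adding this tensor would exceed the budget.
        if chunk and chunk_bytes + size > max_bytes:
            yield chunk
            chunk, chunk_bytes = {}, 0
        chunk[name] = tensor
        chunk_bytes += size
    if chunk:
        yield chunk


# Illustrative HF-style parameter names with made-up sizes.
weights = [
    ("model.embed_tokens.weight", bytearray(6)),
    ("model.layers.0.self_attn.q_proj.weight", bytearray(4)),
    ("model.layers.0.mlp.up_proj.weight", bytearray(4)),
    ("lm_head.weight", bytearray(6)),
]

chunks = list(iter_state_dict_chunks(weights, max_bytes=10))
for i, chunk in enumerate(chunks):
    print(i, list(chunk))
```

Because each chunk is keyed by parameter name, the receiving side needs no knowledge of the model's layer structure, which is presumably why this generalizes better than a per-layer update.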

CyCle1024, Nov 13 '25 09:11

@HAOCHENYE @HIT-cwh Please review xtuner/v1/model/base.py carefully, as the changes there may break HF saving (although I don't think this PR breaks it).

CyCle1024, Nov 13 '25 09:11