luomuqinghan
Thanks for sharing. In the RL setup, for ease of answering, is the reward calculated by the RL model itself rather than by another model? Why not feed the action into a separate pretrained model...
Thank you for your code! But in the original HRED, given k context utterances and one response, HRED generates k utterances. It seems that you only generate the final response during training...
Would you mind sharing more personality data?