DBRL
Dataset Batch (offline) Reinforcement Learning for recommender systems
I cannot get increasing rewards with the REINFORCE and DDPG algorithms. Is this normal? Could you provide the final results of the three algorithms? Thank you!
@massquantity In your model, does DSSM stand for [Deep Semantic Similarity Model](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cikm2013_DSSM_fullversion.pdf)? And I would like to know why that model was chosen as the embedding model for users and items...
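For readers unfamiliar with it, DSSM is a two-tower architecture: one tower embeds the user, the other embeds the item, and the two are trained so that their similarity predicts interaction, which makes it a natural choice for pretraining user/item embeddings. A minimal sketch of the idea follows; all names and dimensions here are illustrative, not taken from this repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTower(nn.Module):
    """Simplified two-tower (DSSM-style) model for user/item embeddings."""

    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_embeds = nn.Embedding(n_users, dim)
        self.item_embeds = nn.Embedding(n_items, dim)

    def forward(self, users, items):
        # Cosine similarity between the user tower and the item tower.
        u = F.normalize(self.user_embeds(users), dim=-1)
        i = F.normalize(self.item_embeds(items), dim=-1)
        return (u * i).sum(-1)
```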
Very large loss
@massquantity The network loss becomes very large starting around the third epoch, and NDCG stays very small. Do your own runs look like this too? I can't figure out where the problem is.
Error when running
When running run_pretrain_embeddings.py, I get the error: `Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)`.
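For reference, `nn.Embedding` requires int64 (Long) indices on the PyTorch versions that raise this error, so a common fix is to cast the index tensors before the lookup. A minimal sketch, with illustrative names rather than the repo's:

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=1000, embedding_dim=32)

# Indices loaded as int32 (e.g. from numpy) trigger the error above.
indices = torch.randint(0, 1000, (64,), dtype=torch.int32)

# Casting to Long (int64) before the lookup resolves it.
embeds = embedding(indices.long())  # shape: (64, 32)
```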
```python
s = self.generator.get_state(data).detach()
gen_actions = self.generator.decode(s)
a = self.perturbator(s, gen_actions).detach()
# a = self.item_embeds[data["action"]]
q1 = self.critic1(s, a)
q2 = self.critic2(s, a)
```
This is around line 156 of bcq.py. According to the original BCQ algorithm, the current Q values should be computed with the s and a from the training samples; the new next_action predicted from next_state should only be used when computing the target y. Why is the dataset-action line commented out here?
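For context, here is a minimal sketch of how the critic update looks in the original BCQ paper, simplified to a plain clipped double-Q target (the full algorithm also samples several candidate next actions and mixes min/max with a λ weight). Names such as `target_critic1` and the `next_state=True` flag are my own assumptions, not this repo's API:

```python
import torch
import torch.nn.functional as F

def bcq_critic_loss(self, data, gamma=0.99):
    # Current Q values use the state and the *dataset* action, as in the paper.
    s = self.generator.get_state(data).detach()
    a = self.item_embeds[data["action"]]
    q1 = self.critic1(s, a)
    q2 = self.critic2(s, a)

    # The generated, perturbed action only enters the target y via next_state.
    with torch.no_grad():
        next_s = self.generator.get_state(data, next_state=True)  # assumed flag
        next_a = self.perturbator(next_s, self.generator.decode(next_s))
        target_q = torch.min(self.target_critic1(next_s, next_a),
                             self.target_critic2(next_s, next_a))
        y = data["reward"] + gamma * target_q

    return F.mse_loss(q1, y) + F.mse_loss(q2, y)
```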