
DBRL: Dataset Batch (offline) Reinforcement Learning for recommender systems

9 DBRL issues

I cannot get increasing rewards on REINFORCE and DDPG algorithms. Is this normal? Can you provide the final results of the three algorithms? Thank you!

@massquantity In your model, does DSSM stand for ['Deep Semantic Similarity Model'?](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cikm2013_DSSM_fullversion.pdf) I would also like to know why that model was chosen as the embedding model for users and items...
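For context, a DSSM-style model is a two-tower architecture: separate MLP "towers" map user features and item features into a shared embedding space, and relevance is scored by cosine similarity. A minimal sketch (dimensions and layer sizes here are illustrative, not the repo's actual settings):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """One side of a DSSM-style two-tower model: features -> embedding."""
    def __init__(self, in_dim, embed_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, embed_dim))

    def forward(self, x):
        return self.net(x)

# Hypothetical feature dimensions for users (16) and items (24).
user_tower, item_tower = Tower(16), Tower(24)
users, items = torch.randn(8, 16), torch.randn(8, 24)

# Score each (user, item) pair by cosine similarity of their embeddings.
scores = F.cosine_similarity(user_tower(users), item_tower(items), dim=1)
print(scores.shape)  # torch.Size([8])
```

Once trained, either tower can be run on its own to precompute fixed user/item embeddings, which is what makes this family of models convenient as an embedding pretraining step.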

@massquantity The network loss blows up from around the third epoch, and NDCG stays very small. Do your runs look like this too? I can't tell where the problem is. ![image](https://user-images.githubusercontent.com/96457748/229278761-f623c220-6433-4c28-962e-2c8cf20f5a06.png)

When running run_pretrain_embeddings.py, I get the error: `Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)`

```python
s = self.generator.get_state(data).detach()
gen_actions = self.generator.decode(s)
a = self.perturbator(s, gen_actions).detach()
# a = self.item_embeds[data["action"]]
q1 = self.critic1(s, a)
q2 = self.critic2(s, a)
```

Around line 156 of bcq.py: according to the original BCQ algorithm, the Q-values here should be computed using the s and a from the training sample; the new next_action predicted from next_state should only be used when computing the target y. Why is the line above commented out?
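For reference, the update the question describes can be sketched as follows. This is a minimal illustration of the standard BCQ-style critic update with stand-in modules and shapes (none of these classes are the repo's actual ones): the current Q-value uses the state and action taken from the training batch, while the generator/perturbator pipeline is only applied to next_state when building the target y.

```python
import torch
import torch.nn as nn

state_dim, action_dim, gamma = 8, 4, 0.99
critic = nn.Linear(state_dim + action_dim, 1)          # stand-in critic
target_critic = nn.Linear(state_dim + action_dim, 1)   # stand-in target net

# A fake offline batch with dataset actions.
batch = {
    "state": torch.randn(32, state_dim),
    "action": torch.randn(32, action_dim),   # a from the training sample
    "reward": torch.randn(32, 1),
    "next_state": torch.randn(32, state_dim),
}

def sampled_next_actions(next_state):
    """Stand-in for generator.decode + perturbator applied to next_state."""
    return torch.tanh(torch.randn(next_state.size(0), action_dim))

# Target y: generated/perturbed actions are used only here, on next_state.
with torch.no_grad():
    next_a = sampled_next_actions(batch["next_state"])
    y = batch["reward"] + gamma * target_critic(
        torch.cat([batch["next_state"], next_a], dim=1))

# Current Q: uses the batch (s, a), not regenerated actions.
q = critic(torch.cat([batch["state"], batch["action"]], dim=1))
loss = nn.functional.mse_loss(q, y)
```

If the repo instead feeds perturbed generator actions into the current-Q computation, that would indeed differ from this textbook form, which is presumably what the question is asking about.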