DBRL
Dataset Batch (offline) Reinforcement Learning for recommender systems
I cannot get increasing rewards with the REINFORCE and DDPG algorithms. Is this normal? Could you provide the final results of the three algorithms? Thank you!
@massquantity In your model, does DSSM stand for [Deep Semantic Similarity Model](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cikm2013_DSSM_fullversion.pdf)? And I would like to know why that model was chosen as the embedding model for users and items...
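For readers unfamiliar with it, DSSM is a two-tower architecture: one tower embeds the user, the other embeds the item, and the two are trained so that their similarity predicts interaction, which makes it a natural choice for pretraining user/item embeddings. A minimal sketch of the idea follows; all names and dimensions here are illustrative, not taken from this repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTower(nn.Module):
    """Simplified two-tower (DSSM-style) model for user/item embeddings."""

    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_embeds = nn.Embedding(n_users, dim)
        self.item_embeds = nn.Embedding(n_items, dim)

    def forward(self, users, items):
        # Cosine similarity between the user tower and the item tower.
        u = F.normalize(self.user_embeds(users), dim=-1)
        i = F.normalize(self.item_embeds(items), dim=-1)
        return (u * i).sum(-1)
```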
Very large loss
@massquantity The network loss becomes very large starting around the third epoch, and NDCG stays very small. Do your own runs look like this too? I can't figure out where the problem is.
Error when running
When running run_pretrain_embeddings.py, I get the error: `Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)`.
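For reference, `nn.Embedding` requires int64 (Long) indices on the PyTorch versions that raise this error, so a common fix is to cast the index tensors before the lookup. A minimal sketch, with illustrative names rather than the repo's:

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=1000, embedding_dim=32)

# Indices loaded as int32 (e.g. from numpy) trigger the error above.
indices = torch.randint(0, 1000, (64,), dtype=torch.int32)

# Casting to Long (int64) before the lookup resolves it.
embeds = embedding(indices.long())  # shape: (64, 32)
```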
```python
s = self.generator.get_state(data).detach()
gen_actions = self.generator.decode(s)
a = self.perturbator(s, gen_actions).detach()
# a = self.item_embeds[data["action"]]
q1 = self.critic1(s, a)
q2 = self.critic2(s, a)
```
This is around line 156 of bcq.py. According to the original BCQ algorithm, the current Q values should be computed with the s and a from the training samples; the new next_action predicted from next_state should only be used when computing the target y. Why is the dataset-action line commented out here?
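For context, here is a minimal sketch of how the critic update looks in the original BCQ paper, simplified to a plain clipped double-Q target (the full algorithm also samples several candidate next actions and mixes min/max with a λ weight). Names such as `target_critic1` and the `next_state=True` flag are my own assumptions, not this repo's API:

```python
import torch
import torch.nn.functional as F

def bcq_critic_loss(self, data, gamma=0.99):
    # Current Q values use the state and the *dataset* action, as in the paper.
    s = self.generator.get_state(data).detach()
    a = self.item_embeds[data["action"]]
    q1 = self.critic1(s, a)
    q2 = self.critic2(s, a)

    # The generated, perturbed action only enters the target y via next_state.
    with torch.no_grad():
        next_s = self.generator.get_state(data, next_state=True)  # assumed flag
        next_a = self.perturbator(next_s, self.generator.decode(next_s))
        target_q = torch.min(self.target_critic1(next_s, next_a),
                             self.target_critic2(next_s, next_a))
        y = data["reward"] + gamma * target_q

    return F.mse_loss(q1, y) + F.mse_loss(q2, y)
```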