CTR_Algorithm: encoding question in the models

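DeepFM uses one-hot encoding, but here the author uses label encoding. Label encoding is rarely used and has significant limitations. It is admittedly very convenient for the embedding step, but are the results it ultimately produces less reliable?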

Mar 19 '22 08:03 Museens

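They are the same. In both PyTorch and TensorFlow, one reason label encoding is used as the input is that one-hot encoding consumes a lot of memory. Feeding label-encoded indices into an embedding lookup gives the same result as one-hot encoding the input and multiplying it by the embedding weights. For example: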

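import torch
import torch.nn as nn

# Label-encoded input: four category indices
x = torch.LongTensor([0,1,2,3])

# The same input as one-hot vectors (shape 4 x 4)
x_one_hot = nn.functional.one_hot(x)

embedding = nn.Embedding(4, 2)

# Path 1: table lookup with the label-encoded indices
embedding(x)

tensor([[-0.5853, -0.4129], [ 1.4309, 2.0298], [ 0.8416, 0.2533], [ 0.6935, 1.2494]], grad_fn=<EmbeddingBackward>)

# Path 2: one-hot vectors multiplied by the embedding weights
torch.matmul(x_one_hot.float(), embedding.weight.data)

tensor([[-0.5853, -0.4129], [ 1.4309, 2.0298], [ 0.8416, 0.2533], [ 0.6935, 1.2494]])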
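The same equivalence can be checked without any framework. The following is a minimal sketch in plain Python, not the repository's code: the table values and the helper names (`one_hot`, `lookup`, `one_hot_matmul`) are made up for illustration. It also counts how many input values each encoding needs, which is where the memory difference comes from.

```python
# Hypothetical toy embedding table: 4 categories, 2 dimensions each.
weights = [
    [-0.5, 0.4],
    [1.5, 2.0],
    [0.75, 0.25],
    [0.5, 1.25],
]

def one_hot(index, num_classes):
    """Return a one-hot row vector for a label-encoded index."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def lookup(indices, table):
    """Label-encoding path: pick rows of the table by index."""
    return [list(table[i]) for i in indices]

def one_hot_matmul(indices, table):
    """One-hot path: multiply each one-hot row by the table."""
    num_classes, dim = len(table), len(table[0])
    result = []
    for idx in indices:
        row = one_hot(idx, num_classes)
        result.append(
            [sum(row[k] * table[k][d] for k in range(num_classes))
             for d in range(dim)]
        )
    return result

labels = [0, 1, 2, 3, 1]
via_lookup = lookup(labels, weights)
via_matmul = one_hot_matmul(labels, weights)
same = via_lookup == via_matmul

# Input size: one integer per sample vs. num_classes values per sample.
label_input_size = len(labels)
one_hot_input_size = len(labels) * len(weights)
```

With a realistic CTR vocabulary (millions of feature values), the one-hot input grows by that factor per sample, while the label-encoded input stays at one index per sample.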

Mar 21 '22 02:03 Prayforhanluo

Now I finally understand. Thanks for the reply.

Mar 28 '22 02:03 Museens