A few questions about the DeepCTR DIN model

Open nmslandwsnd opened this issue 4 years ago • 3 comments

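Describe the question

[1] After training, I save the model (call it model1) with model.save and load it back with tensorflow.python.keras.models.load_model (call the result model2). model2.input is one element shorter than model1.input, and the missing element is the length input of the sequence used for attention. Why does this happen?

Feature definition example: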

from deepctr.feature_column import SparseFeat, DenseFeat, VarLenSparseFeat

feature_columns = [SparseFeat('user', 3, embedding_dim=10, use_hash=True),
                   SparseFeat('gender', 2, embedding_dim=4),
                   SparseFeat('item_id', 3 + 1, embedding_dim=8),
                   SparseFeat('cate_id', 2 + 1, embedding_dim=4, use_hash=True),
                   DenseFeat('pay_score', 1)]
feature_columns += [
    VarLenSparseFeat(SparseFeat('hist_item_id', vocabulary_size=3 + 1, embedding_dim=8, embedding_name='item_id'),
                     maxlen=4, length_name="seq_length"),
    VarLenSparseFeat(SparseFeat('hist_cate_id', 2 + 1, embedding_dim=4, embedding_name='cate_id'),
                     maxlen=4, length_name="seq_length")]
behavior_feature_list = ["item_id", "cate_id"]

With the features defined this way, the input names before and after reloading are:

model1.input.name: user, gender, item_id, cate_id, pay_score, hist_item_id, hist_cate_id, seq_length
model2.input.name: user, gender, item_id, cate_id, pay_score, hist_item_id, hist_cate_id
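To pin down exactly which input is lost on reload, the two name lists can be diffed. The following is a minimal sketch that uses the names listed above as plain strings; `missing_inputs` is a hypothetical helper, not part of DeepCTR or Keras. In a real session the lists would presumably come from something like `[t.name for t in model.inputs]`, possibly with a `:0` suffix depending on the TensorFlow version.

```python
def missing_inputs(original_names, reloaded_names):
    """Return names that the original model has but the reloaded one lacks, in order."""
    reloaded = set(reloaded_names)
    return [name for name in original_names if name not in reloaded]


# Input names as reported in the post (plain strings, no TensorFlow needed).
model1_inputs = ["user", "gender", "item_id", "cate_id", "pay_score",
                 "hist_item_id", "hist_cate_id", "seq_length"]
model2_inputs = ["user", "gender", "item_id", "cate_id", "pay_score",
                 "hist_item_id", "hist_cate_id"]

lost = missing_inputs(model1_inputs, model2_inputs)
print(lost)
```

Running the same check right after `load_model` makes it easy to see whether only the `length_name` input is affected or whether other inputs are dropped as well.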

[2] When use_hash is set to True on the sequences used for attention, saving the model and loading it again raises an error.

Feature definition example:

feature_columns = [SparseFeat('user', 3, embedding_dim=10, use_hash=True),
                   SparseFeat('gender', 2, embedding_dim=4),
                   SparseFeat('item_id', 3 + 1, embedding_dim=8),
                   SparseFeat('cate_id', 2 + 1, embedding_dim=4, use_hash=True),
                   DenseFeat('pay_score', 1)]
feature_columns += [
    VarLenSparseFeat(SparseFeat('hist_item_id', vocabulary_size=3 + 1, embedding_dim=8, embedding_name='item_id', use_hash=True),
                     maxlen=4, length_name="seq_length"),
    VarLenSparseFeat(SparseFeat('hist_cate_id', 2 + 1, embedding_dim=4, embedding_name='cate_id', use_hash=True),
                     maxlen=4, length_name="seq_length")]
behavior_feature_list = ["item_id", "cate_id"]

Error log:

File "/Users/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/merge.py", line 392, in build
    'Got inputs shapes: %s' % (input_shape))
ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 1, 10), (None, 1, 4), (None, 1, 4), (None, 4, 4), (None, 4, 4)]

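To work around this, I changed line 86 of deepctr/models/sequence/din.py from

    deep_input_emb = tf.keras.layers.Concatenate()([NoMask()(deep_input_emb), hist])

to

    deep_input_emb = tf.keras.layers.Concatenate(axis=-1)([(deep_input_emb), hist])

After this change the error no longer occurs, but I am not sure whether the modification causes other problems.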
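The error message states the rule being violated: every input to `Concatenate` must have the same size on all axes except the concat axis. The sketch below applies that rule to the shapes from the log without needing TensorFlow. `concat_mismatches` is a hypothetical helper written for illustration, and treating `None` as "matches anything" is an assumption about how the unknown batch dimension is handled. Note also that Keras's `Concatenate` already defaults to `axis=-1`, so the explicit `axis=-1` in the patched line matches the default.

```python
def concat_mismatches(shapes, axis=-1):
    """Return the non-concat axes on which the given shapes disagree.

    A size of None (unknown, e.g. the batch dimension) is treated as
    compatible with any other size.
    """
    rank = len(shapes[0])
    if any(len(s) != rank for s in shapes):
        raise ValueError("all inputs must have the same rank")
    concat_axis = axis % rank  # normalize negative axis
    bad_axes = []
    for ax in range(rank):
        if ax == concat_axis:
            continue
        known_sizes = {s[ax] for s in shapes if s[ax] is not None}
        if len(known_sizes) > 1:
            bad_axes.append(ax)
    return bad_axes


# Shapes copied from the error log in the post.
logged_shapes = [(None, 1, 10), (None, 1, 4), (None, 1, 4), (None, 4, 4), (None, 4, 4)]

all_inputs = concat_mismatches(logged_shapes)        # mixes lengths 1 and 4 on axis 1
first_three = concat_mismatches(logged_shapes[:3])   # only the length-1 embeddings
print(all_inputs, first_three)
```

In the logged shapes, the two `(None, 4, 4)` tensors are the ones with sequence length 4 on axis 1, which suggests that sequence embeddings ended up in a concatenation that expects length-1 embeddings.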

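[3] When I define a sequence that is not a history-behavior sequence (no attention; it is pooled and then concatenated with the other input features before the fully connected layers), and its embedding_name is the same as the embedding_name of some other input feature, saving the model and loading it again also raises an error.

Feature definition example: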

feature_columns = [SparseFeat('user', 3, embedding_dim=10, use_hash=True),
                   SparseFeat('gender', 2, embedding_dim=4),
                   SparseFeat('item_id', 3 + 1, embedding_dim=8),
                   SparseFeat('cate_id', 2 + 1, embedding_dim=4, use_hash=True),
                   DenseFeat('pay_score', 1)]
feature_columns += [
    VarLenSparseFeat(SparseFeat('hist_item_id', vocabulary_size=3 + 1, embedding_dim=8, embedding_name='item_id', use_hash=True),
                     maxlen=4, length_name="seq_length"),
    VarLenSparseFeat(SparseFeat('hist_cate_id', 2 + 1, embedding_dim=4, embedding_name='cate_id', use_hash=True),
                     maxlen=4, length_name="seq_length"),
    VarLenSparseFeat(SparseFeat('hist_buy_cate_id', 2 + 1, embedding_dim=4, embedding_name='cate_id', use_hash=True),
                     maxlen=2, length_name="seq_length_buy")]
behavior_feature_list = ["item_id", "cate_id"]

Error log:

File "/Users/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/merge.py", line 392, in build
    'Got inputs shapes: %s' % (input_shape))
ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 4, 4), (None, 1, 4)]

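When the embedding_name of the non-history sequence differs from that of every other input feature, loading the model raises no error.

Feature definition example that loads without error (the embedding_name of the third sequence is changed to 'cate_id_buy'):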

feature_columns = [SparseFeat('user', 3, embedding_dim=10, use_hash=True),
                   SparseFeat('gender', 2, embedding_dim=4),
                   SparseFeat('item_id', 3 + 1, embedding_dim=8),
                   SparseFeat('cate_id', 2 + 1, embedding_dim=4, use_hash=True),
                   DenseFeat('pay_score', 1)]
feature_columns += [
    VarLenSparseFeat(SparseFeat('hist_item_id', vocabulary_size=3 + 1, embedding_dim=8, embedding_name='item_id', use_hash=True),
                     maxlen=4, length_name="seq_length"),
    VarLenSparseFeat(SparseFeat('hist_cate_id', 2 + 1, embedding_dim=4, embedding_name='cate_id', use_hash=True),
                     maxlen=4, length_name="seq_length"),
    VarLenSparseFeat(SparseFeat('hist_buy_cate_id', 2 + 1, embedding_dim=4, embedding_name='cate_id_buy', use_hash=True),
                     maxlen=2, length_name="seq_length_buy")]
behavior_feature_list = ["item_id", "cate_id"]
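The two definitions above differ only in whether the pooled sequence reuses an embedding_name that another feature already has. A small pre-check along the following lines could flag that situation before training. This is only a sketch: `SeqFeat` is a hypothetical stand-in rather than the DeepCTR class, and the rule that attention sequences are exactly those named `"hist_" + feature` for each entry of `behavior_feature_list` is an assumption based on the naming used in this post.

```python
from collections import namedtuple

# Hypothetical stand-in for a sequence feature definition (not the DeepCTR class).
SeqFeat = namedtuple("SeqFeat", ["name", "embedding_name"])


def pooled_sequences_sharing_embedding(sparse_embedding_names, seq_feats, behavior_feature_list):
    """List pooled (non-attention) sequences whose embedding_name is used elsewhere."""
    # Assumption: attention sequences are named "hist_" + behavior feature name.
    attention_names = {"hist_" + b for b in behavior_feature_list}
    flagged = []
    for feat in seq_feats:
        if feat.name in attention_names:
            continue  # sharing an embedding is intended for attention sequences
        others = set(sparse_embedding_names) | {
            f.embedding_name for f in seq_feats if f.name != feat.name
        }
        if feat.embedding_name in others:
            flagged.append(feat.name)
    return flagged


sparse_names = ["user", "gender", "item_id", "cate_id"]
behavior_feature_list = ["item_id", "cate_id"]

# Mirrors the failing definition: the pooled sequence reuses 'cate_id'.
failing = [SeqFeat("hist_item_id", "item_id"),
           SeqFeat("hist_cate_id", "cate_id"),
           SeqFeat("hist_buy_cate_id", "cate_id")]
# Mirrors the working definition: the pooled sequence uses 'cate_id_buy'.
working = failing[:2] + [SeqFeat("hist_buy_cate_id", "cate_id_buy")]

flag_failing = pooled_sequences_sharing_embedding(sparse_names, failing, behavior_feature_list)
flag_working = pooled_sequences_sharing_embedding(sparse_names, working, behavior_feature_list)
print(flag_failing, flag_working)
```

The check only reports name collisions. It does not explain why the reloaded model fails, and giving the pooled sequence its own embedding_name also means it no longer shares weights with `cate_id`.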

Operating environment:

  • python version [3.6]
  • tensorflow version [1.13.1, 1.14.0, 2.1.0] (the problems occur on all three)
  • deepctr version [0.9.0]

nmslandwsnd, Nov 01 '21 14:11

@nmslandwsnd I ran into the same problems. Have you solved them yet?

kummar, Mar 18 '22 06:03

The first problem is still unsolved; the other two are solved.

(Sent as an email reply to the notification for shenweichen/DeepCTR Issue #424.)

nmslandwsnd, Mar 21 '22 10:03

What does NoMask do here? Would removing it have any effect?

pjgao, Apr 27 '22 01:04