Question about the inputs to the 5-layer stacked LSTM in the DRCN model

Open chenmozxh opened this issue 6 years ago • 0 comments

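My understanding of Equation 6 in the paper is that the input to layer l at time step t is the concatenation of three things: (1) the hidden vector of layer l-1 at time step t, (2) the attention vector of layer l-1, and (3) the input to layer l-1 at time step t. The code, however, is as follows: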

        for j in range(5):
            with tf.variable_scope(f'p_lstm_{i}{j}', reuse=None):
                p_state, _ = self.BiLSTM(tf.concat(p_state, axis=-1))
            with tf.variable_scope(f'p_lstm{i}_{j}' + str(i), reuse=None):
                h_state, _ = self.BiLSTM(tf.concat(h_state, axis=-1))

            p_state = tf.concat(p_state, axis=-1)
            h_state = tf.concat(h_state, axis=-1)
            # attention
            cosine = tf.divide(tf.matmul(p_state, tf.matrix_transpose(h_state)),
                               (tf.norm(p_state, axis=-1, keep_dims=True) * tf.norm(h_state, axis=-1, keep_dims=True)))
            att_matrix = tf.nn.softmax(cosine)
            p_attention = tf.matmul(att_matrix, h_state)
            h_attention = tf.matmul(att_matrix, p_state)

            # DesNet
            p = tf.concat((p, p_state, p_attention), axis=-1)
            h = tf.concat((h, h_state, h_attention), axis=-1)

So the input to layer j should be p rather than p_state. Is my understanding correct?
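To make the difference concrete, here is a minimal sketch that only tracks feature widths, not real tensors. It compares my reading of Equation 6, roughly $x_t^l = [h_t^{l-1}; a_t^{l-1}; x_t^{l-1}]$, against the wiring in the quoted code. The function name `layer_input_widths` and the parameters `embed_dim` and `hidden` are hypothetical. I am also assuming that the BiLSTM output width is `2 * hidden` and that `p_state` equals `p` before the first layer; neither is confirmed by the snippet above.

```python
def layer_input_widths(embed_dim, hidden, num_layers=5, dense=True):
    """Return the feature width that each stacked BiLSTM layer receives.

    dense=True  : layer j reads p = concat(p, p_state, p_attention),
                  which is how I read Equation 6.
    dense=False : layer j reads only p_state, as in the quoted code.
    """
    # Assumption: forward and backward outputs are concatenated.
    state_dim = 2 * hidden
    widths = []
    x_dim = embed_dim  # width of p before the first layer
    for j in range(num_layers):
        if dense or j == 0:
            # Assumption: the first layer sees p in both variants.
            widths.append(x_dim)
        else:
            widths.append(state_dim)
        # p = concat(p, p_state, p_attention); the attention vector is a
        # weighted sum of states, so it has the same width as the state.
        x_dim = x_dim + state_dim + state_dim
    return widths


dense_widths = layer_input_widths(100, 64, dense=True)
state_only_widths = layer_input_widths(100, 64, dense=False)
print(dense_widths)
print(state_only_widths)
```

With the dense wiring the input width grows at every layer, while with the quoted wiring it stays at the BiLSTM output width after the first layer. That is the discrepancy I am asking about.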

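One more detail: should the output of each 5-layer stacked BiLSTM be concatenated with the original character/word embeddings before being fed to the next 5-layer stacked BiLSTM? Figure 1 of the paper is drawn that way, but the text does not seem to mention it.

The paper also has a pooling structure after the four 5-layer BiLSTM blocks. If the output has shape (30, 100), i.e. 30 words with 100 dimensions each, column-wise max-pooling reduces it to the 100-dimensional vectors p and q. These are then concatenated as in Equation 7 and passed through 3 dense layers.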
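The pooling step as I understand it can be sketched in plain Python on a toy input. The helper names `max_pool_over_time` and `combine` are hypothetical. The form `[p; q; p + q; p - q; |p - q|]` used in `combine` is my assumption about Equation 7 and should be checked against the paper. The 3 dense layers are left out.

```python
def max_pool_over_time(seq):
    """Column-wise max over a (num_words, dim) nested list.

    Returns a list of length dim, one maximum per feature column.
    """
    return [max(column) for column in zip(*seq)]


def combine(p, q):
    """Concatenate pooled vectors (assumed form of Equation 7)."""
    added = [a + b for a, b in zip(p, q)]
    diff = [a - b for a, b in zip(p, q)]
    abs_diff = [abs(a - b) for a, b in zip(p, q)]
    return p + q + added + diff + abs_diff


# Toy example: 3 words with 2 features each, instead of (30, 100).
p_seq = [[1, 5], [4, 2], [3, 3]]
q_seq = [[0, 1], [2, 0], [1, 6]]

p_vec = max_pool_over_time(p_seq)  # [4, 5]
q_vec = max_pool_over_time(q_seq)  # [2, 6]
v = combine(p_vec, q_vec)          # length is 5 * dim
print(p_vec, q_vec, v)
```

With a (30, 100) input, `max_pool_over_time` would return a 100-dimensional vector, and under the assumed form of Equation 7 the combined vector would have 500 dimensions.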

Nov 22 '19 12:11 chenmozxh