A question about the code and training time.
Thanks for your paper and code, but I'm confused by some of the code.
In src/model.py, in get_score() -> inner_one_step():
update_gate = T.exp(T.dot(ugW[:ln+nhiddens,an+ln-nhiddens:an+ln+ln],com)+ugb[an+ln-nhiddens:an+ln+ln]).reshape((len+1,nhiddens))
and in src/tools.py, in get_word():
update_gate = np.exp(np.dot(ugW[:ln+ndims,an+ln-ndims:an+ln+ln],com)+ugb[an+ln-ndims:an+ln+ln]).reshape((len+1,ndims))
I think they should be:
update_gate = T.exp(T.dot(ugW[:ln+nhiddens,an+ln-nhiddens:an+ln+ln],com)+ugb[an+ln-nhiddens:an+ln+ln]).reshape((nhiddens, len+1)).transpose()
and
update_gate = np.exp(np.dot(ugW[:ln+ndims,an+ln-ndims:an+ln+ln],com)+ugb[an+ln-ndims:an+ln+ln]).reshape((ndims, len+1)).transpose()
The expression
np.exp(np.dot(ugW[:ln+ndims,an+ln-ndims:an+ln+ln],com)+ugb[an+ln-ndims:an+ln+ln])
represents a vector
[e1_1, e1_2, ..., e1_ndims, e2_1, e2_2, ..., eln_1, eln_2, ..., eln_ndims]
where ei_j comes from the j-th element of the i-th character. So the original reshape((len+1, nhiddens)) turns the vector into
[e1_1, e1_2, ..., e1_ln,
e1_ln+1, e1_ln+2,...e1_ln+ln
...
eln_ndim-ln+1, ..., eln_ndim-1, eln_ndim]
which, I think, is supposed to be
[e1_1, e1_2, ..., e1_ndim,
e2_1, e2_2, ..., e2_ndim,
...
eln_1, eln_2, ..., eln_ndim]
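To see concretely that the two reshapes lay elements out differently, here is a small NumPy sketch with toy sizes (the names `n_chars` and `ndims` are illustrative, not from the repo):

```python
import numpy as np

# Toy sizes standing in for (len+1) characters and ndims features per character.
n_chars, ndims = 3, 4

# Flat vector in the order described above:
# [e1_1, ..., e1_ndims, e2_1, ..., e2_ndims, ...]
flat = np.arange(n_chars * ndims)

a = flat.reshape((n_chars, ndims))              # reshape as in the original code
b = flat.reshape((ndims, n_chars)).transpose()  # reshape + transpose as proposed

print(a)
print(b)
print(np.array_equal(a, b))  # False: the two variants put elements in different cells
```

Both results have shape (n_chars, ndims), but because NumPy reshapes in C (row-major) order, the two variants assign the flat elements to different cells.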
In order to see the difference, I revised the code and ran it on CPU. But the memory usage grew from 3 GB (epoch 1) to 30 GB (epoch 8), and it took 8000 s per epoch.
The original code took 7000 s per epoch and used 2.6 GB.
Did I misunderstand something?
Thanks!
I don't know if I have expressed it clearly; my Chinese is much better than my English :D
@sevenights Hi,
I treat Theano as legacy in deep learning and am no longer familiar with it.
I would refer you to a better implementation there; it uses a modern library, DyNet.
But in greedyCWS, there is no need to calculate the update gate for the word representation. As the paper puts it:
an update gate z (as in Figure 2), which has been shown helpless to the performance but requires huge computational cost according to our empirical study.
So I read dy_model.py and have the same question.
update_gate = dy.transpose(dy.concatenate_cols([dy.softmax(dy.pickrange(update_logits,i*(wlen+1),(i+1)*(wlen+1))) for i in xrange(self.options['ndims'])]))
which I think should be:
update_gate = dy.concatenate_cols([dy.softmax(dy.pickrange(update_logits,i*(self.options['ndims']+1),(i+1)*(self.options['ndims']+1))) for i in xrange(wlen)])
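For what it's worth, here is a NumPy mock of the two DyNet lines (with `softmax` and `pickrange` replaced by plain array ops, and toy sizes; this is a sketch, not the repo's code). It shows that the two versions slice the logits into different chunks and even expect logits vectors of different lengths:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

ndims, wlen = 3, 4  # toy values standing in for self.options['ndims'] and wlen

# Original line: ndims chunks of length wlen+1, concatenated as columns, then transposed.
logits_a = np.linspace(0.0, 1.0, ndims * (wlen + 1))
cols_a = [softmax(logits_a[i * (wlen + 1):(i + 1) * (wlen + 1)]) for i in range(ndims)]
gate_a = np.stack(cols_a, axis=1).T  # shape (ndims, wlen+1)

# Proposed line: wlen chunks of length ndims+1, concatenated as columns, no transpose.
logits_b = np.linspace(0.0, 1.0, wlen * (ndims + 1))
cols_b = [softmax(logits_b[i * (ndims + 1):(i + 1) * (ndims + 1)]) for i in range(wlen)]
gate_b = np.stack(cols_b, axis=1)  # shape (ndims+1, wlen)

print(gate_a.shape, gate_b.shape)
```

Note that the two versions consume logits of lengths ndims*(wlen+1) vs wlen*(ndims+1), which are equal only when ndims == wlen, so only one of them can match the length of the actual update_logits vector.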