A question about the code and training time.
Thanks for your paper and code, but I'm confused by some of the code.
In src/model.py, in get_score() -> inner_one_step():
update_gate = T.exp(T.dot(ugW[:ln+nhiddens,an+ln-nhiddens:an+ln+ln],com)+ugb[an+ln-nhiddens:an+ln+ln]).reshape((len+1,nhiddens))
and in src/tools.py, in get_word():
update_gate = np.exp(np.dot(ugW[:ln+ndims,an+ln-ndims:an+ln+ln],com)+ugb[an+ln-ndims:an+ln+ln]).reshape((len+1,ndims))
I think they should be:
update_gate = T.exp(T.dot(ugW[:ln+nhiddens,an+ln-nhiddens:an+ln+ln],com)+ugb[an+ln-nhiddens:an+ln+ln]).reshape((nhiddens, len+1)).transpose()
and
update_gate = np.exp(np.dot(ugW[:ln+ndims,an+ln-ndims:an+ln+ln],com)+ugb[an+ln-ndims:an+ln+ln]).reshape((ndims, len+1)).transpose()
The expression
np.exp(np.dot(ugW[:ln+ndims,an+ln-ndims:an+ln+ln],com)+ugb[an+ln-ndims:an+ln+ln])
represents a vector
[e1_1, e1_2, ..., e1_ndims, e2_1, e2_2, ..., eln_1, eln_2, ..., eln_ndims]
where ei_j comes from the j-th element of the i-th character. So the original reshape((len+1, nhiddens)) turns the vector into
[e1_1, e1_2, ..., e1_ln,
e1_ln+1, e1_ln+2,...e1_ln+ln
...
eln_ndim-ln+1, ..., eln_ndim-1, eln_ndim]
which, I think, is supposed to be
[e1_1, e1_2, ..., e1_ndim,
e2_1, e2_2, ..., e2_ndim,
...
eln_1, eln_2, ..., eln_ndim]
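To see concretely that the two reshapes lay elements out differently, here is a small NumPy sketch with toy sizes (the names `n_chars` and `ndims` are illustrative, not from the repo):

```python
import numpy as np

# Toy sizes standing in for (len+1) characters and ndims features per character.
n_chars, ndims = 3, 4

# Flat vector in the order described above:
# [e1_1, ..., e1_ndims, e2_1, ..., e2_ndims, ...]
flat = np.arange(n_chars * ndims)

a = flat.reshape((n_chars, ndims))              # reshape as in the original code
b = flat.reshape((ndims, n_chars)).transpose()  # reshape + transpose as proposed

print(a)
print(b)
print(np.array_equal(a, b))  # False: the two variants put elements in different cells
```

Both results have shape (n_chars, ndims), but because NumPy reshapes in C (row-major) order, the two variants assign the flat elements to different cells.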
In order to see the difference, I revised the code and ran it on CPU. But the memory usage grew from 3 GB (epoch 1) to 30 GB (epoch 8), and it took 8000 s per epoch.
The original code took 7000 s per epoch and used 2.6 GB.
Did I misunderstand something?
Thanks!
I don't know if I have expressed it clearly; my Chinese is much better than my English :D
@sevenights Hi,
I treat Theano as legacy in deep learning and am no longer familiar with it.
I would refer you to a better implementation there; it uses a modern library, DyNet.
But in greedyCWS, there is no need to calculate the update gate for the word representation. As the paper puts it:
an update gate z (as in Figure 2), which has been shown helpless to the performance but requires huge computational cost according to our empirical study.
So I read dy_model.py and have the same question.
update_gate = dy.transpose(dy.concatenate_cols([dy.softmax(dy.pickrange(update_logits,i*(wlen+1),(i+1)*(wlen+1))) for i in xrange(self.options['ndims'])]))
which I think should be:
update_gate = dy.concatenate_cols([dy.softmax(dy.pickrange(update_logits,i*(self.options['ndims']+1),(i+1)*(self.options['ndims']+1))) for i in xrange(wlen)])
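For what it's worth, here is a NumPy mock of the two DyNet lines (with `softmax` and `pickrange` replaced by plain array ops, and toy sizes; this is a sketch, not the repo's code). It shows that the two versions slice the logits into different chunks and even expect logits vectors of different lengths:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

ndims, wlen = 3, 4  # toy values standing in for self.options['ndims'] and wlen

# Original line: ndims chunks of length wlen+1, concatenated as columns, then transposed.
logits_a = np.linspace(0.0, 1.0, ndims * (wlen + 1))
cols_a = [softmax(logits_a[i * (wlen + 1):(i + 1) * (wlen + 1)]) for i in range(ndims)]
gate_a = np.stack(cols_a, axis=1).T  # shape (ndims, wlen+1)

# Proposed line: wlen chunks of length ndims+1, concatenated as columns, no transpose.
logits_b = np.linspace(0.0, 1.0, wlen * (ndims + 1))
cols_b = [softmax(logits_b[i * (ndims + 1):(i + 1) * (ndims + 1)]) for i in range(wlen)]
gate_b = np.stack(cols_b, axis=1)  # shape (ndims+1, wlen)

print(gate_a.shape, gate_b.shape)
```

Note that the two versions consume logits of lengths ndims*(wlen+1) vs wlen*(ndims+1), which are equal only when ndims == wlen, so only one of them can match the length of the actual update_logits vector.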