Sanqiang Zhao
https://github.com/google-research/bert/blob/f39e881b169b9d53bea03d2d341b31707a6c052b/optimization.py#L65 Is there any special reason to add exclude_from_weight_decay for norm-related weights?
For your evaluation, I thought you used the evaluation code from Joshua. It has both SARI and STAR; could I ask whether there is any special reason you use STAR rather than...
Based on Table 1 of your paper, I saw that WikiSmall performance is much lower than WikiLarge. The table indicates that the three results are based on different test sets. I know...
Do you have any plans to release the trained models listed in the paper?
I tried to use the Joshua decoder with the following script: -cand sari30it.test.output.1best -format plain -ref est.8turkers.tok.turk. -rps 8 -m SARI 4 test.8turkers.tok.norm -v 0 It seems to return only SARI =...
I checked the code of the step function in AttentionDecoderCell. ` def step(x, states, weights): H = x h_tm1, c_tm1 = states W1, W2, W3, U, b1, b2, b3 = weights...
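For the exclude_from_weight_decay question above, a minimal sketch may help show what such an exclusion list does mechanically. It assumes parameters are matched by name against regular-expression patterns; the pattern list, the function names `use_weight_decay` and `decayed_update`, and the decay rate are all hypothetical and chosen for illustration. This is not the linked optimizer's code, and it does not claim to explain the authors' reasoning.

```python
import re

# Hypothetical patterns for illustration: normalization parameters and biases.
EXCLUDE_FROM_WEIGHT_DECAY = ["LayerNorm", "layer_norm", "bias"]


def use_weight_decay(param_name, exclude_patterns=EXCLUDE_FROM_WEIGHT_DECAY):
    """Return True unless the parameter name matches an excluded pattern."""
    return not any(re.search(p, param_name) for p in exclude_patterns)


def decayed_update(param, update, param_name, weight_decay_rate=0.01):
    """Add the decay term weight_decay_rate * param only for non-excluded parameters."""
    if use_weight_decay(param_name):
        return update + weight_decay_rate * param
    return update


if __name__ == "__main__":
    for name in ["encoder/layer_0/attention/kernel",
                 "encoder/layer_0/LayerNorm/gamma",
                 "dense/bias"]:
        print(name, use_weight_decay(name))
```

Under these assumptions, a kernel named `encoder/layer_0/attention/kernel` receives the decay term, while `LayerNorm/gamma` and `dense/bias` keep their plain update. One commonly given rationale is that decay regularizes multiplicative weights, whereas shrinking normalization scales and biases toward zero is usually not the intended effect; whether that is the reason in the linked code is for its authors to confirm.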