Yutao ZHU

Results: 27 comments by Yutao ZHU

Same here. Is there any requirement for preprocessing?

The start token in the decoder is used to generate the first word of the response. There is no need for a start token in the encoder, since the encoder always receives...
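A minimal sketch of the point above (token ids and sequences are made up for illustration, not taken from the repo): the decoder input is the target shifted right by a start token, while the encoder just consumes the source as-is.

```python
SOS, EOS = 1, 2                       # assumed special-token ids
source_ids = [17, 42, 8]              # encoder input: no start token needed
target_ids = [23, 5, 94]              # gold response

decoder_in = [SOS] + target_ids       # shifted right: <sos> is used to predict the first word
decoder_gold = target_ids + [EOS]     # labels: the model learns to end with <eos>
```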

For training, softmax_cross_entropy_with_logits computes the softmax internally, so it should be given the raw logits. For prediction, the argmax result is the same whether or not softmax is applied first.
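A small NumPy sketch of the argmax point (not code from the repo): softmax is an order-preserving transform, so the predicted class is identical with or without it.

```python
import numpy as np

logits = np.array([2.0, -1.0, 0.5])

# softmax: monotone in the logits, so the ordering of classes is preserved
probs = np.exp(logits - logits.max())
probs /= probs.sum()

assert np.argmax(logits) == np.argmax(probs)  # same predicted class either way
```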

@roomylee I guess you are right. Look at this Keras implementation, https://github.com/Lsdefine/attention-is-all-you-need-keras/blob/master/transformer.py — you can find the definition of Q/K/V in lines 54-64; no bias or activation is used.
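A rough Keras sketch of what that looks like (dimensions and variable names are illustrative, not copied from the linked file): the Q/K/V projections are plain linear layers, i.e. Dense with the default linear activation and `use_bias=False`.

```python
from tensorflow.keras.layers import Dense

d_model, d_k, d_v, n_head = 512, 64, 64, 8  # assumed sizes for illustration

# Linear projections for Q, K, V: no activation (Dense default) and no bias,
# mirroring the linked implementation.
q_proj = Dense(n_head * d_k, use_bias=False)
k_proj = Dense(n_head * d_k, use_bias=False)
v_proj = Dense(n_head * d_v, use_bias=False)
```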

Is there any update on this issue? @brcsomnath

> Hi, I tried loading the llama model for inference and encountered some problems. I used 4 V100 GPUs with a model parallel size of 4 to load the llama...

Thanks. I'm trying to continue training the model to see whether the loss is correct. I will update my results here if it runs successfully.

> Hi @DaoD, specifically, I make the following changes in problem 1:
>
> ```python
> def _get_all_zero_checkpoint_names(self, load_dir, tag, bf16_mode):
>     mp_rank = 0 if self.mpu is None else...
> ```

For the config, you can just copy configs/llama/7B.yml into the model-settings part of configs/6-7B.yml. The running command is the same as for training other models.
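A rough sketch of that merge in Python (assuming both files parse as standard YAML and that overwriting the base config's keys with the llama ones is what you want; in practice you can also just copy the block by hand as described above):

```python
import yaml

with open("configs/llama/7B.yml") as f:
    llama_cfg = yaml.safe_load(f)
with open("configs/6-7B.yml") as f:
    base_cfg = yaml.safe_load(f)

# Overwrite the model-settings keys of the 6-7B config with the llama 7B ones.
base_cfg.update(llama_cfg)

with open("configs/6-7B-llama.yml", "w") as f:
    yaml.safe_dump(base_cfg, f)
```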

> Hi @wiio12 I don't see a LlamaTokenizer in the code. How do you perform inference and verify the results?

I replaced the SMPTokenizer with the LlamaTokenizer myself.
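A minimal sketch of that swap (the checkpoint path is assumed, and this uses the Hugging Face LlamaTokenizer rather than the repo's own tokenizer class):

```python
from transformers import LlamaTokenizer

# Load a converted HF-format llama checkpoint directory (path is illustrative).
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-7b-hf")

ids = tokenizer.encode("Hello, world!")   # text -> token ids for inference
text = tokenizer.decode(ids)              # token ids -> text to inspect outputs
```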