Yutao ZHU

Results: 27 comments by Yutao ZHU

Same here. Is there any requirement for preprocessing?

The start token in the decoder is used to generate the first word of the response. There is no need for a start token in the encoder, since the encoder always receives...
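A minimal sketch of the point above (token ids and sequences are made up for illustration, not taken from the repo): the decoder input is the target shifted right by a start token, while the encoder just consumes the source as-is.

```python
SOS, EOS = 1, 2                       # assumed special-token ids
source_ids = [17, 42, 8]              # encoder input: no start token needed
target_ids = [23, 5, 94]              # gold response

decoder_in = [SOS] + target_ids       # shifted right: <sos> is used to predict the first word
decoder_gold = target_ids + [EOS]     # labels: the model learns to end with <eos>
```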

For training, softmax_cross_entropy_with_logits computes the softmax internally, so it should be given the raw logits. For prediction, the argmax result is the same whether or not softmax is applied first.
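A small NumPy sketch of the argmax point (not code from the repo): softmax is an order-preserving transform, so the predicted class is identical with or without it.

```python
import numpy as np

logits = np.array([2.0, -1.0, 0.5])

# softmax: monotone in the logits, so the ordering of classes is preserved
probs = np.exp(logits - logits.max())
probs /= probs.sum()

assert np.argmax(logits) == np.argmax(probs)  # same predicted class either way
```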

@roomylee I guess you are right. Look at this Keras implementation, https://github.com/Lsdefine/attention-is-all-you-need-keras/blob/master/transformer.py — you can find the definition of Q/K/V in lines 54-64; no bias or activation is used.
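A rough Keras sketch of what that looks like (dimensions and variable names are illustrative, not copied from the linked file): the Q/K/V projections are plain linear layers, i.e. Dense with the default linear activation and `use_bias=False`.

```python
from tensorflow.keras.layers import Dense

d_model, d_k, d_v, n_head = 512, 64, 64, 8  # assumed sizes for illustration

# Linear projections for Q, K, V: no activation (Dense default) and no bias,
# mirroring the linked implementation.
q_proj = Dense(n_head * d_k, use_bias=False)
k_proj = Dense(n_head * d_k, use_bias=False)
v_proj = Dense(n_head * d_v, use_bias=False)
```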

Is there any update on this issue? @brcsomnath

> Hi, I tried loading the llama model for inference and encountered some problems. I used 4 V100 GPUs with a model parallel size of 4 to load the llama...

Thanks. I'm trying to continue training the model to see whether the loss is correct. I will update my results here if it runs successfully.

> Hi @DaoD, specifically, I make the following changes in problem 1:
>
> ```python
> def _get_all_zero_checkpoint_names(self, load_dir, tag, bf16_mode):
>     mp_rank = 0 if self.mpu is None else...
> ```

For the config, you can just copy configs/llama/7B.yml into the model-settings part of configs/6-7B.yml. The running command is the same as for training other models.
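A rough sketch of that merge in Python (assuming both files parse as standard YAML and that overwriting the base config's keys with the llama ones is what you want; in practice you can also just copy the block by hand as described above):

```python
import yaml

with open("configs/llama/7B.yml") as f:
    llama_cfg = yaml.safe_load(f)
with open("configs/6-7B.yml") as f:
    base_cfg = yaml.safe_load(f)

# Overwrite the model-settings keys of the 6-7B config with the llama 7B ones.
base_cfg.update(llama_cfg)

with open("configs/6-7B-llama.yml", "w") as f:
    yaml.safe_dump(base_cfg, f)
```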

> Hi @wiio12 I don't see a LlamaTokenizer in the code. How do you perform inference and verify the results?

I replaced the SMPTokenizer with the LlamaTokenizer myself.
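A minimal sketch of that swap (the checkpoint path is assumed, and this uses the Hugging Face LlamaTokenizer rather than the repo's own tokenizer class):

```python
from transformers import LlamaTokenizer

# Load a converted HF-format llama checkpoint directory (path is illustrative).
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-7b-hf")

ids = tokenizer.encode("Hello, world!")   # text -> token ids for inference
text = tokenizer.decode(ids)              # token ids -> text to inspect outputs
```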