ykddd

Results 3 comments of ykddd

I think this is a mistake in the paper too~

As defined in https://github.com/meta-llama/llama3/blob/main/llama/model.py, hidden_dim is initialized to 4h and then adjusted by ffn_dim_multiplier and multiple_of:

```
hidden_dim = int(2 * hidden_dim / 3)
# custom dim factor multiplier
if ffn_dim_multiplier...
```
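To make the arithmetic concrete, here is a minimal standalone sketch of how I read that computation. The quoted snippet is cut off, so everything after its first line is my assumption about the rest of the file: the multiplier is applied only when it is set, and the result is then rounded up to a multiple of multiple_of. The function name `ffn_hidden_dim` and the example values are hypothetical, chosen only for illustration.

```python
from typing import Optional


def ffn_hidden_dim(dim: int, multiple_of: int,
                   ffn_dim_multiplier: Optional[float] = None) -> int:
    """Sketch of the feed-forward width computation (hypothetical helper)."""
    # Start from 4h, as described in the comment above.
    hidden_dim = 4 * dim
    # Scale by 2/3 (first line of the quoted snippet).
    hidden_dim = int(2 * hidden_dim / 3)
    # Assumed: optional custom multiplier, applied only when provided.
    if ffn_dim_multiplier is not None:
        hidden_dim = int(ffn_dim_multiplier * hidden_dim)
    # Assumed: round up to the nearest multiple of `multiple_of`.
    return multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)
```

With these assumptions, `ffn_hidden_dim(4096, 1024, 1.3)` gives 14336 and `ffn_hidden_dim(4096, 256)` gives 11008, so the final width is generally not exactly 4h or 8h/3.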