ykddd
Thank you for your reply! ^.^
I think this is a mistake in the paper too~
As defined in https://github.com/meta-llama/llama3/blob/main/llama/model.py, hidden_dim is initialized to 4h and then adjusted by ffn_dim_multiplier and multiple_of:

```
hidden_dim = int(2 * hidden_dim / 3)
# custom dim factor multiplier
if ffn_dim_multiplier...
```
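To make the arithmetic concrete, here is a minimal standalone sketch of how I understand that computation. The function name `ffn_hidden_dim` is hypothetical, and the steps after the truncated `if` (scaling by `ffn_dim_multiplier`, then rounding up to a multiple of `multiple_of`) are my assumption about what the linked file does, so please check them against model.py. The example values (dim=4096, multiple_of=1024, ffn_dim_multiplier=1.3) are also assumed for illustration.

```python
from typing import Optional


def ffn_hidden_dim(dim: int, multiple_of: int,
                   ffn_dim_multiplier: Optional[float] = None) -> int:
    """Hypothetical helper: derive the feed-forward hidden size from the model dim."""
    # Start from 4h, as described in the comment above.
    hidden_dim = 4 * dim
    # Shrink to 2/3 of that (this line appears in the quoted snippet).
    hidden_dim = int(2 * hidden_dim / 3)
    # Assumed continuation: apply the optional custom multiplier.
    if ffn_dim_multiplier is not None:
        hidden_dim = int(ffn_dim_multiplier * hidden_dim)
    # Assumed continuation: round up to the nearest multiple of `multiple_of`.
    hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)
    return hidden_dim


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the thread.
    print(ffn_hidden_dim(dim=4096, multiple_of=1024, ffn_dim_multiplier=1.3))
```

Under these assumptions the result is generally not exactly 4h or 8h/3, since the multiplier and the rounding both move it.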