NabJa

3 comments by NabJa

Has any progress been made on this question? :)

Hi @KumoLiu, thank you for the references! Indeed, the official PyTorch implementation splits the embeddings across all heads, resulting in a head dimension of `embedding dimension // number heads`....
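To make the arithmetic concrete, here is a minimal sketch in plain Python of what "splitting the embedding across heads" means. It is not PyTorch code: `split_into_heads` is a hypothetical helper written for illustration, and it operates on a plain list, whereas PyTorch applies the split to the projected query/key/value tensors.

```python
def split_into_heads(embedding, num_heads):
    """Split one embedding vector into equal, contiguous per-head slices.

    Hypothetical helper for illustration only; not part of PyTorch or MONAI.
    """
    embed_dim = len(embedding)
    # The split only works if every head gets the same number of features.
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    head_dim = embed_dim // num_heads
    return [embedding[i * head_dim:(i + 1) * head_dim] for i in range(num_heads)]


embedding = list(range(8))                      # embedding dimension = 8
heads = split_into_heads(embedding, num_heads=4)
print(heads)  # [[0, 1], [2, 3], [4, 5], [6, 7]] -> head dimension = 8 // 4 = 2
```

With this scheme, adding heads shrinks each head rather than growing the total width: the per-head dimension is tied to the embedding dimension and cannot be chosen independently.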

@marksgraham Complete backward compatibility should be guaranteed with 1ccb5de43f936720d8fc82307d703f507682d135. @ericspod The DCO is updated and linting passes the checks.