Results: 3 comments of NabJa
Is there any progress made on this question? :)
Hi @KumoLiu, thank you for the references! Indeed, the official PyTorch implementation splits the embeddings across all heads, resulting in a head dimension of `embedding dimension // number heads`....
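The head split mentioned in this comment can be illustrated with a minimal sketch in plain Python (no PyTorch needed). `split_heads` is a hypothetical helper written for illustration, not a function from PyTorch; it only assumes that the embedding dimension is divisible by the number of heads, so that each head receives `embed_dim // num_heads` values.

```python
def split_heads(embedding, num_heads):
    """Split one embedding vector into num_heads equal, contiguous chunks.

    Hypothetical helper for illustration only; it mirrors the arithmetic
    head_dim = embed_dim // num_heads, not any library's actual code.
    """
    embed_dim = len(embedding)
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    head_dim = embed_dim // num_heads
    # Head i receives the slice [i * head_dim, (i + 1) * head_dim).
    return [embedding[i * head_dim:(i + 1) * head_dim] for i in range(num_heads)]


# An 8-dimensional embedding split across 4 heads gives head_dim = 2.
heads = split_heads(list(range(8)), num_heads=4)
print(heads)  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

With this scheme the per-head dimension shrinks as heads are added: an embedding of size 512 with 8 heads gives 64 values per head.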
@marksgraham complete backward compatibility should be guaranteed with 1ccb5de43f936720d8fc82307d703f507682d135. @ericspod the DCO is updated and linting passes the checks.