
Are Medusa Heads computed in parallel or serially?

Open userljz opened this issue 1 year ago • 0 comments

Hello authors,

While reading your code, I noticed that the multiple Medusa Heads, which you describe as computing their results in parallel, are evaluated in a for loop:

for i in range(self.medusa):
    medusa_logits.append(self.medusa_head[i](hidden_states))

(Although the later Heads don't use the results from the earlier Heads, the results are still obtained one Head at a time in a for loop.)

I'm wondering whether I've misunderstood this, or whether Medusa currently computes the Heads' results serially?

Could you please clarify this for me?
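
To make the question concrete, here is a minimal sketch of what I mean by the Heads being independent. It is not the repository's code: `make_head`, the toy `medusa_heads` list, and the tiny `hidden_states` vector are hypothetical stand-ins, and plain Python lists replace tensors. It only illustrates that a loop and a concurrent dispatch give the same logits when no Head reads another Head's output; it says nothing about how the real implementation is scheduled on the GPU.

```python
from concurrent.futures import ThreadPoolExecutor


def make_head(scale, bias):
    """Hypothetical stand-in for one Medusa head: a function of hidden_states only."""
    def head(hidden_states):
        return [scale * h + bias for h in hidden_states]
    return head


# Toy stand-ins (assumptions): three independent heads and a tiny hidden state.
medusa_heads = [make_head(1.0, 0.0), make_head(2.0, 1.0), make_head(-1.0, 0.5)]
hidden_states = [0.5, -1.0, 2.0]

# Serial evaluation, mirroring the loop quoted above.
serial_logits = []
for i in range(len(medusa_heads)):
    serial_logits.append(medusa_heads[i](hidden_states))

# Concurrent evaluation: every head receives the same input and no head
# reads another head's output, so all of them can be dispatched together.
with ThreadPoolExecutor() as pool:
    concurrent_logits = list(pool.map(lambda head: head(hidden_states), medusa_heads))

# Both strategies yield identical results because the heads are independent.
assert serial_logits == concurrent_logits
```

If that reading is right, the for loop would be an implementation detail rather than a data dependency between Heads.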

userljz, Aug 03 '24 00:08