
the standard deviation of the activation

Open Cooperx521 opened this issue 1 year ago • 3 comments

Hello, I am interested in the standard deviation of the activation and would like to know how it is calculated. Here are a few possible methods:

  1. Calculate it over 100 sequences within one specific layer, and report that value in the table below.
  2. Calculate it over 100 sequences and across the layers with relatively large values (e.g., layers 2-30).
  3. Calculate it across all layers.

Could you please specify which of the above situations applies?

Thanks.

[image]

Cooperx521 avatar Jul 01 '24 12:07 Cooperx521

Thanks for your interest in our work. That would be option 1. This table shows the activation deviation within a fixed layer.

Eric-mingjie avatar Jul 01 '24 17:07 Eric-mingjie

Thanks a lot. So we just calculate the standard deviation of 100 values, one per sequence. Taking the top-1 activation as an example: it might be the 2533rd dimension of the starting token in the 15th layer. We collect that value from each of the 100 sequences and then compute the standard deviation.

Cooperx521 avatar Jul 02 '24 03:07 Cooperx521

Yes, that's correct.

Eric-mingjie avatar Jul 03 '24 16:07 Eric-mingjie
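
For readers who want to reproduce the procedure confirmed above, here is a minimal sketch, not the authors' code. It assumes the hidden states of one fixed layer are already available as one `[token][dim]` nested list per sequence. The helper names, the toy shapes, and the random stand-in data are all hypothetical, and whether the table uses the population or the sample standard deviation is not stated in this thread (the sketch uses the population form).

```python
import random
import statistics

def collect_values(per_sequence_hidden, token_idx, dim_idx):
    """Pick one scalar per sequence: the activation at a fixed token and dimension.

    per_sequence_hidden: list with one entry per sequence, each a [token][dim]
    nested list holding the hidden states of a single, fixed layer.
    """
    return [hidden[token_idx][dim_idx] for hidden in per_sequence_hidden]

def activation_std(values):
    """Population standard deviation across sequences (an assumption, see above)."""
    return statistics.pstdev(values)

# Toy stand-in for real model activations (hypothetical shapes and values):
# 100 sequences, 4 tokens each, 8 hidden dimensions, all from one layer.
rng = random.Random(0)
num_sequences, num_tokens, hidden_dim = 100, 4, 8
layer_hidden = [
    [[rng.gauss(0.0, 1.0) for _ in range(hidden_dim)] for _ in range(num_tokens)]
    for _ in range(num_sequences)
]

# Fixed position of interest: starting token (index 0), one chosen dimension.
values = collect_values(layer_hidden, token_idx=0, dim_idx=3)
print(len(values), activation_std(values))
```

With a real model, `layer_hidden` would be replaced by the chosen layer's hidden states for each of the 100 input sequences, and `token_idx` and `dim_idx` by the position of the massive activation being studied.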