wanda
OPT-66B, unstructured sparsity gets wikitext perplexity 3404.0751953125
Hello, I used the scripts to prune OPT-66B (unstructured, nsamples 128). With this, I get a wikitext perplexity of 3404, which is far off the number reported in the paper.
I was wondering whether the metric output by the code should be scaled by 0.01 (giving a perplexity of 3.404), or whether this is an outlier result.
This seems to be an outlier result, which I have also gotten before when running on OPT-66B. I wasn't able to look into it (mainly because LLaMA and LLaMA-2 are much more popular), but it would be interesting to study why this happens from a scientific perspective.
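On the scaling question: the wikitext perplexity printed by this kind of evaluation is exp(mean negative log-likelihood per token) over the test set and is reported as-is, so no 0.01 rescaling is expected. Below is a minimal sketch of such an evaluation loop, not the repo's exact eval code; the model name (a small placeholder instead of facebook/opt-66b), sequence length, and chunking are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

# Placeholder model for illustration; the issue concerns facebook/opt-66b.
model_name = "facebook/opt-125m"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
model.eval()

# Concatenate the wikitext-2 test split and tokenize it as one long sequence.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

seqlen = 2048  # assumed context length used for evaluation
nlls = []
n_chunks = enc.input_ids.shape[1] // seqlen
for i in range(n_chunks):
    batch = enc.input_ids[:, i * seqlen:(i + 1) * seqlen].to(device)
    with torch.no_grad():
        # labels=batch returns the mean cross-entropy over the chunk
        loss = model(batch, labels=batch).loss
    nlls.append(loss.float() * seqlen)  # total NLL for this chunk

# Perplexity = exp(mean NLL per token); printed without any scaling.
ppl = torch.exp(torch.stack(nlls).sum() / (n_chunks * seqlen))
print(f"wikitext perplexity: {ppl.item():.4f}")
```

Under this kind of loop, a printed value of 3404 is the raw perplexity itself, which is why it points to degraded model quality after pruning rather than a units issue.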