
Cannot reproduce Llama2 results

Open • taratt opened this issue 1 year ago • 5 comments

Hello, I'm opening this issue because I'm still having problems reproducing the Llama-2-7B results (both without pruning and with wanda). Here are my intermediate and final perplexity results with the dense model (context size 4096). It seems like the last few samples are somehow inflating the perplexity, but I don't know why. Any help would be appreciated.

nsamples 333
sample 50, Perplexity 5.0264153480529785
sample 100, Perplexity 5.311441421508789
sample 150, Perplexity 5.710564136505127
sample 200, Perplexity 5.612466335296631
sample 250, Perplexity 5.526543617248535
sample 300, Perplexity 6.8109965324401855
wikitext perplexity 7.72459077835083
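
For context, a running log like the one above typically comes from a chunked perplexity evaluation over the tokenized test set (GPTQ/SparseGPT-style). Below is a minimal sketch of such a loop; `model`, `testenc`, and `seqlen` are placeholders assumed to come from the surrounding setup, not the repo's exact code:

```python
# Minimal sketch of a chunked wikitext perplexity evaluation.
# Assumes `model` is a causal LM already on `device`, `testenc` is the
# tokenized test split (e.g. from get_wikitext2), and `seqlen` is the
# model's context size (4096 for Llama-2 in this thread).
import torch
import torch.nn.functional as F

@torch.no_grad()
def eval_ppl(model, testenc, seqlen, device="cuda"):
    input_ids = testenc.input_ids.to(device)
    nsamples = input_ids.numel() // seqlen          # number of full-length chunks
    nlls = []
    for i in range(nsamples):
        chunk = input_ids[:, i * seqlen:(i + 1) * seqlen]
        logits = model(chunk).logits
        # Next-token prediction: drop the last logit, drop the first label.
        shift_logits = logits[:, :-1, :].contiguous()
        shift_labels = chunk[:, 1:].contiguous()
        loss = F.cross_entropy(
            shift_logits.view(-1, shift_logits.size(-1)),
            shift_labels.view(-1),
        )
        nlls.append(loss.float() * seqlen)          # total NLL for this chunk
        if (i + 1) % 50 == 0:
            running_ppl = torch.exp(torch.stack(nlls).sum() / ((i + 1) * seqlen))
            print(f"sample {i + 1}, Perplexity {running_ppl.item()}")
    return torch.exp(torch.stack(nlls).sum() / (nsamples * seqlen)).item()
```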

taratt • May 09 '24 16:05

I recall that there shouldn't be 333 samples for wikitext; it should be much fewer than that (in my case it is 83). Are you using the validation set?

Eric-mingjie • May 09 '24 17:05

I am using the same testenc that the get_wikitext2 function in data.py returns. If the model's sequence length is 4096, does that mean I'm somehow getting more samples?
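
(For reference, the sample count is just the tokenized test-set length divided by the sequence length, so a longer context yields fewer chunks, not more. A quick back-of-the-envelope check using the numbers from this thread:)

```python
# Sample count = total test tokens // seqlen.
seqlen = 4096
print(83 * seqlen)    # ~340k tokens: what the wikitext-2 test split yields here
print(333 * seqlen)   # ~1.36M tokens: far more text than that split should contain
```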

taratt • May 09 '24 17:05

Correct, 333 does not look like the right number from what I am seeing on my end; and I was referring to the test split, sorry for the confusion.

Eric-mingjie • May 09 '24 18:05

Thanks to your tip, I was able to figure out what the problem was: I was evaluating on wikitext-103 instead of wikitext-2. The version of datasets suggested in your install file automatically loads wikitext-103 instead of wikitext-2. I suggest you update it. Thanks again.
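
(One way to double-check which dataset is actually being evaluated is to request the wikitext-2 config explicitly and count the resulting chunks. This is only a sketch under the assumption that evaluation joins and tokenizes the raw test split as in the repo's data.py; `tokenizer` here is assumed to be the Llama-2 tokenizer from the thread:)

```python
# Sanity check: load the wikitext-2 test split explicitly and verify the
# chunk count matches the expected ~83 at seqlen 4096 (not 333).
from datasets import load_dataset

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")
print(enc.input_ids.numel() // 4096)
```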

taratt • May 09 '24 21:05

Great, thank you for the update.

Eric-mingjie • May 12 '24 02:05