benbench
Benchmarking Benchmark Leakage in Large Language Models
orgn-GSM8K-train Average_ppl_accuracy: nan
PPL of xxxx
GSM8K_rewritten-test-1: nan
GSM8K_rewritten-test-2: nan
GSM8K_rewritten-test-3: nan
GSM8K_rewritten-train-1: nan
GSM8K_rewritten-train-2: nan
GSM8K_rewritten-train-3: nan
orgn-GSM8K-test: nan
orgn-GSM8K-train: nan
Why does this happen?
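The report above does not show where the NaN values come from, so the following is only a generic sketch of how a NaN can enter a perplexity average. It is not taken from the benbench scripts. The function names `perplexity` and `mean_ignoring_nan` are hypothetical, and the empty-input case is an assumed trigger chosen for illustration. Other causes are just as plausible, for example numerical overflow in half precision.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log-probability) over the scored tokens."""
    if not token_logprobs:
        # Assumed failure case: no tokens were scored, so the mean is undefined.
        return float("nan")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def mean_ignoring_nan(values):
    """Average only the finite entries; return NaN if none are finite."""
    finite = [v for v in values if math.isfinite(v)]
    return sum(finite) / len(finite) if finite else float("nan")

# One valid sample and one empty sample (hypothetical data).
scores = [perplexity([-0.5, -1.5]), perplexity([])]

naive_mean = sum(scores) / len(scores)  # a single NaN makes the whole mean NaN
safe_mean = mean_ignoring_nan(scores)   # averages only the valid sample
```

When every reported metric is NaN, as in the output above, checking per-sample values before averaging usually narrows down whether the problem is in the inputs or in the model's numerical output.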
Are the ngram and ppl scripts currently capable of leveraging multi-GPU setups for inference? If not, are there any planned updates or workarounds that might support this feature? Additionally, is...
error: RPC failed; curl 92 HTTP/2 stream 0 was not closed cleanly: CANCEL (err 8)
error: 5929 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet...