xFasterTransformer
Performance issue for opt-1.3b with BS=1, BF16
I tested the OPT-1.3B model on an EMR platform with 52 cores. The performance with BS=1 does not look right: the gap between BS=1 and BS=2 is too large.
numactl -C 0-51 -m 0 ./run_benchmark.sh -m opt-1_3b -d bf16 -s 1 -bs 1 -in 128 -out 15 -i 10
The results:
BS=1
BS=2
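To make the comparison easier to reproduce, the two runs can be scripted so that only `-bs` changes. This is a minimal sketch, not part of the repo: it reuses the flags from the command above unchanged, and `DRY_RUN` and `bench_cmd` are hypothetical helper names introduced here. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: run the same benchmark for BS=1 and BS=2 so only the batch size differs.
# DRY_RUN is a hypothetical helper variable (not a run_benchmark.sh option);
# it defaults to 1, which prints the commands without executing them.
DRY_RUN="${DRY_RUN:-1}"

bench_cmd() {
    # $1 = batch size; every other flag is copied from the command in this issue.
    echo "numactl -C 0-51 -m 0 ./run_benchmark.sh -m opt-1_3b -d bf16 -s 1 -bs $1 -in 128 -out 15 -i 10"
}

for bs in 1 2; do
    cmd="$(bench_cmd "$bs")"
    echo "== BS=$bs: $cmd"
    if [ "$DRY_RUN" != "1" ]; then
        # Word splitting is intentional: the command contains no quoted arguments.
        $cmd
    fi
done
```

Running it with `DRY_RUN=0` from the benchmark directory executes both configurations back to back on the same cores and NUMA node.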
@bin1guo Do we still need to benchmark the OPT model? I suggest running the llama model instead.