
Unable to run the benchmark

Open fungiboletus opened this issue 2 years ago • 0 comments

Hi,

I'm trying to run the benchmark bench_30b_1x4.sh (the only change is that I set N_GPUS=2), but I get the following Python exception:

rank #1: TypeError: sequence item 6: expected str instance, NoneType found
Traceback (most recent call last):
  File "/home/fungiboletus/miniconda3/envs/flexgen/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/fungiboletus/miniconda3/envs/flexgen/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/fungiboletus/flexgen/flexgen/dist_flex_opt.py", line 694, in <module>
    raise e
  File "/home/fungiboletus/flexgen/flexgen/dist_flex_opt.py", line 690, in <module>
    run_flexgen_dist(args)
  File "/home/fungiboletus/flexgen/flexgen/dist_flex_opt.py", line 620, in run_flexgen_dist
    outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
  File "/home/fungiboletus/miniconda3/envs/flexgen/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3432, in batch_decode
    return [
  File "/home/fungiboletus/miniconda3/envs/flexgen/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3433, in <listcomp>
    self.decode(
  File "/home/fungiboletus/miniconda3/envs/flexgen/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3471, in decode
    return self._decode(
  File "/home/fungiboletus/miniconda3/envs/flexgen/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 949, in _decode
    sub_texts.append(self.convert_tokens_to_string(current_sub_text))
  File "/home/fungiboletus/miniconda3/envs/flexgen/lib/python3.10/site-packages/transformers/models/gpt2/tokenization_gpt2.py", line 316, in convert_tokens_to_string
    text = "".join(tokens)
TypeError: sequence item 6: expected str instance, NoneType found
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. ...

I use Python 3.10.9 with PyTorch 1.13.1, CUDA 11.7, and mpirun 2.1.1.
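
The last frame of the traceback fails in `"".join(tokens)`, which raises this `TypeError` whenever the list contains a `None`. Below is a minimal sketch in plain Python that reproduces the same message without FlexGen or transformers. The helper names `find_none_tokens` and `join_error_message` and the sample token list are hypothetical, made up for illustration. One possible cause, which is an assumption and not confirmed by this log, is that an output id has no entry in the tokenizer's vocabulary and is therefore mapped to `None` before the join.

```python
def find_none_tokens(tokens):
    """Return the positions of None entries in a token list (hypothetical helper)."""
    return [i for i, tok in enumerate(tokens) if tok is None]


def join_error_message(tokens):
    """Try the same "".join(tokens) call as the failing frame.

    Returns the TypeError message if the join fails, otherwise None.
    """
    try:
        "".join(tokens)
    except TypeError as exc:
        return str(exc)
    return None


# Made-up token list with a None at index 6, mirroring "sequence item 6".
tokens = ["Paris", " is", " the", " capital", " of", " France", None]

print(find_none_tokens(tokens))
print(join_error_message(tokens))
```

Running a check like `find_none_tokens` on the token list just before the join would show which positions failed to decode, which may help narrow down whether the generated ids or the tokenizer are at fault.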

fungiboletus, Feb 21 '23 19:02