[🐛BUG] I ran into a problem when using the mBART model with WMT19 zh-en.

Open 01vanilla opened this issue 2 years ago • 2 comments

Describe the bug: I ran into the following problem when using the mBART model with WMT19 zh-en.

How to reproduce: run_textbox.py --model=mBART --model_path=facebook/mbart-large-cc25 --dataset=wmt19-zh-en --src_lang=zh_CN --tgt_lang=en_XX

Log:
23 Apr 00:43 INFO Pretrain type: pretrain disabled
<string>:1: SyntaxWarning: 'int' object is not callable; perhaps you missed a comma?
[the warning above is repeated several times]
<string>:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
<string>:1: SyntaxWarning: 'str' object is not callable; perhaps you missed a comma?
<string>:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
[the warning above is repeated several times]
Token indices sequence length is longer than the specified maximum sequence length for this model (1776 > 1024). Running this sequence through the model will result in indexing errors
Traceback (most recent call last):
  File "run_textbox.py", line 15, in <module>
    run_textbox(model=args.model, dataset=args.dataset, config_file_list=args.config_files, config_dict={})
  File "/hy-tmp/TextBox/textbox/quick_start/quick_start.py", line 20, in run_textbox
    experiment = Experiment(model, dataset, config_file_list, config_dict)
  File "/hy-tmp/TextBox/textbox/quick_start/experiment.py", line 56, in __init__
    self._init_data(self.get_config(), self.accelerator)
  File "/hy-tmp/TextBox/textbox/quick_start/experiment.py", line 82, in _init_data
    train_data, valid_data, test_data = data_preparation(config, tokenizer)
  File "/hy-tmp/TextBox/textbox/data/utils.py", line 24, in data_preparation
    train_dataset.tokenize(tokenizer)
  File "/hy-tmp/TextBox/textbox/data/abstract_dataset.py", line 120, in tokenize
    ids = tokenizer(
  File "/usr/local/miniconda3/envs/TextBox/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2538, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/usr/local/miniconda3/envs/TextBox/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2624, in _call_one
    return self.batch_encode_plus(
  File "/usr/local/miniconda3/envs/TextBox/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2815, in batch_encode_plus
    return self._batch_encode_plus(
  File "/usr/local/miniconda3/envs/TextBox/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 428, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

For reference, I am using transformers 4.28.1 and torch 2.0.0+cu117.

01vanilla, Apr 22 '23 16:04

As a temporary workaround, you can comment out lines 27~34 of https://github.com/RUCAIBox/TextBox/blob/2.0.0/textbox/data/misc.py. We will fix this as soon as possible.

StevenTang1998, Apr 24 '23 15:04

If you run into any further problems, feel free to keep asking.

StevenTang1998, Apr 30 '23 03:04
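
Editorial note for readers debugging the same TypeError: fast tokenizers typically raise "TextEncodeInput must be Union[...]" when a batch passed to the tokenizer contains something other than a plain string or a pair of strings (for example None or a number). Whether that is what happened in this issue is an assumption; the thread does not say what the lines in misc.py do. The sketch below is plain Python with hypothetical helper names (`is_valid_text_input`, `find_invalid_entries`) that are not part of TextBox or transformers. It only shows one way to locate suspicious entries before calling the tokenizer.

```python
from typing import Any, List, Sequence, Tuple


def is_valid_text_input(item: Any) -> bool:
    """Return True for a plain string or a pair of strings.

    This is a simplified approximation of what a fast tokenizer accepts
    as one batch element; it is not the library's own check.
    """
    if isinstance(item, str):
        return True
    return (
        isinstance(item, (tuple, list))
        and len(item) == 2
        and all(isinstance(part, str) for part in item)
    )


def find_invalid_entries(batch: Sequence[Any]) -> List[Tuple[int, Any]]:
    """Collect (index, value) for every element that fails the check."""
    return [
        (index, item)
        for index, item in enumerate(batch)
        if not is_valid_text_input(item)
    ]


if __name__ == "__main__":
    # Hypothetical batch: two entries are not strings.
    sample = ["你好,世界", None, "Hello", 3.14, ("a", "b")]
    for index, value in find_invalid_entries(sample):
        print(f"entry {index} is not valid tokenizer input: {value!r}")
```

Running a check like this on the source and target text lists just before the `tokenizer(...)` call in the traceback would show whether any entry has been turned into a non-string value during preprocessing.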