Trouble with BERT-based model (CUDA indexing assertion during evaluation)
Dear Dr. Arazd @arazd,
Thanks for your great work. I'm trying to replicate your result in Table 1 for order 4 (5 tasks, bert-base-uncased model) under the CL setting (full data) in the main paper, with the following command:
```
python train_cl2.py --task_list ag yelp_review_full amazon yahoo dbpedia \
    --prefix_MLP residual_MLP2 --lr 1e-4 --num_epochs 40 \
    --freeze_weights 1 --freeze_except word_embeddings \
    --prompt_tuning 1 --prefix_len 20 --seq_len 450 --one_head 0 \
    --model_name bert-base-uncased --early_stopping 1 \
    --save_name BERT_order_4_run1 --save_dir ./results
```
However, when the progressive prompt model evaluates accuracy on all datasets, it throws the following error as soon as evaluation starts on the yahoo dataset:

```
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [60,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
```
This issue only appears when evaluating the 4th task. I tried other settings, such as shorter task sequences (2 or 3 tasks) and removing ResMLP, and those work normally. I also printed the patterns of the `input_ids`, `token_type_ids`, and `position_ids`, but the 4th task's pattern is similar to the previous tasks'.
The error stems from this line in your repo: https://github.com/arazd/ProgressivePrompts/blob/01572d6a73c0576b070ceee00dbe4f5bc278423f/BERT_codebase/model_utils.py#L576
Could you give me some insight into this problem? I would really appreciate it if you could help me fix it.
P.S.: After further troubleshooting, I found that the error only occurs when the task sequence is longer than 4 tasks and when evaluating on the full validation set.
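For reference, the same out-of-range lookup is much easier to see on CPU, where PyTorch raises a plain `IndexError` instead of the device-side assert. A minimal sketch (the 512/530 sizes mirror my run, not anything taken from the repo):

```python
import torch
import torch.nn as nn

# The CUDA `srcIndex < srcSelectDimSize` assert is the GPU form of an
# out-of-range embedding lookup. bert-base-uncased has a position table
# of size 512; an input of 530 positions indexes past its end.
pos_emb = nn.Embedding(512, 768)   # max_position_embeddings x hidden_size
position_ids = torch.arange(530)   # positions 512..529 are out of range

try:
    pos_emb(position_ids)
except IndexError as e:
    print("out-of-range position id:", e)
```

Running the model on CPU first can therefore surface the real failing index before the CUDA assert obscures it.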
Hi,
After troubleshooting the problem, I found that this issue stems from the position embedding limit of the bert-base-uncased model. In particular, if we set `seq_len=450`, then after prepending soft prompts of 20 tokens per task for 4 previous tasks, the input length becomes `450 + 20 * 4 = 530 > 512`, where 512 is the maximum number of position embeddings for bert-base-uncased. Would you mind sharing the input sequence length used to reproduce Table 1b? I haven't found it in the main paper. Thanks.
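The overflow arithmetic above can be checked with a few lines; the 512 limit is bert-base-uncased's published `max_position_embeddings`, and the helper name below is just for illustration:

```python
# bert-base-uncased's position embedding table size (max_position_embeddings).
MAX_POSITION_EMBEDDINGS = 512

def total_input_length(seq_len: int, prefix_len: int, num_prompts: int) -> int:
    """Input length after prepending one soft prompt per task seen so far."""
    return seq_len + prefix_len * num_prompts

# The failing run: seq_len=450, prefix_len=20, 4 prompts prepended.
length = total_input_length(seq_len=450, prefix_len=20, num_prompts=4)
print(length, length > MAX_POSITION_EMBEDDINGS)  # 530 True
```

So any combination where `seq_len + prefix_len * num_prompts` exceeds 512 will trigger the indexing assert.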
I also met the same problem: the BERT model cannot handle the hyperparameters given in this repository (`seq_len=450` with prompt length 20). I'm curious which hyperparameters the authors actually used to get the results in the paper.
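In the meantime, a back-of-the-envelope bound for a `seq_len` that fits: assuming (as the thread suggests) one 20-token prompt per task is prepended, so the final task in a 5-task order carries 5 prompts, the largest workable `seq_len` would be:

```python
# bert-base-uncased's position embedding table size.
MAX_POS = 512

def max_seq_len(prefix_len: int, num_tasks: int) -> int:
    """Largest seq_len that still fits once every task's prompt is prepended."""
    return MAX_POS - prefix_len * num_tasks

# 5-task order, 20-token prompts: at most 412 tokens of actual input.
print(max_seq_len(prefix_len=20, num_tasks=5))  # 412
```

This is only an assumption about how the prompts accumulate; it would still be good to hear the authors' actual setting.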