
Trouble with BERT-based model

CuongNN218 opened this issue • 2 comments

Dear Dr. Arazd @arazd, thanks for your great work. I'm trying to replicate your result in Table 1 for order 4 (5 tasks, bert-base-uncased model) in the CL setting (full data) from the main paper with the following command:

python train_cl2.py --task_list ag yelp_review_full amazon yahoo dbpedia --prefix_MLP residual_MLP2 --lr 1e-4 --num_epochs 40 --freeze_weights 1 --freeze_except word_embeddings \
  --prompt_tuning 1 --prefix_len 20 --seq_len 450 --one_head 0 \
  --model_name bert-base-uncased --early_stopping 1 \
  --save_name BERT_order_4_run1 --save_dir ./results

However, when the Progressive Prompts model evaluates accuracy on all datasets, it throws the following error once it starts evaluating on the yahoo dataset:

/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [60,0,0], thread: [96,0,0] Assertion srcIndex < srcSelectDimSize failed.

This issue only appears during evaluation of the 4th task. I tried other settings, such as shorter task sequences (2 or 3 tasks), removing ResMLP, etc., and they work as expected. I also printed the patterns of the input_ids, token_type_ids, and position_ids, but the 4th task's pattern is similar to the previous tasks'.

The error stems from this line in your repo: https://github.com/arazd/ProgressivePrompts/blob/01572d6a73c0576b070ceee00dbe4f5bc278423f/BERT_codebase/model_utils.py#L576

Could you give me some insight into this problem? I would really appreciate it if you could help me fix it.

P.S. After troubleshooting the source of the problem, I found that it only occurs when the task sequence is longer than 4 and when evaluating on the full validation set.

CuongNN218 avatar Jul 06 '23 06:07 CuongNN218

Hi, after troubleshooting the problem, I found that this issue stems from the embedding size of the bert-base-uncased model. In particular, if we set seq_len = 450, then after prepending soft prompts of 20 tokens per task from the 4 previous tasks, the input length becomes 450 + 20 * 4 = 530 > 512, where 512 is the maximum number of position embeddings for bert-base-uncased. Would you like to share the input sequence length used to reproduce Table 1b? I haven't found it in the main paper. Thanks.
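For reference, a minimal sketch of this arithmetic (my own illustrative code, not from the repo; the variable names mirror the CLI flags):

```python
# Illustrative sketch: compare the total input length (text + progressive soft
# prompts) against the model's position-embedding table.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bert-base-uncased")

seq_len = 450        # --seq_len
prefix_len = 20      # --prefix_len, soft-prompt tokens per task
num_prompts = 4      # prompts prepended when evaluating the 4th task

total_len = seq_len + prefix_len * num_prompts
print(total_len, config.max_position_embeddings)  # 530 vs. 512

if total_len > config.max_position_embeddings:
    # Positions 512..529 have no row in the position-embedding table, which is
    # what triggers the indexSelectLargeIndex assert on GPU.
    print(f"{total_len} tokens exceed the {config.max_position_embeddings} "
          "position embeddings of bert-base-uncased")
```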

CuongNN218 avatar Jul 06 '23 15:07 CuongNN218


I also met the same problem: the BERT model cannot handle the hyperparameters given in this repository (seq_len=450 with prompt length 20). I'm curious what hyperparameters the authors actually used to get the results in the paper.
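As a rough upper bound (my own back-of-the-envelope estimate, not the authors' setting), the longest text input that fits a 5-task order would be:

```python
# Illustrative estimate only: the worst case is the last task, which prepends
# one prefix_len-token soft prompt per task seen so far.
def max_text_len(num_tasks: int, prefix_len: int, max_positions: int = 512) -> int:
    return max_positions - num_tasks * prefix_len

print(max_text_len(num_tasks=5, prefix_len=20))  # 412, so seq_len=450 cannot fit
```

In other words, with prefix_len=20 and 5 tasks, seq_len would need to be roughly 412 or less for everything to fit, unless the authors used a different scheme.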

mingyang-wang26 avatar Oct 13 '23 14:10 mingyang-wang26