LTP
[KDD'22] Learned Token Pruning for Transformers
Hi, I found I cannot run learned token pruning whether or not I install `transformers`. If I install `transformers`, it raises the error `ValueError: Some specified...`
when I run `python run.py --arch ltp-base --task SST2 --restore pretrained/bert-base-uncased-SST-2 --lr 2e-5 --temperature 2e-3 --lambda_threshold 0.1 --weight_decay 0 --bs 64 --masking_mode soft --epoch 10 --save_step 100 --no_load` Some specified...
Hey there. After reading the code, I am confused about how the computation cost can be reduced by masking more tokens. Did I miss anything? P.S. I see the FLOPs...
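For intuition on why dropping tokens cuts compute: self-attention FLOPs scale roughly quadratically with sequence length while the projections and feed-forward layers scale linearly, so keeping fewer tokens shrinks every term. A rough per-layer estimate (a hypothetical helper, not code from this repo, counting only the dominant matmuls) could look like:

```python
def layer_flops(seq_len, hidden=768, ffn=3072, heads=12):
    """Rough FLOPs for one BERT-base encoder layer.

    Counts only the dominant matrix multiplies; a real profiler
    would also include bias adds, softmax, and layer norms.
    """
    qkv = 3 * seq_len * hidden * hidden                        # Q, K, V projections
    attn = 2 * heads * seq_len * seq_len * (hidden // heads)   # QK^T and AV (quadratic in seq_len)
    proj = seq_len * hidden * hidden                           # attention output projection
    ff = 2 * seq_len * hidden * ffn                            # two feed-forward matmuls
    return 2 * (qkv + attn + proj + ff)                        # 1 multiply-add = 2 FLOPs

full = layer_flops(128)
pruned = layer_flops(64)        # half the tokens kept after pruning
print(pruned / full)            # slightly below 0.5, thanks to the quadratic attention term
```

The linear terms halve exactly when half the tokens are kept, while the attention term quarters, so the overall ratio dips below 0.5; the effect grows with sequence length.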
During hard-pruning inference, tokens below the threshold are discarded and do not enter the feed-forward layer's computation, but when entering the feed-forward layer after normalization and...
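A minimal, dependency-free sketch of the hard-masking step being described (the names `hidden_states`, `scores`, and `threshold` are illustrative, not the repo's exact variables): tokens whose importance score falls below the threshold are gathered out, so the feed-forward layer sees a shorter sequence:

```python
def hard_prune(hidden_states, scores, threshold):
    """Keep only tokens whose importance score is at or above the threshold.

    hidden_states: list of per-token vectors (length = seq_len)
    scores:        list of per-token importance scores
    Returns the pruned sequence and the surviving token indices.
    """
    keep = [i for i, s in enumerate(scores) if s >= threshold]
    return [hidden_states[i] for i in keep], keep

tokens = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
scores = [0.9, 0.05, 0.4, 0.02]
pruned, kept = hard_prune(tokens, scores, threshold=0.1)
print(kept)  # [0, 2] -- only two of the four tokens enter the feed-forward layer
```

During soft-mask training, by contrast, all tokens stay in the sequence and are merely down-weighted, which is why the FLOPs savings only materialize at hard-pruning inference time.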
Can you show how you evaluate the model performance with `attention_mask`? According to this line: https://github.com/kssteven418/LTP/blob/8ab31a623fb71c5f4f8208e878097f214484e848/src/transformers/models/ltp/modeling_ltp.py#L305C27-L305C27 the `attention_mask` is never used outside the for loop. So, I think...
https://github.com/kssteven418/LTP/blob/f1d5ec88aba913de5e2b4aa502af9cf0ab7bb13f/src/transformers/models/ltp/modeling_ltp.py#L247

```python
if self.training and not self.hard_masking:
    if pruner_outputs is not None:
        threshold, pruning_scores = pruner_outputs['threshold'], pruner_outputs['scores']
        self.mask = torch.sigmoid((pruning_scores - threshold) / self.temperature)
        layer_output = layer_output * self.mask.unsqueeze(-1)
```
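The soft mask in that snippet is a temperature-scaled sigmoid of the score-threshold gap. A dependency-free sketch of the same computation (plain Python standing in for torch):

```python
import math

def soft_mask(scores, threshold, temperature):
    """Sigmoid soft mask: near 1 for scores well above the threshold,
    near 0 for scores well below it; temperature controls how sharp
    the transition is (smaller = closer to a hard 0/1 mask)."""
    return [1.0 / (1.0 + math.exp(-(s - threshold) / temperature))
            for s in scores]

masks = soft_mask([0.9, 0.1, 0.5], threshold=0.5, temperature=0.1)
# a score far above the threshold gets a mask near 1, far below
# near 0, and a score exactly at the threshold gets 0.5
```

Because the sigmoid is differentiable, gradients flow to both the pruning scores and the learned threshold during training, which a hard 0/1 mask would not allow.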
I am trying to train an LTP model for long documents, but where can I get a pretrained model with a max-seq-length over 512? As far as I know, pretrained models...
# 🖥 Benchmarking `transformers` Hi there, when I run one of the examples in the text classification folder and pass `max_seq_length=1024` to the model, I get the following warning,...
FLOPs
Since it is a dynamic transformer, the GFLOPs differ for each input instance. How should the FLOPs of the entire model be calculated? Take the average FLOPs over all validation...
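One common convention for dynamic models is exactly what the question suggests: measure per-instance FLOPs on the validation set and report the mean. A sketch, where the per-example counts are hypothetical placeholders for whatever a FLOPs counter would produce:

```python
def average_flops(flops_per_example):
    """Mean FLOPs over a validation set. For a dynamic model each
    example may keep a different number of tokens, so the
    per-example counts differ and only the average is reported."""
    return sum(flops_per_example) / len(flops_per_example)

# hypothetical per-example counts (in GFLOPs) from a validation run
per_example = [1.0, 0.5, 1.5, 1.0]
print(average_flops(per_example))  # 1.0
```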
It seems that the model tends to keep more tokens when the threshold is fixed (the hard training step), because the L1 loss can no longer be taken into account. How to solve...