Wei Han (Henry) issues

Results: 7 issues of Wei Han (Henry)

Is the gradient for Encoder doubled?

I notice there are two backpropagations for the generator and encoder. https://github.com/wiseodd/controlled-text-generation/blob/master/train_discriminator.py#L120-L122 https://github.com/wiseodd/controlled-text-generation/blob/master/train_discriminator.py#L130-L132 After the back-propagation of loss G, it runs zero_grad to clear all the grads of the generator...

UnpicklingError when loading OFA-base pretrained model in transformers

I ran into a problem when loading the OFA-base model using Hugging Face. The code snippet is below; my torch version is 1.13.1. ``` from PIL import Image from torchvision import transforms from...

'File exists: "/00000_locals"' when integrated with deepspeed training scripts

## Environment - OS: [Ubuntu 22.04.2 LTS] - Hardware (GPU, or instance type): [A800] ## To reproduce Steps to reproduce the behavior: 1. pip install deepspeed 2. deepspeed train.py ......

bug

LongBench evaluation

Hi Authors, Thanks for the great work! I tried to evaluate LongLoRA on LongBench (https://github.com/THUDM/LongBench) using the checkpoint of LongAlpaca-7B (https://huggingface.co/Yukang/LongAlpaca-7B). I load the model directly in the LongBench evaluation benchmark...

[Bug]: DP with sampling hangs after completing generation

### Your current environment The output of `python collect_env.py` ```text INFO 04-25 14:52:24 [__init__.py:239] Automatically detected platform cuda. Collecting environment information... PyTorch version: 2.6.0+cu124 Is debug build: False CUDA used...

bug

Paper Revision: 2025.findings-naacl.150

### Anthology ID 2025.findings-naacl.150 ### Type of Change Revision ### PDF of the Revision or Erratum [NAACL_25_PREMISE.pdf](https://github.com/user-attachments/files/19997222/NAACL_25_PREMISE.pdf) ### Brief Description of Changes We found that in the previous version, the author list...

correction

pending

add snapKV implementation for transformers sdpa attention with flash_attn availability checking

This covers the case where flash_attn_2 is not available. Currently only hijiack_llama is added; implementations for other models will be added later.
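The availability check mentioned in this last item could look roughly like the sketch below. It is an illustration, not the code from the pull request: the helper names `package_available` and `pick_attention_backend` are hypothetical, and the sketch only assumes that the package is importable under the name `flash_attn` and that the returned strings match the `attn_implementation` values accepted by transformers.

```python
import importlib.util


def package_available(name: str) -> bool:
    """Return True if a top-level package with this name can be found.

    Hypothetical helper: it only looks up the import spec and does not
    import the package, so a broken install can still report True.
    """
    return importlib.util.find_spec(name) is not None


def pick_attention_backend(flash_pkg: str = "flash_attn") -> str:
    """Choose an attention backend label, falling back to SDPA.

    The labels are assumed to be passed on as `attn_implementation`
    when loading a model; that wiring is not shown here.
    """
    if package_available(flash_pkg):
        return "flash_attention_2"
    return "sdpa"


if __name__ == "__main__":
    print(pick_attention_backend())
```

Checking the import spec rather than importing the package keeps the check cheap, at the cost of not catching installs that are present but fail at import time.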