Wei Han (Henry)

Results 7 issues of Wei Han (Henry)

I notice there are two backpropagations for the generator and encoder. https://github.com/wiseodd/controlled-text-generation/blob/master/train_discriminator.py#L120-L122 https://github.com/wiseodd/controlled-text-generation/blob/master/train_discriminator.py#L130-L132 After the back-propagation of loss G, it runs zero_grad to clear all the grads of the generator...

I ran into a problem when loading the OFA-base model using Hugging Face. The code snippet is below; my torch version is 1.13.1. ``` from PIL import Image from torchvision import transforms from...

## Environment - OS: [Ubuntu 22.04.2 LTS] - Hardware (GPU, or instance type): [A800] ## To reproduce Steps to reproduce the behavior: 1. pip install deepspeed 2. deepspeed train.py ......

bug

Hi Authors, Thanks for the great work! I tried to evaluate LongLoRA on LongBench (https://github.com/THUDM/LongBench) using the checkpoint of LongAlpaca-7B (https://huggingface.co/Yukang/LongAlpaca-7B). I load the model directly in the LongBench evaluation benchmark...

### Your current environment The output of `python collect_env.py` ```text INFO 04-25 14:52:24 [__init__.py:239] Automatically detected platform cuda. Collecting environment information... PyTorch version: 2.6.0+cu124 Is debug build: False CUDA used...

bug

### Anthology ID 2025.findings-naacl.150 ### Type of Change Revision ### PDF of the Revision or Erratum [NAACL_25_PREMISE.pdf](https://github.com/user-attachments/files/19997222/NAACL_25_PREMISE.pdf) ### Brief Description of Changes We found that in the previous version, the author list...

correction
pending

For the case where flash_attn_2 is not available. Currently this only adds hijiack_llama; implementations for other models will be added at a later time.