He Cao

Results 26 comments of He Cao

> @CiaoHe hello! yea sure, so the frequencies in the original paper were designed for language modeling, but I ended up using the frequencies as defined in the Perceiver paper...

Same issue here when using the bi-mamba implementation from https://github.com/hustvl/Vim/blob/main/mamba-1p1p1/mamba_ssm/ops/selective_scan_interface.py. Two workarounds: 1. train in fp32; 2. decrease the learning rate. But neither is an ideal solution. 1. fp32 train:...
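For context on why fp32 training helps: half precision saturates around 6.5e4, so a growing accumulator inside a scan-style recurrence can overflow to `inf` within a few steps. A minimal NumPy sketch of that failure mode (illustrative only, not the actual selective-scan kernel):

```python
import numpy as np

# float16 overflows past ~65504, so a cumulative product like the ones
# in a scan recurrence blows up to inf after only a handful of steps.
acc16 = np.float16(1.0)
acc32 = np.float32(1.0)
for _ in range(8):
    acc16 = np.float16(acc16 * np.float16(8.0))  # 8**6 = 262144 already exceeds fp16 range
    acc32 = acc32 * np.float32(8.0)              # fp32 holds 8**8 = 16777216 exactly

print(np.isinf(acc16))  # True: fp16 accumulator overflowed
print(acc32)            # 16777216.0: fp32 is still fine
```

Casting the scan inputs/accumulator to fp32 (or lowering the lr so activations grow more slowly) sidesteps exactly this overflow, which matches the two workarounds above.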

Please refer to https://github.com/lm-sys/FastChat/issues/90#issuecomment-1493317309

You can refer to https://github.com/GanjinZero/RRHF/blob/529196c00656322ce861fd8262a2c452b401780f/train.py#L93 to manually add this function.

Download it directly from https://huggingface.co/decapoda-research/llama-7b-hf.

Yeah, same situation. Even after reducing `--per_device_train_batch_size` from 2 to 1, it still OOMs. Maybe some heroes can solve this using DeepSpeed?
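In case it helps anyone hitting the same OOM: DeepSpeed ZeRO can shard optimizer state (and optionally parameters) across the GPUs and offload them to CPU. A minimal config sketch — the file name and exact values are my assumptions, not a config shipped by FastChat:

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu" },
    "offload_param": { "device": "cpu" }
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```

Since the train script uses the HF Trainer, something like `--deepspeed ds_config.json` on the training command should pick it up; stage 3 with CPU offload trades speed for memory, so try stage 2 first if stage 3 is too slow.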

> we have tried to train the 7b model on A100 40G * 8, with default settings. And all GPU memories are almost eaten up. If set batchsize to 1,...

Put the xxx_clean_split.json into the `--data_path` in the finetune script, like:

```bash
torchrun --nproc_per_node=4 --master_port=20001 fastchat/train/train_mem.py \
    --model_name_or_path YOUR_LLAMA_PATH \
    --data_path xxxx_clean_split.json \
    --bf16 True \
    --output_dir output \
    --num_train_epochs...
```