He Cao

Results 26 comments of He Cao

> @CiaoHe hello! yea sure, so the frequencies in the original paper were designed for language modeling, but I ended up using the frequencies as defined in the Perceiver paper...

Same issue here when using the bi-mamba implementation from https://github.com/hustvl/Vim/blob/main/mamba-1p1p1/mamba_ssm/ops/selective_scan_interface.py. Two workarounds: 1. train in fp32; 2. decrease the learning rate. But neither is an ideal solution. 1. fp32 train:...
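For context on why fp32 training helps: half precision saturates around 6.5e4, so a growing accumulator inside a scan-style recurrence can overflow to `inf` within a few steps. A minimal NumPy sketch of that failure mode (illustrative only, not the actual selective-scan kernel):

```python
import numpy as np

# float16 overflows past ~65504, so a cumulative product like the ones
# in a scan recurrence blows up to inf after only a handful of steps.
acc16 = np.float16(1.0)
acc32 = np.float32(1.0)
for _ in range(8):
    acc16 = np.float16(acc16 * np.float16(8.0))  # 8**6 = 262144 already exceeds fp16 range
    acc32 = acc32 * np.float32(8.0)              # fp32 holds 8**8 = 16777216 exactly

print(np.isinf(acc16))  # True: fp16 accumulator overflowed
print(acc32)            # 16777216.0: fp32 is still fine
```

Casting the scan inputs/accumulator to fp32 (or lowering the lr so activations grow more slowly) sidesteps exactly this overflow, which matches the two workarounds above.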

Please refer to https://github.com/lm-sys/FastChat/issues/90#issuecomment-1493317309

You can refer to https://github.com/GanjinZero/RRHF/blob/529196c00656322ce861fd8262a2c452b401780f/train.py#L93 to manually add this function.

Download it directly from https://huggingface.co/decapoda-research/llama-7b-hf.

Yeah, same situation. Even after reducing `--per_device_train_batch_size` from 2 to 1, it still OOMs. Maybe some heroes can solve this using DeepSpeed?
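In case it helps anyone hitting the same OOM: DeepSpeed ZeRO can shard optimizer state (and optionally parameters) across the GPUs and offload them to CPU. A minimal config sketch — the file name and exact values are my assumptions, not a config shipped by FastChat:

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu" },
    "offload_param": { "device": "cpu" }
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```

Since the train script uses the HF Trainer, something like `--deepspeed ds_config.json` on the training command should pick it up; stage 3 with CPU offload trades speed for memory, so try stage 2 first if stage 3 is too slow.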

> we have tried to train the 7b model on A100 40G * 8, with default settings. And all GPU memories are almost eaten up. If set batchsize to 1,...

Put the xxx_clean_split.json into the `--data_path` in the finetune script, like:

```bash
torchrun --nproc_per_node=4 --master_port=20001 fastchat/train/train_mem.py \
    --model_name_or_path YOUR_LLAMA_PATH \
    --data_path xxxx_clean_split.json \
    --bf16 True \
    --output_dir output \
    --num_train_epochs...
```