Zhang JiaXin

Results 5 issues of Zhang JiaXin

Running `wget -nc https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.en.vec` returns 403 Forbidden.

How are the training and test sets of "zhihu" and "composition" datasets divided?

deepspeed: 0.9.2
transformers: 4.30.0.dev0
torch: 1.12.1+cuda11.6
server info: 8*A100 80G
Memory: 500G

When loading a model with 15B parameters on a single node (8*A100), the memory usage exceeds 500 GB and the...
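A rough back-of-the-envelope check may help frame the numbers above. The sketch below assumes, without confirmation from the issue, that each of the 8 processes first materializes its own full copy of the weights in host RAM before sharding. The function name `host_memory_gb` and the precision choices are hypothetical and only illustrate the arithmetic.

```python
def host_memory_gb(num_params: float, bytes_per_param: int, num_processes: int) -> float:
    """Estimate host RAM in GB (1 GB = 1e9 bytes) for the weights alone,
    assuming every process holds one full copy of the model (an assumption)."""
    return num_params * bytes_per_param * num_processes / 1e9


# 15B parameters, one full copy per process on an 8-GPU node.
fp32_total = host_memory_gb(15e9, 4, 8)  # 4 bytes per fp32 parameter
fp16_total = host_memory_gb(15e9, 2, 8)  # 2 bytes per fp16 parameter

print(f"fp32 copies: {fp32_total:.0f} GB, fp16 copies: {fp16_total:.0f} GB")
```

Under that assumption, fp32 weights alone would already approach the 500 GB of host memory, before counting any temporary buffers or optimizer state. Whether this is the actual cause depends on how the model is loaded.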

training

Here is the error I encountered. It seems that `self._total_batch_size` is `None`, but I don't know why.
```
File "/path/model_training/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 434, in main
    out = trainer.generate_experience(batch_prompt['prompt'],
File "/path/model_training/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py",...
```

deespeed chat
hybrid engine

When I load the 33B model using the method shown below, generation is too slow: about 2.9 s per token.
```python
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token...
```