YY Lin
NLP text processing includes tokenization, vocabulary generation, string2ids encoding, and ids2string decoding. For now, I have to convert strings to ids manually before I send the data to the TFX pipeline. As you may expect,...
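The four steps named above can be sketched with a toy whitespace tokenizer and a hand-built vocabulary (this is an illustration, not the TFX API; real pipelines use subword tokenizers):

```python
# Minimal sketch of tokenization, vocabulary generation,
# string2ids encoding, and ids2string decoding.

def tokenize(text):
    # Whitespace tokenization; real systems use subword tokenizers.
    return text.lower().split()

def build_vocab(corpus):
    # Assign an id to each unique token; id 0 is reserved for <unk>.
    vocab = {"<unk>": 0}
    for text in corpus:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(text, vocab):
    # string2ids: unknown tokens map to <unk>.
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(text)]

def decode(ids, vocab):
    # ids2string: invert the vocabulary and join tokens.
    inv = {i: tok for tok, i in vocab.items()}
    return " ".join(inv[i] for i in ids)

vocab = build_vocab(["hello world", "hello nlp"])
ids = encode("hello nlp", vocab)
assert decode(ids, vocab) == "hello nlp"
```

The round trip only holds for text the toy tokenizer can reconstruct; with subword tokenizers, decoding also has to merge word pieces.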
### Describe the feature ChatGLM-6B is a new Chinese ChatGPT alternative. Its weights are open-sourced, and we need to support tuning the model in the Chat application. I've already implemented...
### Describe the feature PPO training needs to keep four models in memory at the same time. The original implementation keeps the reward/actor/critic/initial models in video RAM at...
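One way to avoid holding all four models in video RAM is to keep only the active model on the accelerator and park the others on the CPU. A minimal sketch of that placement protocol (the `.to(device)` call follows the PyTorch `nn.Module` convention; the `Stub` class is hypothetical, standing in for a real model so the idea is testable without a GPU):

```python
# Keep only one model resident on the GPU at a time: load it before
# use, evict it to the CPU afterwards so the next model fits.

def run_resident(model, fn, gpu="cuda", cpu="cpu"):
    model.to(gpu)            # bring this model into video RAM
    try:
        return fn(model)
    finally:
        model.to(cpu)        # evict it so the next model fits

class Stub:
    """Hypothetical stand-in for an nn.Module; records its placement."""
    def __init__(self):
        self.device = "cpu"
    def to(self, device):
        self.device = device
        return self
    def forward(self, x):
        assert self.device == "cuda", "model must be on the GPU to run"
        return x * 2
```

Usage: `run_resident(Stub(), lambda m: m.forward(21))` returns 42 and leaves the stub back on `"cpu"`. The trade-off is the transfer cost of moving weights across the PCIe bus on every switch.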
### Describe the feature Currently, FP16 support only makes it possible to train models smaller than 2B on a single graphics card with 24 GB of RAM. However, the mainstream useful...
## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]:...
### 🐛 Describe the bug The `setup.py` in the main branch just excludes `op_builder`.

```
setup(name=package_name,
      version=version,
      packages=find_packages(exclude=(
          'op_builder',
          'benchmark',
          'docker',
          'tests',
          'docs',
          'examples',
          'tests',
          'scripts',
          'requirements',
          '*.egg-info',
      )),
```

I'm...
The reason for this request: during PPO, the last step of RLHF training, the training process needs to keep four models at once: the Actor, the Initial Model, the Critic, and the Reward Model. The Actor and the Initial Model are ChatGLM-6B after SFT tuning; the Critic and the Reward Model are the scoring/value models trained in the second step.

In the third step (PPO), the Actor generates new ids for new prompts, and the Initial Model computes log probs over the generated sequences. These new ids, as the actions taken in the current state, are scored by the Critic and the Reward Model to produce a value and a reward, forming a new Experience. Multiple Experiences are assembled into a mini-batch and used to update the Actor's and Critic's parameters, so that the Actor's outputs score as high as possible without drifting too far from the Initial Model.

In this process, the id sequences generated by the Actor are fed into the Reward Model, so the ids must carry the same meaning in both. A model family usually ships in a large and a small parameter version, e.g. Bloom 7B and Bloom 560M, which share a tokenizer, so ids generated by the large model keep their meaning when passed into the small one.

But ChatGLM-6B has no such small sibling, which forced me to train the value model with ChatGLM-6B itself. As a result, in the PPO stage my process holds four ChatGLM-6B-sized models at once. To keep training running, I had to swap models, keeping only one in memory at a time (this is my PR on ColossalAI: https://github.com/hpcaitech/ColossalAI/pull/3567). It runs, but very slowly.

I hope a minimal ChatGLM version can be released. Since the value network converges quickly, I think anything under 500M would work; pairing Bloom-560M with Bloom 7B for training, the results I got were all good.
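The experience-collection step described above can be sketched as follows, with a stub standing in for the four ChatGLM-6B-sized networks (the names and method signatures are illustrative, not the ColossalAI API):

```python
# One PPO experience: the Actor acts, the Initial Model anchors the KL
# term, and the Critic/Reward Model score the resulting sequence.
from dataclasses import dataclass

@dataclass
class Experience:
    ids: list          # action token ids produced by the Actor
    log_probs: list    # reference log-probs from the Initial (SFT) model
    value: float       # Critic's value estimate
    reward: float      # Reward Model's score

def collect_experience(prompt, actor, initial_model, critic, reward_model):
    ids = actor.generate(prompt)              # actions for the current state
    log_probs = initial_model.log_probs(ids)  # KL anchor to the SFT model
    value = critic.value(prompt, ids)
    reward = reward_model.score(prompt, ids)
    return Experience(ids, log_probs, value, reward)

class StubModel:
    """Toy stand-in for a ChatGLM-6B-sized model."""
    def generate(self, prompt): return [1, 2, 3]
    def log_probs(self, ids): return [-0.1] * len(ids)
    def value(self, prompt, ids): return 0.5
    def score(self, prompt, ids): return 1.0
```

Mini-batches of such experiences then update the Actor (maximize reward under a KL penalty toward the Initial Model) and the Critic (value regression); this is where all four models must be reachable in the same step.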
https://github.com/BlinkDL/ChatRWKV/issues/160 This is an implementation of the "stop words" feature. The file DEMO_FOR_STOPWORDS.py is a demo showing how to use this function.
Currently, the PIPELINE class in src/util.py has an arg "stop_token", which means a specially designed single token_id at which to stop generation. But in most cases, the stop_token should be a token...
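Generalizing from a single stop token to multi-token stop sequences can be sketched like this (a hypothetical helper, not the actual ChatRWKV PIPELINE API): after each sampled token, check whether the tail of the generated ids matches any stop sequence.

```python
# Stop generation when the generated tail matches any stop sequence of
# token ids, rather than a single stop_token id.

def hit_stop_sequence(generated_ids, stop_sequences):
    for stop in stop_sequences:
        if len(generated_ids) >= len(stop) and generated_ids[-len(stop):] == stop:
            return True
    return False

# Example: stop on the two-token sequence [13, 13] (e.g. "\n\n").
assert hit_stop_sequence([5, 9, 13, 13], [[13, 13]])
assert not hit_stop_sequence([5, 13, 9], [[13, 13]])
```

A generation loop would call this after each step and truncate the matched stop sequence from the output; the single-token `stop_token` case is just a stop sequence of length one.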
**Is your feature request related to a problem? Please describe.** Currently, only Nvidia and AMD GPUs are monitored by btop; the Apple silicon version can only monitor CPUs. The tool...