Jianfeng Wang
@karpathy I noticed that `loss` is not divided by `gradient_accumulation_steps` before the backward pass, but shouldn't the loss be averaged across the whole batch? https://github.com/karpathy/nanoGPT/blob/ae3a8d5fdd3ddb8b13fab182723476523961e3ab/train.py#L281-L293
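To make the question concrete, here is a minimal pure-Python sketch. It is not nanoGPT's code: the one-parameter model, the data, and the names `grad_mean_loss` and `accumulated_grad` are assumptions for illustration. It sums the gradients of per-micro-batch mean losses, as repeated backward calls would, and compares the result with the gradient of the mean loss over the full batch.

```python
# Toy model: prediction = w * x, loss = mean((w * x - y) ** 2) over a batch.
# All names and numbers are illustrative, not taken from nanoGPT.

def grad_mean_loss(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error over the given batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, steps, scale_loss):
    """Sum micro-batch gradients, as repeated backward calls would."""
    micro = len(xs) // steps  # assumes len(xs) is divisible by steps
    total = 0.0
    for i in range(steps):
        mx = xs[i * micro:(i + 1) * micro]
        my = ys[i * micro:(i + 1) * micro]
        g = grad_mean_loss(w, mx, my)
        if scale_loss:
            g /= steps  # same effect as dividing the loss by steps first
        total += g
    return total

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 9.0]
w, steps = 0.5, 2

full = grad_mean_loss(w, xs, ys)
unscaled = accumulated_grad(w, xs, ys, steps, scale_loss=False)
scaled = accumulated_grad(w, xs, ys, steps, scale_loss=True)
```

With equal-sized micro-batches, `scaled` matches `full`, while `unscaled` is `steps` times larger. Whether that factor matters in practice depends on the rest of the training loop.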
The original implementation does not transpose `value_states`, possibly to satisfy xformers' requirements and reduce the number of transposes. However, `self.rotary_emb` expects its input to have shape `[bs, num_heads, seq_len, head_size]`, so this leads to the following error:
```cpp
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [634,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
```
Someone reported a similar bug in this issue: https://github.com/baichuan-inc/baichuan-7B/issues/23#issuecomment-1592658897
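The following pure-Python sketch shows one way a layout mismatch like this can surface as an out-of-bounds index. It is an assumption for illustration, not the code of any real model: `rotary_table_len` and `lookup_positions` are hypothetical stand-ins for a rotary embedding that infers the sequence length from dimension -2 of its input, and the sizes are made up.

```python
# Hypothetical stand-in for a rotary embedding that reads seq_len from
# dimension -2 of its input shape. Not the code of any real model.

def rotary_table_len(shape):
    """Number of positions cached by a consumer expecting
    [bs, num_heads, seq_len, head_size]."""
    bs, num_heads, seq_len, head_size = shape
    return seq_len

def lookup_positions(table_len, position_ids):
    """Index a cached table by position; raises IndexError when a
    position is >= table_len."""
    table = list(range(table_len))  # stands in for cached cos/sin rows
    return [table[p] for p in position_ids]

bs, seq_len, num_heads, head_size = 1, 64, 32, 128  # illustrative sizes
position_ids = list(range(seq_len))

transposed = (bs, num_heads, seq_len, head_size)    # expected layout
untransposed = (bs, seq_len, num_heads, head_size)  # transpose skipped

ok = lookup_positions(rotary_table_len(transposed), position_ids)

try:
    lookup_positions(rotary_table_len(untransposed), position_ids)
    failed = False
except IndexError:
    # num_heads (32) was read as the sequence length, so positions
    # 32..63 fall outside the table.
    failed = True
```

On a GPU the same kind of mismatch would show up as a device-side assertion rather than a Python `IndexError`.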
## 🐛🐛 Bug Report

### ⚗️ Current Behavior

When I build a dataset with a sequence of images (like frames), everything works fine. But if I transfer it to a...
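The following problem appears in every subset. Taking `pCLUE_dev.json` as an example:

1. The target on line 9 is `辅助工具`, but that option is not in answer_choices;
2. The target on line 70 is `电竞`, but that option is not in answer_choices;
3. The target on line 150 is `医疗服务`, but that option is not in answer_choices;
4. The target on line 171 is `休闲益智`, but that option is not in answer_choices;
5. ...

I did a quick count: `pCLUE_dev.json` contains 44414 classify examples in total, and for 5394 of them the target is not in answer_choices, which is about 12%. Could the maintainers fix this? It has a noticeable effect on evaluation results.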
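A count like the one above can be reproduced with a short script. This is a minimal sketch under assumptions not confirmed by the report: the file holds one JSON object per line, each record has a `type` field whose value is `classify` for classification examples, and `answer_choices` is a list. The function name `count_bad_targets` and the sample records are made up for illustration.

```python
import json

def count_bad_targets(lines):
    """Count classify records whose target is not in answer_choices.

    `lines` is an iterable of JSON strings, one record per line
    (an assumed file layout).
    """
    total = bad = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("type") != "classify":  # assumed field name and value
            continue
        total += 1
        if rec["target"] not in rec["answer_choices"]:
            bad += 1
    return total, bad

# Made-up sample records, not taken from pCLUE.
sample = [
    json.dumps({"type": "classify", "target": "电竞",
                "answer_choices": ["体育", "财经"]}, ensure_ascii=False),
    json.dumps({"type": "classify", "target": "体育",
                "answer_choices": ["体育", "财经"]}, ensure_ascii=False),
    json.dumps({"type": "nli", "target": "是",
                "answer_choices": ["是", "否"]}, ensure_ascii=False),
]
total, bad = count_bad_targets(sample)
```

To run it on the real data, pass an open file handle instead of `sample`, for example the result of `open("pCLUE_dev.json", encoding="utf-8")`.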
version: 0.8.9, Python 3.6.9, reproduction code:
```python3
In [1]: from tabulate import tabulate
   ...: data = dict(a=0.1, b=True, c="c")
   ...: table = [(str(k), str(v)) for k, v in data.items()]
In...
```
Using `flash_attn_varlen_qkvpacked_func` together with `CheckpointImpl.NO_REENTRANT` raises the runtime error below:
```python
Traceback (most recent call last):
> File "/opt/tiger/antelope/train.py", line 718, in main()
└ File "/opt/tiger/antelope/train.py", line 703, in main train(...
```