Jianfeng Wang issues




Should `loss` be divided by `gradient_accumulation_steps`?

@karpathy I noticed that `loss` is NOT divided by `gradient_accumulation_steps` before `backward()`. Shouldn't the loss be averaged over the whole effective batch, i.e. across all accumulated micro-batches? https://github.com/karpathy/nanoGPT/blob/ae3a8d5fdd3ddb8b13fab182723476523961e3ab/train.py#L281-L293
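A minimal pure-Python sketch of why the division matters, assuming equal-sized micro-batches and a mean-reduced loss. It uses a hypothetical scalar linear model with hand-written gradients instead of PyTorch autograd, and is not nanoGPT's code: summing per-micro-batch gradients (which is what repeated `backward()` calls do) reproduces the full-batch mean gradient only when each micro-batch loss is divided by the number of accumulation steps.

```python
import math

# Hypothetical scalar linear model y_hat = w * x with mean-squared-error loss,
# so the gradient can be written by hand instead of using autograd.

def mse_grad(w, xs, ys):
    """d/dw of mean((w*x - y)**2) over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, accum_steps, scale_loss=True):
    """Sum per-micro-batch gradients, as repeated backward() calls do before
    one optimizer step. Assumes len(xs) is divisible by accum_steps."""
    micro = len(xs) // accum_steps
    total = 0.0
    for k in range(accum_steps):
        chunk = slice(k * micro, (k + 1) * micro)
        g = mse_grad(w, xs[chunk], ys[chunk])
        # Dividing the loss by accum_steps divides its gradient by the same factor.
        total += g / accum_steps if scale_loss else g
    return total

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
w = 0.5

full = mse_grad(w, xs, ys)
scaled = accumulated_grad(w, xs, ys, accum_steps=4)
unscaled = accumulated_grad(w, xs, ys, accum_steps=4, scale_loss=False)
print(math.isclose(scaled, full), math.isclose(unscaled, 4 * full))  # True True
```

In this sketch the unscaled sum is exactly `accum_steps` times the full-batch gradient, which matters for anything sensitive to gradient magnitude, such as a fixed gradient-clipping threshold.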

Fix the dimensions of value_states when training with xformers

The original implementation, probably to match the layout xformers expects and to reduce the number of transposes, does not transpose `value_states`. However, `self.rotary_emb` expects its input to be `[bs, num_heads, seq_len, head_size]`, which leads to the following error:
```cpp
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [634,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
```
Someone reported a similar bug in this issue: https://github.com/baichuan-inc/baichuan-7B/issues/23#issuecomment-1592658897
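A hypothetical pure-Python illustration of how such a layout mismatch can turn into an out-of-bounds index. It assumes (not confirmed from the repository) that the rotary embedding reads the sequence length from dimension -2 of its input and builds one cos/sin row per position. xformers' `memory_efficient_attention` works with a `[batch, seq_len, num_heads, head_dim]` layout, so an untransposed tensor puts `num_heads` at dimension -2. No xformers or PyTorch is used here.

```python
# Hypothetical illustration only; not the repository's code.

def seq_len_from_dim_minus_2(shape):
    # A consumer written for [bs, num_heads, seq_len, head_size] reads dim -2.
    return shape[-2]

def build_position_table(seq_len):
    # Stand-in for a cached cos/sin table: one row per position.
    return [("cos", i) for i in range(seq_len)]

def gather_rows(shape, position_ids):
    table = build_position_table(seq_len_from_dim_minus_2(shape))
    # Mirrors cos[position_ids]; Python raises IndexError where CUDA asserts.
    return [table[p] for p in position_ids]

bs, seq_len, num_heads, head_size = 2, 64, 32, 128
position_ids = list(range(seq_len))

bhsd = (bs, num_heads, seq_len, head_size)  # layout the consumer expects
bshd = (bs, seq_len, num_heads, head_size)  # xformers layout, not transposed

print(len(gather_rows(bhsd, position_ids)))  # 64
try:
    gather_rows(bshd, position_ids)
except IndexError:
    # dim -2 is num_heads (32) here, so positions 32..63 are out of range
    print("index out of bounds")
```

In this sketch, if `seq_len` were no larger than `num_heads` the lookup would succeed silently with a wrongly sized table, so the failure only shows up for longer sequences.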

[BUG] dataloader only returns the first element of the sequence

## 🐛🐛 Bug Report ### ⚗️ Current Behavior When I build a dataset with a sequence of images (like frames), everything works fine. But if I transfer it to a...

bug

Suspected errors in the classify-type data

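The following problem appears in every subset. Taking `pCLUE_dev.json` as an example:
1. The target on line 9 is `辅助工具` (auxiliary tools), but it is not among the answer_choices;
2. The target on line 70 is `电竞` (esports), but it is not among the answer_choices;
3. The target on line 150 is `医疗服务` (medical services), but it is not among the answer_choices;
4. The target on line 171 is `休闲益智` (casual puzzle), but it is not among the answer_choices;
5. …

I did a quick count: `pCLUE_dev.json` contains 44,414 classify examples in total, and for 5,394 of them the target is not in answer_choices, about 12%. Could the maintainers fix this? It noticeably affects evaluation results.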
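A minimal sketch of the kind of check described above. It assumes, without confirmation from the issue, that the file holds one JSON object per line with `type`, `target` and `answer_choices` keys; the function name and sample records are hypothetical, not taken from pCLUE.

```python
import json

def find_missing_targets(lines):
    """Count classify records and collect the 1-based line numbers whose
    target is not listed in answer_choices.

    Assumes one JSON object per line with 'type', 'target' and
    'answer_choices' keys (an assumption about the pCLUE file layout)."""
    total = 0
    missing = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("type") != "classify":
            continue  # only classify records are checked
        total += 1
        if record.get("target") not in record.get("answer_choices", []):
            missing.append(lineno)
    return total, missing

# Hypothetical sample records, not taken from pCLUE.
sample = [
    json.dumps({"input": "...", "target": "sports", "answer_choices": ["sports", "finance"], "type": "classify"}),
    json.dumps({"input": "...", "target": "esports", "answer_choices": ["sports", "finance"], "type": "classify"}),
    json.dumps({"input": "...", "target": "yes", "answer_choices": ["yes", "no"], "type": "nli"}),
]
total, missing = find_missing_targets(sample)
print(total, missing)  # 2 [2]

# On the real file (path assumed):
# with open("pCLUE_dev.json", encoding="utf-8") as f:
#     total, missing = find_missing_targets(f)
# print(total, len(missing), f"{len(missing) / total:.1%}")
```

Line numbers count every physical line, blank ones included, so they can be compared directly with positions such as "line 9" in the file.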

ValueError: could not convert string to float: 'True'

version: 0.8.9, Python 3.6.9, reproduction code:
```python3
In [1]: from tabulate import tabulate
   ...: data = dict(a=0.1, b=True, c="c")
   ...: table = [(str(k), str(v)) for k, v in data.items()]
In...
```
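The error text is exactly what Python's built-in `float()` raises for the string `'True'`. Because the reproduction stringifies every value, the boolean `True` reaches the table as the text `'True'`, whereas `float(True)` on the bool itself returns `1.0` (`bool` is a subclass of `int`). The guarded parser below is a hypothetical helper for illustration, not tabulate's internal code.

```python
def parse_cell(text):
    """Hypothetical helper: convert numeric-looking text to float and leave
    anything else (such as 'True' or 'c') as the original string."""
    try:
        return float(text)
    except ValueError:
        return text

data = dict(a=0.1, b=True, c="c")
cells = [str(v) for v in data.values()]  # ['0.1', 'True', 'c']

print(float(True))  # 1.0: bool is an int subclass
try:
    float("True")  # the string form is not numeric
except ValueError as exc:
    print(exc)  # could not convert string to float: 'True'

print([parse_cell(c) for c in cells])  # [0.1, 'True', 'c']
```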

bug

good first issue

flash-attention v2 with activation checkpointing (no_reentrant) raises RuntimeError

Using `flash_attn_varlen_qkvpacked_func` together with `CheckpointImpl.NO_REENTRANT` raises the RuntimeError below: ```python Traceback (most recent call last): > File "/opt/tiger/antelope/train.py", line 718, in main() └ File "/opt/tiger/antelope/train.py", line 703, in main train(...