LongAlign issues

Update README.md

Add OpenXLab download links

[BUG] Loading chatglm3-6b-128k with the Langchain-Chatchat framework makes the model keep asking and answering its own questions without stopping

1

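**Problem Description** With chatglm3-6b-128k, the model keeps asking and answering its own questions and does not stop. ![微信图片_20240313110149](https://github.com/chatchat-space/Langchain-Chatchat/assets/50111234/67c9ef0e-d288-43f2-b5ed-a62e2bc288a5) **Steps to Reproduce** 1. Use chatglm3-6b-128k. 2. Whatever is asked, the model starts asking and answering its own questions. **Expected Result** The model stops after finishing its answer. **Actual Result** After answering the current question, the model keeps generating its own questions and answers and cannot be stopped. **Environment Information** - langchain-ChatGLM version/commit...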

nmbtwzt

What is the difference between the packing loss and the plain batch loss?

5

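The paper states that the packing loss and the plain batch loss are inconsistent, based on this formula: that is, the loss is computed at sample granularity in two steps, first averaged within each sample and then averaged across the batch. As far as I know, SFT training usually computes the final loss at token granularity, i.e. "sum of target-token losses / number of target tokens", not at sample granularity. I looked at your implementation in modeling_llama.py: for the plain batch case, **the loss is flattened from batch*seq into a single sequence, so it is still computed at token granularity**, not at sample granularity (averaging over the sequence first, then over the batch). Two questions for discussion: 1. In SFT, should the final averaging step be done at token granularity or at sample granularity? 2. If it is done at token granularity, I believe packing and non-packing are equivalent.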
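To make the two averaging schemes in this question concrete, here is a minimal sketch in plain Python. The per-token loss values and the function names are hypothetical and are not taken from the LongAlign code; the sketch only contrasts the two definitions and does not settle which one SFT should use.

```python
def token_level_loss(batch):
    """Sum of all target-token losses divided by the total number of target tokens."""
    total = sum(sum(seq) for seq in batch)
    count = sum(len(seq) for seq in batch)
    return total / count


def sample_level_loss(batch):
    """Average within each sequence first, then average across sequences."""
    per_sequence = [sum(seq) / len(seq) for seq in batch]
    return sum(per_sequence) / len(per_sequence)


# Hypothetical per-token losses: one short sample and one longer sample.
batch = [[4.0], [1.0, 1.0, 1.0]]
# The same tokens packed into a single sequence, with sample boundaries ignored.
packed = [[4.0, 1.0, 1.0, 1.0]]

print(token_level_loss(batch), token_level_loss(packed))
print(sample_level_loss(batch), sample_level_loss(packed))
```

In this toy case the token-level average is the same for the batched and packed layouts, while the sample-level average changes once sample boundaries are lost, which is the distinction the question turns on.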

BitVoyage

question

How should a CUDA OOM in Needle_test be resolved?

1

With too many tokens the run hits an out-of-memory (OOM) error. How should this be resolved?

SefaZeng

Questions about fine-tuning

1

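1. For fine-tuning, should I retrain starting from the base model, or use a fine-tuning approach? 2. How much GPU memory is needed to retrain a 7B model? I could not find a hardware requirements table in the documentation. 3. For long-text inputs, which approach is more suitable?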

mhzn-yn

Normalization of the packing loss

1

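Shouldn't [the loss computation here](https://github.com/THUDM/LongAlign/blob/main/modeling_chatglm.py#L900) be normalized? `loss = (loss * shift_weights).sum()` -> `loss = (loss * shift_weights).sum() / shift_weights.sum()`, which normalizes the loss to token granularity. With the former, the scale of the loss is too large, and so are the backpropagated gradients. In the extreme case where every sample has only 1 token, the loss for the batch would explode.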
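The two expressions in this issue can be compared with a minimal plain-Python sketch. The loss values and weights below are hypothetical, and what `shift_weights` actually contains in the repository is not shown here; the sketch only illustrates how the unnormalized sum and the normalized mean behave under two assumed weightings.

```python
def weighted_sum(losses, weights):
    """Unnormalized form: (loss * weights).sum()."""
    return sum(l * w for l, w in zip(losses, weights))


def weighted_mean(losses, weights):
    """Normalized form: (loss * weights).sum() / weights.sum()."""
    return weighted_sum(losses, weights) / sum(weights)


# Assumption: every target token has loss 2.0 and weight 1.0.
short = ([2.0] * 4, [1.0] * 4)
long = ([2.0] * 8, [1.0] * 8)
print(weighted_sum(*short), weighted_sum(*long))    # grows with the token count
print(weighted_mean(*short), weighted_mean(*long))  # stays at the per-token scale

# Assumption: the weights already sum to 1, so both forms coincide.
prescaled = ([2.0] * 4, [0.25] * 4)
print(weighted_sum(*prescaled), weighted_mean(*prescaled))
```

Whether the extra division is needed therefore depends on how the weights are constructed upstream: with unit weights the sum scales with the number of tokens, whereas with weights that already sum to 1 the division changes nothing.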

Chandler-Bing

Was full recompute enabled during training?

wplf

Repeated output: the the the the

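The code and its output are as follows: ![image](https://github.com/user-attachments/assets/ee1a8366-70cc-4bd2-a84b-f6faacbb98c0) The model was downloaded from: https://huggingface.co/THUDM/LongAlign-7B-64k-base/tree/main The chat function follows: https://github.com/THUDM/LongAlign/issues/4#issuecomment-1985307045 The output repeats "the the the...". I tried different temperature values and the result is the same. Has anyone run into a similar problem?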

cuichenxu

Is this really how the code is written? What is the point of predefining the period token id? Does that mean it can only test GLM?

https://github.com/THUDM/LongAlign/blob/9ae0b597737c6658f4350ef7a42d5d01980d142c/Needle_test/prompt.py#L214 After spending a long time on testing, it turns out the needle was always at the start of the context. Frustrating. ![image](https://github.com/user-attachments/assets/09698f87-eb96-4be2-aec1-c681eb07a6cd) I don't understand: if you only support tokenizers whose period token is 30930, what is the point of open-sourcing the repo?
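A plausible explanation for the symptom reported here can be sketched in plain Python. This assumes the insertion logic follows the common needle-in-a-haystack pattern of backing up from the target depth to the nearest sentence boundary; that is an assumption, not something confirmed from prompt.py. The function name and the toy token ids are hypothetical.

```python
def find_insertion_point(context_tokens, depth_percent, period_ids):
    """Back up from the target depth until the preceding token is a period token."""
    point = int(len(context_tokens) * depth_percent / 100)
    while point > 0 and context_tokens[point - 1] not in period_ids:
        point -= 1
    return point


# Hypothetical toy tokenizer output in which id 9 is the period token.
tokens = [1, 2, 9, 3, 4, 5, 9, 6, 7, 8]

# Period id matches this tokenizer: the needle lands just after a sentence end.
matched = find_insertion_point(tokens, 50, {9})

# Period id hard-coded for a different tokenizer: no token ever matches,
# so the search walks all the way back to the start of the context.
mismatched = find_insertion_point(tokens, 50, {30930})

print(matched, mismatched)
```

Under this assumption, deriving the period id from the tokenizer under test, rather than fixing it to a single value, would keep the needle at the requested depth for other models.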

kkwhale7
