john chen
Parameter `num_hard_negatives`: only the top-k largest negative logits are kept when computing the softmax loss. Parameter `candidate_ids`: used to mask out candidates that share the positive's id when you use in-batch softmax.
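For anyone who wants the intuition, here is a minimal NumPy sketch of what those two parameters do conceptually (the function and variable names are mine, not the library's actual implementation):

```python
import numpy as np

def in_batch_softmax_loss(logits, candidate_ids, num_hard_negatives):
    """Illustrative sketch: logits is [batch, batch] with positives on the diagonal."""
    batch = logits.shape[0]
    logits = logits.astype(np.float64)
    diag = np.eye(batch, dtype=bool)

    # candidate_ids: mask off-diagonal columns whose id equals the row's
    # positive id, so duplicates are not treated as negatives.
    same_id = candidate_ids[None, :] == candidate_ids[:, None]
    logits[same_id & ~diag] = -np.inf

    # num_hard_negatives: keep the positive plus only the k largest negative logits.
    k = max(1, min(num_hard_negatives, batch - 1))
    neg_logits = np.where(diag, -np.inf, logits)
    kth_largest = -np.sort(-neg_logits, axis=1)[:, k - 1 : k]
    keep = (neg_logits >= kth_largest) | diag
    logits = np.where(keep, logits, -np.inf)

    # Softmax cross-entropy with the diagonal entry as the label.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

So with a `[4, 4]` logits matrix and `num_hard_negatives=2`, each row's cross-entropy only sees its positive plus its two hardest remaining negatives.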
Hmm... when will PinSAGE be added? 😂
In Python 3:
```python
# map returns an iterator in Python 3, so wrap it in list() for the NumPy row assignment
feat_data[i, :] = list(map(float, info[1:-1]))
```
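For context, a minimal sketch of the kind of feature-loading loop this line usually sits in (the file name and array dimensions below are assumptions, not taken from the original code):

```python
import numpy as np

# Assumed Cora-like dimensions and file name; adjust to your dataset.
num_nodes, num_feats = 2708, 1433
feat_data = np.zeros((num_nodes, num_feats))

with open("cora.content") as fp:
    for i, line in enumerate(fp):
        info = line.strip().split()
        # info[0] is the node id, info[-1] is the class label,
        # everything in between is the feature vector.
        feat_data[i, :] = list(map(float, info[1:-1]))
```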
I have the same problem.
https://github.com/QwenLM/Qwen/blob/main/README_CN.md The documentation says flash-attention 2 is already supported, so why am I still running into the same problem as this issue?
> If your GPU supports fp16 or bf16 precision, we also recommend installing [flash-attention](https://github.com/Dao-AILab/flash-attention) (flash attention 2 is now supported) to improve inference speed and reduce memory usage. (flash-attention is optional; the project runs normally without it.)
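In case it helps with debugging: a quick, minimal check (not specific to Qwen's loading code) to confirm that flash-attention 2 is actually installed in the environment you are running from:

```python
# Minimal sanity check: is flash-attn importable, and is it a 2.x build?
try:
    import flash_attn
    print("flash_attn version:", flash_attn.__version__)
except ImportError:
    print("flash_attn is not installed in this environment")
```

If this prints a 1.x version or an ImportError, the model is most likely falling back to the standard attention path even though the README advertises flash attention 2 support.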
> > @Doctor-L-end - thanks for contacting us with your feedback. Based on your issue submitted, I believe this translates to : " I personally feel that the packaging degree...
@Zor-X-L I would also like to know whether there has been any progress on this issue, and how to achieve 3352 TOPS (FP4 with sparsity) on the RTX 5090. Thanks!
> Note that for externally hosted models, configs such as --device and --batch_size should not be used and do not function.

I have the same problem. What can I do...