mangoyuan
A new paper has been released: [Disentangled Non-Local Neural Networks](https://arxiv.org/abs/2006.06668).
Same error. Here is what I ran into: every time I call `build_vocab`, `word2idx` and `idx2word` come out different. So I saved the vocab with `pickle` the first time and...
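For anyone hitting the same thing, here is a minimal sketch of the workaround described above. It is not the repo's actual code: `build_vocab` below is a hypothetical stand-in, and `load_or_build_vocab`, the `<pad>`/`<unk>` tokens, and the cache path are assumptions for illustration. One possible cause (not confirmed in this thread) is building the vocab from a `set` of strings, whose iteration order can change between Python processes because string hashing is randomized. Sorting the tokens removes that dependence, and caching with `pickle` keeps the mapping fixed across runs.

```python
import os
import pickle
from collections import Counter


def build_vocab(sentences, min_count=1):
    """Hypothetical stand-in for `build_vocab`; deterministic by construction."""
    counts = Counter(tok for sent in sentences for tok in sent.split())
    # Sort by (-frequency, token) so the order never depends on set/dict hashing.
    tokens = sorted(
        (t for t, c in counts.items() if c >= min_count),
        key=lambda t: (-counts[t], t),
    )
    idx2word = ["<pad>", "<unk>"] + tokens  # special tokens are an assumption
    word2idx = {w: i for i, w in enumerate(idx2word)}
    return word2idx, idx2word


def load_or_build_vocab(path, sentences):
    """Load a cached vocab if it exists, otherwise build it once and cache it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    vocab = build_vocab(sentences)
    with open(path, "wb") as f:
        pickle.dump(vocab, f)
    return vocab
```

Usage would look like `word2idx, idx2word = load_or_build_vocab("vocab.pkl", train_sentences)`. Note that the cached file has to be deleted whenever the training data changes, otherwise the old mapping is silently reused.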
ch04_basic.tex, line 258: "Wasserstein Distance是一套用来衡量两个概率分部..." contains a typo; "概率分部" should be "概率分布" (probability distribution). Finally, thanks to the author for the contribution!
+1, waiting for dynamic weight convolution to speed up inference ^-^
Combining the contents of `hypertile_xyz.py` and `hypertile_script.py` may fix it. Comment out this line in `hypertile_script.py`: `# from scripts.hypertile_xyz import add_axis_options`. The problem is caused by the program trying to...
I always get stuck for a long time at 1023 or close to 1024. This is my inference script; I hit the same long stall with 32b, 72b, and 72b-awq:

```bash
env MIN_PIXELS=3136 MAX_PIXELS=1003520 swift infer \
    --model Qwen/Qwen2_5-VL-32B-Instruct \
    --val_dataset $DATA_JSONL \
    --max_length 8192 \
    --infer_backend vllm \
    --ckpt_dir $CKPT_PATH \
    --result_path xxx.jsonl \
    --tensor-parallel-size 8...
```
If a multi-turn conversation has more than 5 turns and maxImage=5, does it take the 5 most recent images and ignore init?
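Same problem. In my tests, the hermes user-assistant-tool format tends to produce repeated output. Switching to the qwen template and organizing the messages the way the Qwen2 technical report describes seems to behave a bit more normally.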