Qwen2.5-Math
A series of math-specific large language models of our Qwen2 series.
Math demo https://qwen2.org/math : How can 24 be obtained from 11, 2, 23, and 8 using basic addition, subtraction, multiplication, and division, with each number used exactly once? ``` To obtain 24 using the numbers 11, 2, 23, and 8 with basic arithmetic operations (addition, subtraction, multiplication, and division),...
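
For checking answers to puzzles like the one above, a brute-force search is a handy reference. The following is a minimal sketch, not part of the demo or the repository: `solve24` is a hypothetical helper that repeatedly combines two of the remaining values with one of the four operations, using exact fractions so that division introduces no rounding error.

```python
from fractions import Fraction


def solve24(numbers, target=24):
    """Return one expression that reaches `target` using each number once, or None.

    Hypothetical helper for illustration; it is not taken from the Qwen2.5-Math code.
    """

    def search(items):
        # items is a list of (exact value, expression string) pairs still available.
        if len(items) == 1:
            value, expr = items[0]
            return expr if value == target else None
        for i in range(len(items)):
            for j in range(len(items)):
                if i == j:
                    continue
                (a, ea), (b, eb) = items[i], items[j]
                rest = [items[k] for k in range(len(items)) if k not in (i, j)]
                candidates = [
                    (a + b, f"({ea} + {eb})"),
                    (a - b, f"({ea} - {eb})"),
                    (a * b, f"({ea} * {eb})"),
                ]
                if b != 0:
                    # Skip division by zero; Fraction keeps the quotient exact.
                    candidates.append((a / b, f"({ea} / {eb})"))
                for value, expr in candidates:
                    found = search(rest + [(value, expr)])
                    if found is not None:
                        return found
        return None

    return search([(Fraction(n), str(n)) for n in numbers])
```

Calling `solve24([11, 2, 23, 8])` returns a parenthesized expression string that can be checked by hand. One valid solution to this instance is 23 + 11 - 8 - 2 = 24, although the search may return a different but equivalent expression.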
Hi, when running the GSM8K evaluation experiments in an 8-shot setting, I noticed that the few-shot examples were not successfully applied. Specifically, the current implementation triggers (in [here](https://github.com/QwenLM/Qwen2.5-Math/blob/a45202bd16f1ec06f433442dc1152d0074773465/evaluation/utils.py#L204C5-L214C59)). ```python if...
I can't locate the core logic that strips non-Python content from the model's generated solution anywhere in the code within the evaluation directory. Currently, I can only see how dataset ground truth...
When calling the model through the Transformers library with max_new_tokens set to 20000, a warning appears once the generated length reaches 4096: This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (4096). Depending on the model, you may observe exceptions, performance degradation, or...
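
One way to stay under the limit named in the warning is to cap the requested generation length before calling the model. The sketch below is an illustration under stated assumptions: the 4096 figure is taken from the warning text, the limit is assumed to count prompt tokens plus generated tokens, and `clamp_new_tokens` is a hypothetical helper, not a Transformers API.

```python
def clamp_new_tokens(prompt_tokens, requested_new_tokens, context_limit=4096):
    """Cap the number of new tokens so prompt + generation fits the limit.

    Hypothetical helper; `context_limit` defaults to the value quoted in the
    warning and is assumed to cover both prompt and generated tokens.
    """
    remaining = context_limit - prompt_tokens
    if remaining <= 0:
        raise ValueError("the prompt alone already fills the context limit")
    return min(requested_new_tokens, remaining)
```

With a 300-token prompt and a request for 20000 new tokens, the helper returns 3796, which could then be passed as `max_new_tokens`. Whether truncating the output is acceptable depends on the task; it avoids the warning but does not make longer generations possible.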
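Following the TIR prompt, I ran experiments on the 1.5B and 7B Qwen2.5-Math models, and the resulting metrics were worse than with CoT. I suspect my implementation is missing some steps; could you describe the implementation in more detail? I implemented TIR based on the prompt below: ``` # TIR messages = [ {"role": "system", "content": "Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \\boxed{}."},...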
When I use the Qwen2.5-Math-7B model for inference, I get the following information: This is a friendly reminder - the current text generation call will exceed the model's predefined maximum...
As I understand the code below, for each question only the score of the 0th of the n samples is used, and the mean of those scores is reported as acc. In that case, setting n_sampling>1 is pointless. [evaluate.py#line78](https://github.com/QwenLM/Qwen2.5-Math/blob/a45202bd16f1ec06f433442dc1152d0074773465/evaluation/evaluate.py#L78) ```python score_mat = [] for sample in samples: sample['score'] = scores[idx: idx+len(sample['pred'])] assert len(sample['score']) == len(sample['pred']) score_mat.append(sample['score']) idx += len(sample['pred']) max_len = max([len(s) for s in score_mat]) for...
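
To make the concern above concrete, the sketch below contrasts three ways of turning a per-question score matrix into a single accuracy number. It is an illustration only, not the repository's code: the function names are hypothetical, and `score_mat` is assumed to hold one non-empty list of boolean scores per question, as in the quoted snippet.

```python
def first_sample_accuracy(score_mat):
    """Accuracy using only sample 0 of each question (the behavior the issue describes)."""
    return sum(bool(row[0]) for row in score_mat) / len(score_mat)


def mean_sample_accuracy(score_mat):
    """Average over all samples of each question, then over questions."""
    per_question = [sum(bool(s) for s in row) / len(row) for row in score_mat]
    return sum(per_question) / len(per_question)


def any_sample_accuracy(score_mat):
    """Fraction of questions with at least one correct sample."""
    return sum(any(row) for row in score_mat) / len(score_mat)


# Four questions with two samples each (made-up scores for illustration).
score_mat = [[False, True], [False, True], [True, True], [False, False]]
print(first_sample_accuracy(score_mat))  # 0.25
print(mean_sample_accuracy(score_mat))   # 0.5
print(any_sample_accuracy(score_mat))    # 0.75
```

The three numbers differ on the same scores, so which aggregation the evaluation script applies matters whenever more than one sample is drawn per question.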
Hi authors, I’m currently a research scientist at NVIDIA working on mathematical reasoning. I came across your repository, and I really appreciate the work you’ve done! We’re also working on...
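Hello, is there a dedicated prompt for evaluating the base model? When I test Qwen2.5-Math-1.5B directly with the evaluation code for the instruct model, the results differ considerably from those in the report.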
I am interested in understanding the composition of the pre-training dataset used for Qwen2.5-Math. Specifically, I would like to know: 1. What are the primary sources or types of datasets...