Results 2 issues of 一路走北

## Description onnx: https://drive.google.com/file/d/1JgwgwIl71BnJRw2e9FtgV0DGSGzLy0OZ/view?usp=sharing I tried to use fp32 for MatMul in the self-attention and cross-attention layers on an A100, but it does not seem to work. - I inserted a Cast layer around MatMul...

**Your open source work is very nice. Could you share the hyperparameters that reproduce OLMo-2-0425-1B instruct in the document? I ran into KPI problems during the reproduction process, especially on math....