rltakashige
Probably quite a bit easier in EXO 1.0 - just select it from the dropdown.
I've also seen this issue when running DeepSeek V3.1 in pipeline parallel (it does not happen in tensor parallel). Have not encountered this in Qwen Coder, but do want to...
https://github.com/exo-explore/exo/issues/879#issuecomment-3670942858
Seems to be more of an issue when running transformers. This is noted. Please re-raise if this is still an issue.
Thanks for the quick response. However, we've mainly been testing on macOS 26.2, which shouldn't have this issue. Unless it's a regression?
Correct me if I misunderstood your reply - a model is composed of multiple transformer blocks (as well as some other modules we don't care too much about). In pipeline...
GPT OSS and GLM sharding support is around the corner, as well as a few more types of Qwen models. There is a transformers version incompatibility with Ministral3 models, which...
Noted! This is certainly a feature we will be looking to implement soon.
Thanks for the contribution! Looks like there's a lot of effort put into it. Although I haven't gone through the PR in detail yet, it seems like a good start...
Should no longer be an issue in 1.0 or future planned updates. It happens because NumPy does not support bfloat16 as a dtype.
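For context on the bfloat16 point, here is a minimal sketch (not EXO's actual code) of why a workaround is usually cheap: a bfloat16 value is just the upper 16 bits of an IEEE float32, so the raw bits can be upcast to float32 without a native bfloat16 dtype. The helper names below are hypothetical, and the downcast here truncates rather than rounds to nearest.

```python
import struct


def float32_to_bfloat16_bits(value: float) -> int:
    """Return the bfloat16 bit pattern of a value by truncating its float32 form.

    Hypothetical helper: keeps the sign, the 8 exponent bits and the top
    7 mantissa bits, and drops the low 16 mantissa bits (no rounding).
    """
    (bits32,) = struct.unpack("<I", struct.pack("<f", value))
    return bits32 >> 16


def bfloat16_bits_to_float32(bits16: int) -> float:
    """Upcast a 16-bit bfloat16 pattern to a float by zero-filling the low bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


if __name__ == "__main__":
    bits = float32_to_bfloat16_bits(3.14)
    print(hex(bits), bfloat16_bits_to_float32(bits))
```

Upcasting like this is lossless, so weights stored as bfloat16 can be handed to float32-only code at the cost of twice the memory.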