Marcin Krotkiewski
On top of that there are performance issues (for example, `osu_bcast`): ``` ----------------------------------------------- -- osu_bcast, HCOLL # OSU MPI Broadcast Latency Test v5.7 # Size Avg Latency(us) Min Latency(us) Max...
@vspetrov Thanks for the answer! case 4 was indeed reproduced in 2.9, but 2.10 doesn't seem to have it and tests run smoothly. So at least that one is clear!...
@vspetrov Hi! I think I have finally found the issue and the cause of the Bus errors we see on our system. I have investigated the memory allocation on the...
@hjelmn not sure if this is important, but I've noticed that I get the deadlock less often when I don't call `xpmem_remove` explicitly in my code. This makes me wonder:...
@cvmeq Unfortunately no, I still see those issues sometimes, mostly when you kill / interrupt a large job, or at job cleanup. The only solution for me was not to...
Just to add some info, I found that the segfault happens when the file is located on the Lustre file system. The same code works fine if the file is...
@jsquyres Perfect, thanks! `-mca io ompio` works :) I'm not up to date here. Is there a substantial difference between that and `romio321`?
@jsquyres Thanks a lot, that's good to know! I run 4.0.3, which seems to use `romio321` by default. At least on our system (maybe because of Lustre?) ``` [login-2.betzy.sigma2.no:05806] io:base:file_select:...
@jsquyres And some more info: I checked Open MPI 3.1.4 with `romio314`, and that works. So it seems to be something in the newer version.
@mbianco I am currently using manually implemented bindings, so no rush. It would just be a useful thing to have, IMO.