Mikhail Brinskiy
@vanzod, can you please check whether setting the UCX_POSIX_USE_PROC_LINK=n environment variable helps?
> @dmitrygx @brminich the following doesn't seem related:
>
> ```
> 2022-08-23T16:39:07.3474380Z [ RUN ] rcx/test_ucp_am_nbx_seg_size.multi/0
> 2022-08-23T16:39:07.6539640Z [ INFO ] seg size 1024 data size 2048
> 2022-08-23T16:39:07.6609112Z...
> ```
@hoopoepg, @Artemy-Mellanox, can you please review?
@karasevb, can you please take a look?
Can you please post the commands you used to run your application?
What is the command that produces the original errors from pml_ucx? Can you please also post the whole log? Also, why do you specify the psm2 provider when running with efa?
@nitbhat, can you please try UCX_IB_RX_MAX_BUFS=32768 without any other settings? If it does not help, a smaller value is worth trying (say 8192).
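A minimal sketch of that sequence, in case it helps: it runs a given command with UCX_IB_RX_MAX_BUFS=32768 and, if the command fails, retries with 8192. The helper name `try_rx_max_bufs` and the launch command in the usage note are placeholders, not anything from the report, and the helper does not clear other UCX_* variables already set in the shell, so unset those first to match "without any other settings".

```shell
# Hypothetical helper (not part of UCX): run the given command with
# UCX_IB_RX_MAX_BUFS set to 32768 first, then 8192 if the first run fails.
try_rx_max_bufs() {
    for bufs in 32768 8192; do
        # env sets the variable only for this one invocation of the command
        if env UCX_IB_RX_MAX_BUFS="$bufs" "$@"; then
            echo "succeeded with UCX_IB_RX_MAX_BUFS=$bufs"
            return 0
        fi
        echo "failed with UCX_IB_RX_MAX_BUFS=$bufs" >&2
    done
    return 1
}
```

Usage would look like `try_rx_max_bufs <your launch command>`. With an MPI launcher, the variable may need to be forwarded to the remote ranks explicitly, depending on the launcher.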
@nitbhat, are you running on Frontera? Can you please share the details on running the benchmark on 2 nodes? I'll try to reproduce locally, since I do not have an...
@nitbhat, thanks for the instructions. Is it 100% reproducible on Frontera? How long does it typically take to fail? I managed to run it on a local system, with 28 threads...