Mateusz P. Nowak comments

Results: 9 comments of Mateusz P. Nowak

Sort performance optimisation with SYCL

The optimization failed: the critical loop (in the splitters() function in sort.hpp) seems impossible to parallelize for effective GPU execution.

Multi-node tests on Borealis

Suspended and passed to @lslusarczyk. Status:
- qsub + mpirun working, running the multi-node benchmarks (single node temporarily disabled in the branch)
- plotter generating only part of the figures
-...

analyse intel_transport_recv.h at line 1160: cma_read_nbytes == size assert

The problem with the assert in intel_transport_send.h at line 2012 is solved in IMPI 2021.11 (tested on devcloud, with IMPI 2021.11 installed in the home directory).

analyse intel_transport_recv.h at line 1160: cma_read_nbytes == size assert

I_MPI_OFFLOAD=0 mpirun -n 2 ./build/benchmarks/gbench/mhp/mhp-bench --sycl --benchmark_filter=Sort_DR
-> Assertion failed in file ../../src/mpid/ch4/shm/posix/eager/include/intel_transport_recv.h at line 1175: cma_read_nbytes == size
However, with I_MPI_OFFLOAD=1 (which should be used with IMPI on GPU)...

subrange

Tests added in #300

implement distributed_matrix and 2d stencil in mhp

- implementation of distributed_vector (#103) must be finished
- then test mhp::transform() on the above vector
- actually implement stencil-2d analogous to stencil-1d

implement distributed_matrix and 2d stencil in mhp

With Robert's support: distributed_vector was done in this task, and the stencil-1d-array example was also done in this task. New tasks were created to cover the implementation of a real distributed_dense_matrix.

[UR][L0 v2] Implement zero-copy buffers for integrated gpus

@intel/llvm-reviewers-runtime this is a friendly invitation to review

KernelAndProgram/free_function_kernels.cpp fail on Intel GPU in pulldown PR

Also fails on the ARL integrated GPU; disabled in #20890