xtensor
Row-major xtensor assignment is very slow
using tensor_type_row = xt::xtensor<float, 4, xt::layout_type::row_major>;
using tensor_type_col = xt::xtensor<float, 4, xt::layout_type::column_major>;
auto tmp_a = tensor_type_col({200, 512, 512, 5});
tensor_type_col tmp_b;
tmp_b = xt::view(tmp_a, xt::all(), xt::all(), xt::all(), xt::range(1, 2));
This demo code takes about 400 ms if I use row_major, and about 100 ms with column_major.
How can I speed this up?
It's even slower than a plain for loop with OpenMP enabled.
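I'm pretty sure this is slow because, in the row-major xtensor, the elements selected by this view are strided in memory: the last axis is the contiguous one, so picking a single index on it leaves a stride of 5 floats between consecutive selected elements. In the column-major case the same slice is one contiguous block. Strided access like this will always be slower than a contiguous copy. I haven't run the assignment loop in this case, but it could very well be using the stepper assignment method, which won't be improved with OpenMP.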
Could we quantify the performance by writing an explicit assignment in a loop (making use of the strides)?
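For reference, here is a rough sketch of what such an explicit strided loop could look like. It is plain C++ on a `std::vector<float>` rather than xtensor, so it does not reproduce xtensor's own assignment paths; it only isolates the memory access pattern. The helper names (`make_strides`, `copy_last_axis_slice`, `time_copy_ms`) are made up for this illustration.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <vector>

using Shape = std::array<std::size_t, 4>;

// Element strides of a dense 4-D array in row-major or column-major layout.
Shape make_strides(const Shape& shape, bool row_major) {
    Shape s{};
    std::size_t acc = 1;
    if (row_major) {
        for (std::size_t d = 4; d-- > 0;) { s[d] = acc; acc *= shape[d]; }
    } else {
        for (std::size_t d = 0; d < 4; ++d) { s[d] = acc; acc *= shape[d]; }
    }
    return s;
}

// Copy src(:, :, :, k) into dst, which has shape {n0, n1, n2, 1} and the
// same layout as src. The innermost loop runs over the smallest-stride axis.
void copy_last_axis_slice(const std::vector<float>& src, const Shape& shape,
                          bool row_major, std::size_t k,
                          std::vector<float>& dst) {
    const Shape ss = make_strides(shape, row_major);
    const Shape ds = make_strides(Shape{shape[0], shape[1], shape[2], 1}, row_major);
    dst.resize(shape[0] * shape[1] * shape[2]);

    // Loop order, outermost to innermost, over the three kept axes.
    const std::array<std::size_t, 3> ax =
        row_major ? std::array<std::size_t, 3>{0, 1, 2}
                  : std::array<std::size_t, 3>{2, 1, 0};

    const float* base = src.data() + k * ss[3];
    const std::size_t sstep = ss[ax[2]];  // 5 for row-major here, 1 for column-major
    const std::size_t dstep = ds[ax[2]];  // 1 in both layouts
    for (std::size_t a = 0; a < shape[ax[0]]; ++a) {
        for (std::size_t b = 0; b < shape[ax[1]]; ++b) {
            const float* s = base + a * ss[ax[0]] + b * ss[ax[1]];
            float* d = dst.data() + a * ds[ax[0]] + b * ds[ax[1]];
            for (std::size_t c = 0; c < shape[ax[2]]; ++c) {
                d[c * dstep] = s[c * sstep];
            }
        }
    }
}

// Wall-clock time in milliseconds for one slice copy (allocation excluded).
double time_copy_ms(const Shape& shape, bool row_major, std::size_t k) {
    std::vector<float> src(shape[0] * shape[1] * shape[2] * shape[3], 1.0f);
    std::vector<float> dst(shape[0] * shape[1] * shape[2]);
    const auto t0 = std::chrono::steady_clock::now();
    copy_last_axis_slice(src, shape, row_major, k, dst);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling `time_copy_ms({200, 512, 512, 5}, true, 1)` and `time_copy_ms({200, 512, 512, 5}, false, 1)` from a small `main` should give a baseline for the two layouts to compare against the xtensor timings above. With this shape the row-major source is read with a stride of 5 floats, while the column-major source is read contiguously.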