ansar-sa
Unfortunately, I do not have enough domain expertise in Conan package building. It looks reasonably involved, so it will need developers with prior Conan packaging experience or could just be one...
> It could be interesting to make a 2D dot function that works faster on small matrices / matrix-vector combination. We could even use Eigen as a "backend" for that...
Using the xtensor class does not help either. As you say, the call into BLAS and the broadcast checks are probably dominating the time with small matrices. Do you have some example...
Actually, the problem affects other kinds of operations as well. Even simple array operations, such as element-wise array sums, are very slow in xtensor-based code compared to...
I'll update the benchmark results using `xtensor`.
```
-------------------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
-------------------------------------------------------------------------
eigen_array_sum/2          1.29 ns         1.28 ns    457478259
eigen_array_sum/4          3.31 ns         3.29 ns    208422955
eigen_array_sum/8          6.83 ns         6.80 ns     96818924
eigen_array_sum/16         32.8 ns         32.6 ns...
```
Could we please request or create an issue upstream in distributed to integrate this work? For my use case, it makes more sense to have a few very large machines than...