Alexander Samoilov

Results 9 comments of Alexander Samoilov

Hello, IMHO the problem with the example is that in the sequence `1,-2.1,3,4.5` the values are of different types: the first one is an integer, the second one is a...
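The comment above is truncated and its original context is not shown, so reading the sequence as C++ literals is an assumption. Under that assumption, a minimal sketch of the type mismatch it describes, and of one way to give every element the same type (`uniform_values` is a hypothetical name used only for illustration):

```cpp
#include <type_traits>

// Assumption: the values are interpreted as C++ literals.
// `1` is an int literal, while `-2.1` is a double.
static_assert(std::is_same<decltype(1), int>::value, "1 is an int");
static_assert(std::is_same<decltype(-2.1), double>::value, "-2.1 is a double");

// Mixing the two in arithmetic converts to double.
static_assert(std::is_same<std::common_type<int, double>::type, double>::value,
              "int and double combine to double");

// Hypothetical fix: write every value with a decimal point so that the
// whole sequence has a single element type.
constexpr double uniform_values[] = {1.0, -2.1, 3.0, 4.5};
```

Whether this matches the actual cause in the original thread cannot be confirmed from the truncated comment.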

It is OK with this:

```diff
git diff .
diff --git a/examples/other/vectorAdd_profiled.cu b/examples/other/vectorAdd_profiled.cu
index a31c937..22408a1 100644
--- a/examples/other/vectorAdd_profiled.cu
+++ b/examples/other/vectorAdd_profiled.cu
@@ -25,8 +25,23 @@ __global__ void vectorAdd(const float *A, const...
```

Great! Thank you @eyalroz, it would be nice to enrich the example to get some profile stats, e.g. IPC and memory bandwidth, as the example belongs to BLAS Level 1, it...

Hi @eyalroz, just checked from `HEAD`:

```sh
[alsam@Noire build_debug2]$ cmake .. -DCMAKE_BUILD_TYPE=Debug -DCAW_BUILD_EXAMPLES=ON
...
cd examples/bin
$ ./vectorAdd_profiled
terminate called after throwing an instance of 'cuda::runtime_error'
  what(): Starting CUDA...
```

@eyalroz maybe I missed something? From which branch should I get the code? See your changes in `HEAD`:

```git
git log
commit 4aac489d89a60675bbe48a1d90a8817bd0039086 (HEAD -> master, tag: v0.5.4, origin/master, origin/HEAD)...
```

OK, it works!

```sh
git checkout development
...
./vectorAdd_profiled
CUDA kernel launch with 1954 blocks of 256 threads
SUCCESS
```

Thanks!
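The block count in that output is consistent with the usual ceiling-division launch configuration. A minimal host-side sketch, assuming 500000 elements: the element count is not shown in the comment and is chosen here only because it reproduces 1954 blocks of 256 threads. `blocks_for` and `launch_message` are hypothetical helper names, not part of any library:

```cpp
#include <string>

// Hypothetical helper: number of blocks needed to cover `num_elements`
// with `threads_per_block` threads each, rounding up.
constexpr unsigned blocks_for(unsigned num_elements, unsigned threads_per_block)
{
    return (num_elements + threads_per_block - 1) / threads_per_block;
}

// Assumption: 500000 elements; the comment only shows the resulting counts.
static_assert(blocks_for(500000, 256) == 1954, "matches the reported launch");

// Builds a message in the same shape as the line printed by the example.
inline std::string launch_message(unsigned num_elements, unsigned threads_per_block)
{
    return "CUDA kernel launch with "
         + std::to_string(blocks_for(num_elements, threads_per_block))
         + " blocks of " + std::to_string(threads_per_block) + " threads";
}
```

Rounding up means the last block may contain threads past the end of the data, which is why such kernels normally guard each access with an index check.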

Thank you for the great library @eyalroz! I envisage potential uses such as collecting traces for different workloads, replaying them, and collecting some performance counters :-)

Mon, 6 Jul 2020, 20:29 Bodhi:
> Hi
>
> The MATLAB implementation of LOBPCG has the feature of functors for the A
> and B matrix. Can...