Zhang Xianyi

27 comments by Zhang Xianyi

@carlkl , I have already merged @jeromerobert 's patch into the develop branch. Do you know how many threads MKL used?

@wernsaar , the matrix in @carlkl 's test case is small. I think it should use a single thread instead of multithreading. Actually, it is an old OpenBLAS issue...

@hiccup7 , could you collect more MKL dgemv results with 1, 2, and 4 threads? Please refer to this article https://software.intel.com/en-us/node/528546 to control the number of MKL threads.
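
A thread-count sweep like the one requested above could be driven from a small script. The following is only a sketch: it assumes the thread count is controlled through an environment variable (`MKL_NUM_THREADS` for MKL), and the helper name `thread_sweep_envs` is hypothetical, not part of MKL or OpenBLAS.

```python
import os


def thread_sweep_envs(var_name, thread_counts, base_env=None):
    """Return one environment dict per thread count, with var_name set.

    var_name is the library's thread-control variable, for example
    MKL_NUM_THREADS. The helper itself is a hypothetical illustration.
    """
    base = dict(os.environ if base_env is None else base_env)
    envs = []
    for count in thread_counts:
        env = dict(base)            # copy so each run gets its own settings
        env[var_name] = str(count)  # environment values must be strings
        envs.append(env)
    return envs


# One environment each for the 1-, 2-, and 4-thread runs.
mkl_envs = thread_sweep_envs("MKL_NUM_THREADS", [1, 2, 4])
for env in mkl_envs:
    print("MKL_NUM_THREADS =", env["MKL_NUM_THREADS"])
```

Each dictionary can then be passed as the `env` argument of `subprocess.run` when launching the benchmark binary, so every run sees a different thread count without changing the parent process.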

Hi all, I just ran the latest develop branch on our Haswell machine (Intel Core i7-4770 CPU, Ubuntu 14.04.1 64-bit). For `201x150`,
```
OPENBLAS_NUM_THREADS=1 ./test_gemv_open 201 150 1000000
201x150 1000000 loops...
```

@hiccup7 , what's `CPU_CORES` in your test code? Is it 4 (the number of physical cores) or 8 (the number of logical cores)?

@hiccup7 , OpenBLAS can only fall back to a single thread for some small input sizes. It cannot switch among 2, 4, or 8 threads dynamically based on the input size.

Improved the performance for the `4x100000` case. It achieves the best performance when I use two threads.
```
OPENBLAS_NUM_THREADS=1 ./test_gemv 4 100000 100000
4x100000 100000 loops 12.048461 s 6639.852177 MGFLOPS...
```
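
The throughput figure in that output can be reproduced by hand. This is a minimal sketch that assumes the benchmark counts `2 * m * n` floating-point operations per gemv call (one multiply and one add per matrix element) and reports millions of operations per second. The function name `gemv_mflops` is hypothetical.

```python
def gemv_mflops(m, n, loops, seconds):
    """Estimate gemv throughput in millions of floating-point ops per second.

    Assumes 2 * m * n operations per call: one multiply and one add
    for each element of the m-by-n matrix.
    """
    total_flops = 2.0 * m * n * loops
    return total_flops / seconds / 1e6


# Values taken from the single-thread 4x100000 run quoted above.
rate = gemv_mflops(4, 100000, 100000, 12.048461)
print(f"{rate:.2f}")
```

Under that assumption the result agrees with the reported 6639.852177 figure to within rounding.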

@hiccup7 , You can set them to 2 threads in your application. For OpenBLAS, I think we need to test more inputs and CPUs.

@hiccup7 , I applied for the Intel tools for open source contributors a week ago. However, I haven't received a response yet. :(

@culurciello , which kernel do you use? Currently, OpenBLAS only supports the ARM hard FP ABI. Could it be an ABI issue?