romix
Hi, I have not touched it for a while :-) On the other hand, there were no user requests for improving anything ... If you'd like to see some improvements,...
Thanks a lot for such a quick response!

> Yes, it's planned as soon as I'm done with my work on the Caffe branch, which includes finishing Quantization for LibDNN....
> Have you also checked it's actually using the AMD GPU on OpenCL

Yes, I checked. It is using AMD Radeon Pro 555 Compute_Engine.

> Have you tried different batch...
> How do you allocate the memory for filter, input and output?

All the buffers are allocated in advance on the device. There is no dynamic memory allocation happening during...
I do test with the whole resnet50 and I run multiple iterations of it to make sure that the first run involving e.g. kernel compilations is not influencing the picture...
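To make the measurement approach above concrete (several iterations, with the first run kept out of the average because it includes things like kernel compilation), here is a minimal sketch. The names `average_forward_ms` and `forward` are hypothetical and not part of Caffe or LibDNN; `forward` stands in for whatever callable runs one forward pass. It also assumes the call blocks until the device has finished, which may not hold for an asynchronous OpenCL queue.

```python
import time
from statistics import mean


def average_forward_ms(forward, iterations=10, warmup=1):
    """Return (average, per-run timings) in milliseconds for `forward`.

    `forward` is a hypothetical zero-argument callable that runs one
    forward pass and returns only when the work is finished.
    """
    # Warm-up runs absorb one-time costs (e.g. kernel compilation)
    # and are not included in the reported timings.
    for _ in range(warmup):
        forward()

    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        forward()
        timings.append((time.perf_counter() - start) * 1000.0)

    return mean(timings), timings
```

Returning the individual timings along with the average makes it easy to spot a single slow outlier that would otherwise skew the mean.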
@naibaf7 I tried to build https://github.com/BVLC/caffe/tree/opencl locally. Now I'm trying to figure out how to run the resnet50 with it. I have some existing Python scripts doing it, but they...
@naibaf7 Thanks! I downloaded these models. And I'm able to run them. BTW, if I run on the OpenCL CPU backend, libDNN asserts:

```
ViennaCL: FATAL ERROR: Kernel start failed...
```
OK. Here are the numbers (batch size 16, average forward pass, 10 iterations):

* AMD Radeon Pro 555 Caffe OpenCL + CLBlast + LibDNN: 2655.84
* Caffe CPU: 1960.38 ms...
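As a quick reading of the two figures visible above, the relative slowdown can be worked out as below. This assumes both values are in milliseconds (the first one is listed without a unit) and uses only these two numbers; the variable names are just for illustration.

```python
# Average forward-pass times quoted above (batch size 16, 10 iterations).
# Assumption: both values are milliseconds.
opencl_gpu_ms = 2655.84  # AMD Radeon Pro 555, Caffe OpenCL + CLBlast + LibDNN
caffe_cpu_ms = 1960.38   # Caffe CPU

# Ratio > 1 means the OpenCL GPU path is slower than the CPU path.
slowdown = opencl_gpu_ms / caffe_cpu_ms
per_image_gpu_ms = opencl_gpu_ms / 16
per_image_cpu_ms = caffe_cpu_ms / 16

print(f"OpenCL GPU / CPU time ratio: {slowdown:.2f}x")
print(f"Per image: GPU {per_image_gpu_ms:.1f} ms, CPU {per_image_cpu_ms:.1f} ms")
```

Under that assumption, the OpenCL GPU run takes roughly a third longer than the CPU run.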
> Also thanks about the heads-up of the Apple OpenCL implementation. If they indeed only allow (1024, 1, 1) and no symmetric ranges on first and second dimension, then LibDNN...
> if that's the case, then choosing more iterations to average it out (i.e. 50 iterations) would show truer numbers:

I tried with more iterations, but get more or less...