romix comments

Results: 57 comments of romix

Maintenance

Hi, I have not touched it for a while :-) On the other hand, there were no user requests for improving anything ... If you'd like to see some improvements,...

Status of libdnn as of April 2018

Thanks a lot for such a quick response! > Yes, it's planned as soon as I'm done with my work on the Caffe branch, which includes finishing Quantization for LibDNN....

Question about performance

> Have you also checked it's actually using the AMD GPU on OpenCL Yes, I checked. It is using AMD Radeon Pro 555 Compute_Engine. > Have you tried different batch...

Question about performance

> How do you allocate the memory for filter, input and output? All the buffers are allocated in advance on the device. There is no dynamic memory allocation happening during...

Question about performance

I do test with the whole resnet50 and I run multiple iterations of it to make sure that the first run involving e.g. kernel compilations is not influencing the picture...

Question about performance

@naibaf7 I tried to build https://github.com/BVLC/caffe/tree/opencl locally. Now I'm trying to figure out how to run the resnet50 with it. I have some existing python scripts doing it, but they...

Question about performance

@naibaf7 Thanks! I downloaded these models. And I'm able to run them. BTW, if I run on the OpenCL CPU backend, libDNN asserts: ``` ViennaCL: FATAL ERROR: Kernel start failed...

Question about performance

OK. Here are the numbers (batch size 16, average forward pass, 10 iterations): * AMD Radeon Pro 555 Caffe OpenCL + CLBlast + LibDNN: 2655.84 * Caffe CPU: 1960.38 ms...

Question about performance

> Also thanks about the heads-up of the Apple OpenCL implementation. If they indeed only allow (1024, 1, 1) and no symmetric ranges on first and second dimension, then LibDNN...

Question about performance

> if that's the case, then choosing more iterations to average it out (i.e. 50 iterations) would show truer numbers: I tried with more iterations, but get more or less...