Scott Gray comments

Results: 66 comments


DeepMark

I would say that convolution is far from a solved problem. I still have a long list of optimizations I want to make. The biggest area to explore is how...

DeepMark

Maybe it wouldn't be quite so tricky. You'd just need to collect a running average of the on-chip power stats during the execution of the epoch. Something like this...

DeepMark

And python bindings can be found here: https://pypi.python.org/pypi/nvidia-ml-py
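As a rough sketch of the running-average idea from the comment above: the code below keeps an incremental mean of power samples. The class and function names are hypothetical. The NVML reader assumes the `pynvml` module from the nvidia-ml-py package and its `nvmlDeviceGetPowerUsage` call, which reports milliwatts. The reader is passed in as a plain callable, so the averaging logic can be tried without a GPU.

```python
import time
from typing import Callable


class RunningPowerAverage:
    """Incrementally averages power samples, reported in watts."""

    def __init__(self) -> None:
        self.count = 0
        self.mean_watts = 0.0

    def add_sample(self, milliwatts: float) -> None:
        self.count += 1
        watts = milliwatts / 1000.0
        # Incremental mean: no need to store every sample.
        self.mean_watts += (watts - self.mean_watts) / self.count


def make_nvml_reader(device_index: int = 0) -> Callable[[], float]:
    """Return a callable that reads board power in milliwatts via NVML.

    Assumes the nvidia-ml-py package and an NVIDIA GPU are present.
    """
    import pynvml  # imported lazily so the rest works without a GPU

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    return lambda: pynvml.nvmlDeviceGetPowerUsage(handle)


def sample_power(read_mw: Callable[[], float],
                 num_samples: int,
                 interval_s: float = 0.0) -> RunningPowerAverage:
    """Poll `read_mw` a fixed number of times and return the running average."""
    avg = RunningPowerAverage()
    for _ in range(num_samples):
        avg.add_sample(read_mw())
        if interval_s > 0:
            time.sleep(interval_s)
    return avg
```

In a real benchmark the polling would probably run in a background thread for the duration of the epoch, e.g. `sample_power(make_nvml_reader(0), num_samples=100, interval_s=0.1)`. That threading is left out here to keep the sketch short.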

DeepMark

But, it's worth pointing out that the boost clock is already tightly coupled with these real-time power and temperature measurements so the overall timings should be reflective of this. So...

DeepMark

For training with existing fp16 kernels you'll likely need a few tricks. To allow weight updates to proceed there needs to be enough overlap in mantissa and the weight for...
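To illustrate why fp16 weight updates can stall (the comment itself is cut off, so this shows only the underlying rounding issue, not the tricks it goes on to describe): fp16 has 10 explicit mantissa bits, so an update much smaller than the weight is rounded away. The helper name `to_fp16` is made up for this sketch. It relies on the half-precision `e` format in Python's standard `struct` module.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


weight = to_fp16(1.0)

# Near 1.0 the spacing between adjacent fp16 values is 2**-10 (about 0.001).
small_update = to_fp16(1e-4)   # well below half that spacing
large_update = to_fp16(1e-3)   # roughly one spacing

lost = to_fp16(weight + small_update)   # rounds back to the old weight
kept = to_fp16(weight + large_update)   # moves to the next fp16 value

print(lost == weight)  # True: the small update vanished
print(kept > weight)   # True: the larger update survived
```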

DeepMark

@andravin I agree that we need this. It's just hard to make it a priority over other things. But I think with Pascal coming out, there will be a real...

DeepMark

One point about synthetic accuracy tests is that they don't necessarily correspond to final test accuracy. As I was saying earlier, low precision can sometimes produce better results. I'll quote...

[October 2015] Intel are CPU magicians. But there's no one weird trick....

In related news, I just finished the first Winograd fprop/bprop fp32 kernel. It is fully fused and requires no additional memory. But the big news is that it runs fastest...

[October 2015] Intel are CPU magicians. But there's no one weird trick....

@ozabluda: Yes, this is F(2x2,3x3). This requires a batch of 16 GEMMs. I'm able to fit this all in one block for K=32 and 4 overlapping coordinates of x,y each...
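For readers unfamiliar with F(2x2,3x3), here is a minimal single-channel, single-tile sketch in plain Python using the standard Winograd transform matrices. It is not the fused kernel described in the comment. It only shows where the number 16 comes from: the transformed tile is 4x4, and each of its 16 positions is multiplied independently. Summing over input channels across many tiles and filters turns each position into its own GEMM. The function names are illustrative.

```python
def matmul(a, b):
    """Multiply two small matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


# Standard F(2x2,3x3) transform matrices.
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
G = [[1, 0, 0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0, 0, 1]]
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]


def winograd_2x2_3x3(d, g):
    """2x2 output tile from a 4x4 input tile d and a 3x3 filter g."""
    u = matmul(matmul(G, g), transpose(G))      # transformed filter, 4x4
    v = matmul(matmul(BT, d), transpose(BT))    # transformed input, 4x4
    # 16 independent element-wise products; with channels, 16 GEMMs.
    m = [[u[i][j] * v[i][j] for j in range(4)] for i in range(4)]
    return matmul(matmul(AT, m), transpose(AT))  # inverse transform, 2x2


def direct_2x2_3x3(d, g):
    """Reference: direct 3x3 correlation producing the same 2x2 tile."""
    return [[sum(d[i + r][j + c] * g[r][c]
                 for r in range(3) for c in range(3))
             for j in range(2)]
            for i in range(2)]
```

The direct version uses 2*2*3*3 = 36 multiplies per tile and the Winograd version uses 16, a ratio of 2.25.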

[October 2015] Intel are CPU magicians. But there's no one weird trick....

2.25(1-138/512)=1.64 was how I was calculating it. Basically any instruction in the GEMM loop that isn't dual issued dilutes the number of FFMAs that can be processed. In this case...
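The arithmetic in that estimate can be reproduced in a few lines. Two readings here are assumptions rather than something the comment states outright: that 2.25 is the multiply reduction of F(2x2,3x3) (36 multiplies versus 16), and that 138 of 512 issue slots in the GEMM loop go to instructions that are not dual issued. The function and parameter names are hypothetical.

```python
def effective_speedup(theoretical: float,
                      non_ffma_slots: int,
                      total_slots: int) -> float:
    """Scale a theoretical speedup by the fraction of slots doing FFMAs."""
    ffma_fraction = 1 - non_ffma_slots / total_slots
    return theoretical * ffma_fraction


# F(2x2,3x3): 2*2*3*3 = 36 direct multiplies vs 4*4 = 16 with Winograd.
winograd_reduction = (2 * 2 * 3 * 3) / (4 * 4)

speedup = effective_speedup(winograd_reduction, 138, 512)
print(round(speedup, 2))  # 1.64
```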