Scott Gray

66 comments by Scott Gray

I would say that convolution is far from a solved problem. I still have a long list of optimizations I want to make. The biggest area to explore is how...

Maybe it wouldn't be quite so tricky. You'd just need to collect a running average of the on-chip power stats during the execution of the epoch. Something like this...

And Python bindings can be found here: https://pypi.python.org/pypi/nvidia-ml-py
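A minimal sketch of the running-average idea, assuming the samples are collected from a background thread while the epoch runs. `RunningAverageSampler` and `read_sample` are hypothetical names, not part of NVML or its bindings. The sampler keeps only an incremental mean, so it works with any zero-argument function that returns a number.

```python
import threading
from typing import Callable


class RunningAverageSampler:
    """Poll `read_sample` on a background thread and keep a running mean.

    `read_sample` is a hypothetical hook: any zero-argument callable that
    returns a float (for example, board power in watts).
    """

    def __init__(self, read_sample: Callable[[], float], interval_s: float = 0.05):
        self._read_sample = read_sample
        self._interval_s = interval_s  # arbitrary default polling interval
        self._count = 0
        self._mean = 0.0
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while not self._stop.is_set():
            value = float(self._read_sample())
            with self._lock:
                # Incremental mean update: no sample history is stored.
                self._count += 1
                self._mean += (value - self._mean) / self._count
            # Sleep until the next poll, but wake immediately if stopped.
            self._stop.wait(self._interval_s)

    def __enter__(self) -> "RunningAverageSampler":
        self._thread.start()
        return self

    def __exit__(self, *exc) -> bool:
        self._stop.set()
        self._thread.join()
        return False  # do not suppress exceptions from the epoch

    @property
    def mean(self) -> float:
        with self._lock:
            return self._mean

    @property
    def count(self) -> int:
        with self._lock:
            return self._count
```

With the nvidia-ml-py bindings, `read_sample` could be `lambda: pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0` (that call reports milliwatts), with `handle` obtained from `pynvml.nvmlDeviceGetHandleByIndex(0)` after `pynvml.nvmlInit()`. The epoch itself would run inside the `with` block, and `sampler.mean` read afterwards.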

But it's worth pointing out that the boost clock is already tightly coupled with these real-time power and temperature measurements, so the overall timings should already reflect this. So...

For training with existing fp16 kernels, you'll likely need a few tricks. To allow weight updates to proceed there needs to be enough overlap in mantissa and the weight for...
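The underlying mantissa problem can be shown numerically. fp16 stores a 10-bit mantissa, so near 1.0 the gap between representable values is 2^-10 (about 0.00098), and an update smaller than half that gap is rounded away. A minimal sketch, using Python's `struct` half-precision format code `'e'` to emulate fp16 storage. The fp32 "master weights" remedy shown here is one commonly used trick, not necessarily the one the comment goes on to describe, and the function names are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def train_fp16_only(w0: float, update: float, steps: int) -> float:
    """Apply `steps` identical updates to a weight that is stored in fp16."""
    w = to_fp16(w0)
    for _ in range(steps):
        # The result is stored back in fp16, so an update smaller than
        # half the spacing around w is rounded away every time.
        w = to_fp16(w + to_fp16(update))
    return w


def train_fp32_master(w0: float, update: float, steps: int) -> float:
    """Accumulate updates in an fp32 master copy; return the fp16 copy."""
    master = to_fp32(w0)
    for _ in range(steps):
        master = to_fp32(master + to_fp32(update))
    # The fp16 copy is what the fp16 forward/backward kernels would consume.
    return to_fp16(master)


if __name__ == "__main__":
    print(train_fp16_only(1.0, 1e-4, 1000))    # stays at 1.0
    print(train_fp32_master(1.0, 1e-4, 1000))  # close to 1.1
```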

@andravin I agree that we need this. It's just hard to make it a priority over other things. But I think with Pascal coming out, there will be a real...

One point about synthetic accuracy tests is that they don't necessarily correspond to final test accuracy. As I was saying earlier, low precision can sometimes produce better results. I'll quote...

In related news, I just finished the first Winograd fprop/bprop fp32 kernel. It is fully fused and requires no additional memory. But the big news is that it runs fastest...

@ozabluda: Yes, this is F(2x2,3x3). This requires a batch of 16 GEMMs. I'm able to fit this all in one block for K=32 and 4 overlapping coordinates of x,y each...
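To make the "batch of 16 GEMMs" concrete: F(2x2,3x3) maps each 4x4 input tile and each 3x3 filter into a 4x4 transform domain where the convolution becomes an elementwise product. Each of the 16 transformed positions then reduces to a sum over input channels, and batched over filters (K) and tiles that sum is one GEMM. Below is a minimal plain-Python reference for a single tile, using the standard B^T, G and A^T matrices from Lavin and Gray's F(2x2,3x3) formulation. It only illustrates the math and says nothing about how the fused GPU kernel is organized; `winograd_tile` and `direct_tile` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]

# Standard F(2x2,3x3) transform matrices (correlation form, as used in CNNs).
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def winograd_tile(d: List[Matrix], g: List[List[Matrix]]) -> List[Matrix]:
    """d: C x 4 x 4 input tile, g: K x C x 3 x 3 filters -> K x 2 x 2 output."""
    C, K = len(d), len(g)
    # Input transform: V[c] = B^T d[c] B (4x4 each).
    V = [matmul(matmul(BT, d[c]), transpose(BT)) for c in range(C)]
    # Filter transform: U[k][c] = G g[k][c] G^T (4x4 each).
    U = [[matmul(matmul(G, g[k][c]), transpose(G)) for c in range(C)]
         for k in range(K)]
    # 16 independent reductions over C, one per (xi, nu) position. Across
    # many tiles, each one is a (K x C) by (C x tiles) GEMM.
    M = [[[0.0] * 4 for _ in range(4)] for _ in range(K)]
    for xi in range(4):
        for nu in range(4):
            for k in range(K):
                M[k][xi][nu] = sum(U[k][c][xi][nu] * V[c][xi][nu]
                                   for c in range(C))
    # Output transform: Y[k] = A^T M[k] A (2x2 each).
    return [matmul(matmul(AT, M[k]), transpose(AT)) for k in range(K)]


def direct_tile(d: List[Matrix], g: List[List[Matrix]]) -> List[Matrix]:
    """Direct 3x3 correlation over the same tile, for comparison."""
    C, K = len(d), len(g)
    return [[[sum(d[c][i + u][j + v] * g[k][c][u][v]
                  for c in range(C) for u in range(3) for v in range(3))
              for j in range(2)] for i in range(2)] for k in range(K)]
```

Per channel and filter, the direct version spends 4 x 9 = 36 multiplies on the 2x2 output tile, while the transformed product spends 16.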

2.25 * (1 - 138/512) = 1.64 was how I was calculating it. Basically, any instruction in the GEMM loop that isn't dual-issued dilutes the number of FFMAs that can be processed. In this case...
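For reference, the arithmetic in that estimate works out as below. The 2.25 is presumably the multiply reduction of F(2x2,3x3) (16 multiplies per 2x2 output tile instead of 36). Reading 138/512 as the fraction of the loop's issue slots taken by instructions that are not dual-issued with an FFMA is an assumption based on the truncated comment.

```latex
\frac{4 \cdot 9}{4 \cdot 4} = \frac{36}{16} = 2.25,
\qquad
2.25 \left(1 - \frac{138}{512}\right) = 2.25 \cdot \frac{374}{512} \approx 1.64
```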