Vikram Sharma Mailthody
Is there an update on this issue?
Thanks for the response. Where do you suggest I cast to fp16 in layers.py? This is what I did - I cast the paddings to tf.float16 before (or after doesn't...
The thing is, 440.33 is not compatible with 5.6.3. The minimum driver version needed for the 5.6.3 kernel is 440.82. Between 440.82 and 440.33 I don't see an obvious problem...
GPipe is tightly integrated into the lingvo framework. If you look at https://github.com/tensorflow/lingvo/blob/master/lingvo/core/gpipe.py#L422 you will see how the assignment of layers is performed. It makes sense to design with a...
nope :(
Nope, and I don't want to. We need to leverage HBM bandwidth. Fundamentally this problem is bandwidth bound, in my opinion. Isn't it?
I have not heard anything about this. Have you spoken to the NVIDIA RAFT team (https://github.com/rapidsai/raft)? RAFT already has support for this. I can connect you if you have not. @agourlay...
> so far our experiments show that the GPU does not provide any significant speedups for hnsw.

@generall could you elaborate on this please? A few questions: 1. Is this data...
Change the following lines:

```
__global__ void layernorm_forward_kernel3(float* __restrict__ out,
                                          float* __restrict__ mean,
                                          float* __restrict__ rstd,
                                          const float* __restrict__ inp,
                                          const float* __restrict__ weight,
                                          const float* __restrict__ bias,
                                          int N, ...
```
This may not work on all cards. I need to determine the correctness of this fix for different thread block sizes before submitting a PR. I didn't get a chance to...