Vikram Sharma Mailthody
Is there an update on this issue?
Thanks for the response. Where do you suggest I cast to fp16 in layers.py? This is what I did - I cast the paddings to tf.float16 before (or after doesn't...
The thing is, 440.33 is not compatible with 5.6.3. The minimum driver version needed for the 5.6.3 kernel is 440.82. Between 440.82 and 440.33 I don't see an obvious problem...
GPipe is tightly integrated into the lingvo framework. If you look at https://github.com/tensorflow/lingvo/blob/master/lingvo/core/gpipe.py#L422 you will see how the assignment of layers is performed. It makes sense to design with a...
nope :(
Nope, and I don't want to. We need to leverage HBM bandwidth. Fundamentally this problem is bandwidth bound, in my opinion. Isn't it?
I have not heard anything about this. Have you spoken to the NVIDIA RAFT team (https://github.com/rapidsai/raft)? RAFT already has support for this. I can connect you if you have not. @agourlay...
> so far our experiments show that the GPU does not provide any significant speedups for hnsw.

@generall could you elaborate on this please? A few questions: 1. Is this data...
Change the following lines:

```
__global__ void layernorm_forward_kernel3(float* __restrict__ out,
                                          float* __restrict__ mean,
                                          float* __restrict__ rstd,
                                          const float* __restrict__ inp,
                                          const float* __restrict__ weight,
                                          const float* __restrict__ bias,
                                          int N, ...
```
This may not work on all cards. I need to determine the correctness of this fix for different thread block sizes before submitting a PR. I didn't get a chance to...