Surya Dantuluri

15 comments by Surya Dantuluri

@AdamDanielKing, as @minimaxir has said in the past, this repo took a good chunk of nshepperd's codebase. This means it automatically does gradient checkpointing for anything that is not...
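For anyone new to the idea: gradient checkpointing keeps only some activations from the forward pass and recomputes the rest during the backward pass, trading extra compute for memory. Below is a toy, pure-Python sketch of that trade-off. It uses scalar `tanh` layers, and every name in it is hypothetical, so it is not code from this repo or from nshepperd's fork. It checks that the checkpointed gradients match plain backprop while fewer activations are held at once:

```python
import math
from typing import Dict, List, Tuple


def layer(w: float, a: float) -> float:
    """A toy scalar 'layer' (hypothetical): a_next = tanh(w * a)."""
    return math.tanh(w * a)


def backward_segment(ws: List[float], inputs: List[float], start: int,
                     g: float, grads: List[float]) -> float:
    """Backprop through layers start .. start+len(inputs)-1.

    inputs[i] is the input activation of layer start+i and g is dL/d(output of
    the segment's last layer). Writes dL/dw into grads and returns
    dL/d(input of the segment's first layer).
    """
    for i in reversed(range(len(inputs))):
        a, w = inputs[i], ws[start + i]
        t = layer(w, a)            # recompute the layer output to get tanh'(z)
        dz = g * (1.0 - t * t)     # d tanh(z)/dz = 1 - tanh(z)^2
        grads[start + i] = dz * a  # dL/dw for this layer
        g = dz * w                 # dL/da, handed to the previous layer
    return g


def grads_full(ws: List[float], x: float) -> Tuple[List[float], int]:
    """Plain backprop: keep every layer's input activation in memory."""
    inputs, a = [], x
    for w in ws:
        inputs.append(a)
        a = layer(w, a)
    grads = [0.0] * len(ws)
    backward_segment(ws, inputs, 0, 1.0, grads)  # loss L = final output
    return grads, len(inputs)


def grads_checkpointed(ws: List[float], x: float,
                       every: int) -> Tuple[List[float], int]:
    """Checkpointing: keep only every `every`-th layer input, recompute the rest."""
    n = len(ws)
    checkpoints: Dict[int, float] = {}
    a = x
    for i, w in enumerate(ws):
        if i % every == 0:
            checkpoints[i] = a
        a = layer(w, a)
    grads, g, peak = [0.0] * n, 1.0, 0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + every, n)
        seg = [checkpoints[start]]
        for i in range(start, end - 1):
            seg.append(layer(ws[i], seg[-1]))  # recompute dropped activations
        # activations alive now: all checkpoints plus this segment's recomputed ones
        peak = max(peak, len(checkpoints) + len(seg) - 1)
        g = backward_segment(ws, seg, start, g, grads)
    return grads, peak


ws = [0.5, -1.2, 0.8, 1.5, -0.3, 0.9, 1.1, -0.7]
full, stored_full = grads_full(ws, 0.4)
ckpt, stored_ckpt = grads_checkpointed(ws, 0.4, every=3)
print(stored_full, stored_ckpt)  # 8 5
```

The gradients are identical in both versions; only the number of activations held at the same time changes. Real implementations apply the same idea to tensors in a graph rather than to scalars.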

Looking into the code, it seems @minimaxir used https://github.com/cybertronai/gradient-checkpointing for gradient checkpointing. I tried these variations:
- `collection` (which appears to be the default)
- `speed` (ran into the same OOM...

@saippuakauppias, do you know how to use FP16? I'm not too familiar with how to get started with it.
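One thing worth knowing before trying FP16 is that its narrow range lets small gradients underflow to zero. That is why mixed-precision training usually scales the loss up before the backward pass and scales the gradients back down in FP32. Here is a stdlib-only sketch of that idea, covering static loss scaling plus a crude search that halves the scale on overflow. It simulates FP16 rounding with Python's `struct` half-precision `'e'` format, and the helper names are hypothetical. It is not this repo's code or any framework's API:

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back.

    Tiny values underflow to 0.0; values beyond the fp16 range make
    struct.pack raise OverflowError.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_grad(true_grad: float, loss_scale: float = 1.0) -> float:
    """Simulate a gradient produced in fp16 with static loss scaling.

    The loss (and so every gradient) is multiplied by loss_scale before the
    fp16 backward pass, then divided by loss_scale again in full precision.
    """
    return to_fp16(true_grad * loss_scale) / loss_scale


def find_loss_scale(grad: float, scale: float = 2.0 ** 15) -> float:
    """Halve the scale until the scaled gradient fits in fp16 (crude dynamic scaling)."""
    while scale >= 1.0:
        try:
            to_fp16(grad * scale)
            return scale
        except OverflowError:
            scale /= 2.0  # scaled gradient overflowed, so back off
    raise ValueError("gradient too large for fp16 even without scaling")


print(fp16_grad(1e-8))                     # 0.0 (underflow)
print(fp16_grad(1e-8, loss_scale=1024.0))  # nonzero, close to 1e-8
print(find_loss_scale(10.0))               # 4096.0
```

Real dynamic loss scalers typically also skip the optimizer step when an overflow happens and grow the scale again after a run of good steps. This sketch only shows the core range problem.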

How does [nshepperd's fork](https://github.com/nshepperd/gpt-2/) deal with this? It seems like he applies gradient checkpointing at all layers (`if args.accumulate_gradients > 1:`), but I'm not sure.
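The sketch below assumes that `accumulate_gradients` behaves like standard gradient accumulation, which I haven't verified against the fork's implementation. Under that assumption, it averages gradients over several small micro-batches before taking a single optimizer step. That gives the same gradient as one larger batch without holding the whole batch in memory. This is a toy version with a one-parameter least-squares model, and all names are hypothetical:

```python
import math
from typing import Sequence, Tuple

Batch = Sequence[Tuple[float, float]]


def grad_mse(w: float, batch: Batch) -> float:
    """d/dw of mean((w*x - y)^2) over a batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, batch: Batch, micro_batch_size: int) -> float:
    """Average the gradients of equal-sized micro-batches (gradient accumulation)."""
    if len(batch) % micro_batch_size:
        raise ValueError("batch must split into equal-sized micro-batches")
    chunks = [batch[i:i + micro_batch_size]
              for i in range(0, len(batch), micro_batch_size)]
    total = 0.0
    for chunk in chunks:
        total += grad_mse(w, chunk)  # only one micro-batch is processed at a time
    return total / len(chunks)


data = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
w = 1.5
full = grad_mse(w, data)
accum = accumulated_grad(w, data, micro_batch_size=1)
print(math.isclose(full, accum))  # True
```

The equality relies on equal-sized micro-batches. With uneven chunks, you would weight each micro-batch gradient by its size instead of taking a plain average.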

Interesting. I'm going through the `speed`, `memory`, and `collection` modes with the `if layer == 10` line removed. Will update once done (running on a 16 GB VRAM V100).

Update: the `speed` and `memory` options don't work; only `collection` (the default) works. All you need to do is delete the `if layer == 10` line (I've tried if layer ==...
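Here is a rough way to see why deleting the `if layer == 10` line could help with OOM, presumably because every block's output then becomes a checkpoint instead of only block 10's. With a single checkpoint, backprop has to recompute a long run of blocks and hold all of their internal activations at once. Checkpointing every block bounds that cost to one block. The cost model below is a simplification I'm assuming purely for illustration: unit costs, a hypothetical 24-block model (the size of GPT-2 345M), and a made-up `internals_per_block`. It is not a measurement of GPT-2:

```python
from typing import Iterable


def peak_activation_units(n_blocks: int, checkpointed: Iterable[int],
                          internals_per_block: int) -> int:
    """Rough peak memory (in 'activation units') during a checkpointed backward pass.

    Simplified, assumed model:
    - each block output costs 1 unit; the outputs of checkpointed blocks (plus
      the model input) stay in memory for the whole backward pass;
    - backprop through the run of blocks between two stored outputs recomputes
      all of them, holding internals_per_block units per block at once.
    """
    kept = sorted(set(checkpointed))
    boundaries = [-1] + kept + [n_blocks - 1]  # -1 stands for the model input
    longest = max(b - a for a, b in zip(boundaries, boundaries[1:]))
    return (len(kept) + 1) + longest * internals_per_block


k = 20  # hypothetical internal activations per block, relative to one block output
print(peak_activation_units(24, [10], k))       # 262 (only block 10 checkpointed)
print(peak_activation_units(24, range(24), k))  # 45 (every block checkpointed)
```

Under this model, checkpointing every block stores more block outputs but never recomputes more than one block at a time, which is the part that dominates the peak.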

At the moment it's training, with `accumulate_gradients = 1` and `batch_size = 1` as the defaults. I think my input may be wrong because it's structured like this:
```
hello world more...
```

nvidia-docker supports EGL offscreen rendering, which is what makes https://sdan.io/3d (an implementation of this repo) possible. Let me know if you need help setting this repo up with Docker (particularly with nvidia-docker).

I think another person said they got it working on a 24 GB GPU. Unfortunately, gcloud only offers up to a V100, and that's all I...