Clive Chan
These can be important for performance, depending on the application. I think the most intuitive way to support this would be to add some options to prepared calls, but I...
Takes a full copy of `grad` off the peak memory usage. Numbers based on `torch.cuda.max_memory_allocated()`:
- For `gpt-nano`: `32019456` to `31666688`
- For `gpt2-xl`: `30634800640` to `24607903232` (6 gigabytes!)
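The saving comes from not materializing a second full-size gradient buffer. As a stdlib-only sketch (not the actual PyTorch change, and using `tracemalloc` in place of `torch.cuda.max_memory_allocated()`), compare the peak allocation of a copy-based update against an in-place one:

```python
import tracemalloc

def peak_bytes(fn):
    """Run fn and return the peak traced allocation in bytes."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

N = 1_000_000
grad = [0.0] * N  # stand-in for a gradient tensor

def with_copy():
    # Materializes a second full-size buffer, raising the peak.
    clipped = [min(g, 1.0) for g in grad]

def in_place():
    # Mutates the existing buffer; no extra full-size allocation.
    for i in range(len(grad)):
        grad[i] = min(grad[i], 1.0)

copy_peak = peak_bytes(with_copy)
inplace_peak = peak_bytes(in_place)
print(copy_peak > inplace_peak)
```

The copy path's peak is roughly the size of one extra buffer; the in-place path's peak is near zero, mirroring the gap the PR's numbers show.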
FTZ is not currently configurable (it is always enabled); this adds `TRITON_FTZ` as an environment variable.
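A minimal sketch of the toggle's shape, keeping the current always-on behavior as the default. The accepted values and parsing below are assumptions for illustration, not Triton's actual implementation:

```python
import os

def ftz_enabled(env=None):
    """Sketch: FTZ stays on by default; setting TRITON_FTZ to a
    falsy value turns it off. (Accepted values are an assumption.)"""
    if env is None:
        env = os.environ
    return env.get("TRITON_FTZ", "1").lower() not in ("0", "false", "off")

print(ftz_enabled({}))                    # default: FTZ on
print(ftz_enabled({"TRITON_FTZ": "0"}))   # explicitly disabled
```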