Clive Chan
These can be important for performance, depending on the application. I think the most intuitive way to support this would be to add some options to prepared calls, but I...
Takes a full copy of `grad` off the peak memory usage. Numbers based on `torch.cuda.max_memory_allocated()`:
- For `gpt-nano`: `32019456` to `31666688`
- For `gpt2-xl`: `30634800640` to `24607903232` (6 gigabytes!)
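The saving comes from not materializing a second full-size gradient buffer. As a stdlib-only sketch (not the actual PyTorch change, and using `tracemalloc` in place of `torch.cuda.max_memory_allocated()`), compare the peak allocation of a copy-based update against an in-place one:

```python
import tracemalloc

def peak_bytes(fn):
    """Run fn and return the peak traced allocation in bytes."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

N = 1_000_000
grad = [0.0] * N  # stand-in for a gradient tensor

def with_copy():
    # Materializes a second full-size buffer, raising the peak.
    clipped = [min(g, 1.0) for g in grad]

def in_place():
    # Mutates the existing buffer; no extra full-size allocation.
    for i in range(len(grad)):
        grad[i] = min(grad[i], 1.0)

copy_peak = peak_bytes(with_copy)
inplace_peak = peak_bytes(in_place)
print(copy_peak > inplace_peak)
```

The copy path's peak is roughly the size of one extra buffer; the in-place path's peak is near zero, mirroring the gap the PR's numbers show.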
FTZ is not currently configurable (it is always enabled); this adds `TRITON_FTZ` as an environment variable.
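A minimal sketch of the toggle's shape, keeping the current always-on behavior as the default. The accepted values and parsing below are assumptions for illustration, not Triton's actual implementation:

```python
import os

def ftz_enabled(env=None):
    """Sketch: FTZ stays on by default; setting TRITON_FTZ to a
    falsy value turns it off. (Accepted values are an assumption.)"""
    if env is None:
        env = os.environ
    return env.get("TRITON_FTZ", "1").lower() not in ("0", "false", "off")

print(ftz_enabled({}))                    # default: FTZ on
print(ftz_enabled({"TRITON_FTZ": "0"}))   # explicitly disabled
```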