ngc92

Results 55 issues of ngc92

This PR adds support for the `DocStrip` format of TeX, used by many LaTeX packages to specify their installation. https://www.texlive.info/CTAN/macros/latex/base/docstrip.pdf I've inherited from the base `TeX` syntax, and just prepended...

I'm planning to do some more work on the (La)TeX syntax. I've opened this issue to make sure that general discussion does not get lost after a particular PR is...

RFC

### What happened? The following test case fails with the digit `1` not being scoped as part of the number. ```c int x = - //^ storage.type.c // ^ keyword.operator.assignment.c...

C: Syntax

### Issue Type Feature Request ### Have you reproduced the bug with TF nightly? No ### Source binary ### Tensorflow Version 2.11 ### Custom Code Yes ###...

type:feature
type:support

An implementation of layer-norm that doesn't require shared memory or other intermediate buffers. On my A4000, I get a speedup from 192 GB/s for kernel two at block size 64...

Uses a single warp (instead of a block) per token, therefore relying entirely on warp-level shuffle functionality provided by `cg::reduce`. Instead of achieving causal attention through masking, since we're looping...

This isn't intended to be merged as it is, but a demonstration of more speed-ups that can be achieved if you can make a few more assumptions, e.g., a more...

This doesn't help us as is, but going forward, it's a first step towards padding the vocab dimension to a sane value that actually allows for fast implementations. I haven't...

Backward kernel where threads reuse data in registers to reduce memory transfers. This PR is built on top of my previous PRs, which should be merged first. Once that is...

Reuse memory buffers across layers during the backward pass.