Jacob Hinkle
Benchmarks are neutral. Before:

```
------------------------------------------------ benchmark: 24 tests ------------------------------------------------
Name (time in us)                   Min              Max              Mean    StdDev    Median    IQR    Outliers    OPS    Rounds    Iterations
---------------------------------------------------------------------------------------------------------------------
test_nanogpt_sdpa_fwd[thunder]      85.9729 (1.0)    123.5039 (1.0)...
```
**Is your feature request related to a problem? Please describe.** This is not really a problem but would make for a nice demo. **Describe the solution you'd like** It would...
**Is your feature request related to a problem? Please describe.** We currently have https://lagomorph.readthedocs.io/en/latest/ set up but there is no documentation yet. **Describe the solution you'd like** Add simple Sphinx...
**Describe the bug** There are many places in our Python code where we force tensors to be contiguous. This is because, before PyTorch introduced `packed_accessor`, it was pretty annoying to...
**Is your feature request related to a problem? Please describe.** Using an mpirun older than version 3 with, for example, `lagomorph lddmm atlas` currently results in an error since...
**Is your feature request related to a problem? Please describe.** Basically, some of the main work in lagomorph was already implemented by the PyTorch team. I was unaware of some...
We need a simple way to run benchmarks for our low-level functions, in addition to the tests we have, which only ensure correctness. What I have in mind is...
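
The description above is cut off before it says what design was intended, so the following is only an illustrative sketch of a minimal timing helper, not the author's plan. It uses the standard-library `timeit` module; `benchmark` and `saxpy` are hypothetical names, and `saxpy` is a pure-Python stand-in for a low-level function.

```python
import statistics
import timeit


def benchmark(fn, *args, repeat=5, number=100):
    """Time fn(*args) and return per-call statistics in seconds.

    `number` calls are timed as one batch, and the batch is repeated
    `repeat` times so that run-to-run noise is visible.
    """
    timer = timeit.Timer(lambda: fn(*args))
    batch_totals = timer.repeat(repeat=repeat, number=number)
    per_call = [total / number for total in batch_totals]
    return {
        "min": min(per_call),
        "median": statistics.median(per_call),
        "max": max(per_call),
    }


def saxpy(a, x, y):
    """Hypothetical stand-in for a low-level function under test."""
    return [a * xi + yi for xi, yi in zip(x, y)]


if __name__ == "__main__":
    stats = benchmark(saxpy, 2.0, [1.0] * 1000, [2.0] * 1000)
    print({name: f"{seconds * 1e6:.2f} us" for name, seconds in stats.items()})
```

When timing GPU code, kernel launches are asynchronous, so the timed callable would also need to synchronize with the device (for example with `torch.cuda.synchronize()`) before the timer stops.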
**Is your feature request related to a problem? Please describe.** Currently we can compose affine transforms using regular PyTorch operations, and we can compose displacement fields using `lm.interp`. It would...
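
For reference, the algebra behind composing two affine transforms can be sketched as follows. This is an illustration only: it uses plain Python lists so that it runs without PyTorch, the helper names (`matmul`, `matvec`, `compose_affine`, `apply_affine`) are hypothetical and not part of lagomorph, and the same formulas carry over to tensors. Applying `x -> A1 x + b1` and then `x -> A2 x + b2` gives a single affine map with matrix `A2 A1` and translation `A2 b1 + b2`.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def compose_affine(A2, b2, A1, b1):
    """Return (A, b) such that A x + b == A2 (A1 x + b1) + b2 for every x."""
    A = matmul(A2, A1)
    b = [s + t for s, t in zip(matvec(A2, b1), b2)]
    return A, b


def apply_affine(A, b, x):
    """Evaluate the affine map x -> A x + b."""
    return [s + t for s, t in zip(matvec(A, x), b)]


if __name__ == "__main__":
    A1, b1 = [[2, 0], [0, 2]], [1, 0]   # scale by 2, then shift
    A2, b2 = [[0, -1], [1, 0]], [0, 3]  # rotate by 90 degrees, then shift
    A, b = compose_affine(A2, b2, A1, b1)
    x = [1, 1]
    # Applying the composite map matches applying the two maps in sequence.
    assert apply_affine(A, b, x) == apply_affine(A2, b2, apply_affine(A1, b1, x))
```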
Currently all our methods are written in CUDA. I have anticipated a need for a ROCm extension, for example, but it would of course also be nice to have a...
cf. https://devblogs.nvidia.com/faster-parallel-reductions-kepler/ Shuffle intrinsics on NVIDIA GPUs can dramatically speed up custom reductions. Currently the method I use has lots of thread synchronization, so there is a lot of room...
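
To make the data movement of a shuffle-based reduction concrete, here is a pure-Python simulation of a warp-level sum using a shuffle-down pattern. It is a sketch, not lagomorph code: `shfl_down` and `warp_reduce_sum` are hypothetical names, and the list of 32 values stands in for the registers held by the 32 lanes of one warp. On a real GPU each step would be a single shuffle intrinsic with no shared memory and no block-wide barrier, which is where the speedup comes from.

```python
WARP_SIZE = 32


def shfl_down(values, offset):
    """Simulate a shuffle-down across one warp.

    Lane i receives the value held by lane i + offset. Lanes whose source
    would fall outside the warp keep their own value.
    """
    return [
        values[lane + offset] if lane + offset < WARP_SIZE else values[lane]
        for lane in range(WARP_SIZE)
    ]


def warp_reduce_sum(values):
    """Tree-reduce 32 lane values; the total ends up in lane 0."""
    assert len(values) == WARP_SIZE
    lanes = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(lanes, offset)
        # Every lane adds the value it received; only the lower lanes
        # hold meaningful partial sums after each step.
        lanes = [own + received for own, received in zip(lanes, shifted)]
        offset //= 2
    return lanes[0]


if __name__ == "__main__":
    print(warp_reduce_sum(list(range(WARP_SIZE))))
```

The loop runs five steps (offsets 16, 8, 4, 2, 1), halving the number of lanes that hold useful partial sums each time.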