NNlib.jl

Neural Network primitives with multiple backends

Results: 141 NNlib.jl issues

This PR currently does not add any new doctests or improve the existing docs, as that would make it massive. Here are the docstrings missing from the manual...

`NNlib.jl` does not have independent documentation; rather, it has a [dedicated page](https://fluxml.ai/Flux.jl/stable/models/nnlib/) in `Flux`'s documentation. Because there is no stand-alone documentation, it has no doctests, and the documentation...

https://github.com/FluxML/NNlib.jl/blob/023cd3da63892b754b4197fe7a848093128f2bf9/src/scatter.jl#L206-L212 https://github.com/FluxML/NNlib.jl/blob/023cd3da63892b754b4197fe7a848093128f2bf9/src/gather.jl#L82-L87

help wanted

The PR linked in the comments has been merged: https://github.com/FluxML/NNlib.jl/blob/023cd3da63892b754b4197fe7a848093128f2bf9/test/test_utils.jl#L13-L19

Functions like gather/scatter throw scalar-indexing errors when used on CuArrays if one forgets to load NNlibCUDA. Since there is now a very lightweight GPUArraysCore, I think NNlib should depend on...
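To illustrate the idea, here is a hypothetical sketch of the kind of check a GPUArraysCore dependency would enable; the function name is illustrative, not actual NNlib API:

```julia
using GPUArraysCore: AbstractGPUArray

# Hypothetical helper: detect a GPU array up front and raise an informative
# error, instead of silently falling back to slow scalar indexing.
function _check_gpu_backend(x::AbstractArray)
    x isa AbstractGPUArray &&
        error("gather/scatter called on a GPU array. ",
              "Load NNlibCUDA (or the matching GPU backend) for fast kernels.")
    return x
end

_check_gpu_backend(rand(3))  # plain CPU Arrays pass through untouched
```

Because GPUArraysCore only defines the abstract array types, this adds almost no load-time cost compared with depending on a full GPU stack.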

help wanted

This aims to add gradient definitions for the existing `conv_bias_act`. It is, however, very much a WIP, and I don't recommend anyone try to read it just yet. It also adds...
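For a flavor of what such a gradient rule involves, here is a heavily simplified sketch of the fused epilogue of `conv_bias_act` (bias add plus relu), written against ChainRulesCore; it is not the PR's actual code, and to keep it short `b` is assumed to have the same shape as `x`:

```julia
using ChainRulesCore

# Illustrative stand-in for conv_bias_act's post-convolution step:
# fused bias-add and relu.
bias_relu(x, b) = max.(x .+ b, 0)

function ChainRulesCore.rrule(::typeof(bias_relu), x, b)
    y = bias_relu(x, b)
    function bias_relu_pullback(ȳ)
        # The relu derivative can be recovered from y alone (y > 0),
        # so the pre-activation values need not be kept around.
        dy = unthunk(ȳ) .* (y .> 0)
        return NoTangent(), dy, dy  # b assumed same shape as x here
    end
    return y, bias_relu_pullback
end
```

Recovering the activation gradient from the output `y` is the point of fusing: the intermediate `conv(x, w) .+ b` never needs to be materialized or stored for the backward pass.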

performance

Initial steps to fix #352

TODO
- [ ] give more thought to the interface (which outputs do we expect?)
- [ ] add rrule or write it in an...

enhancement

In conjunction with https://github.com/FluxML/NNlibCUDA.jl/pull/32, add support for half-precision `gemm`, for which Nvidia provides a special kernel. See https://github.com/JuliaGPU/CUDA.jl/pull/1080
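The intended end state, roughly, is that a plain multiply of `Float16` CuArrays lowers to Nvidia's half-precision gemm kernel via CUBLAS. A hypothetical sketch (requires a CUDA-capable GPU and the half-precision support from the linked PRs):

```julia
using CUDA

# Float16 inputs: with CUBLAS support this should dispatch to Nvidia's
# dedicated half-precision gemm kernel rather than a generic fallback.
A = CUDA.rand(Float16, 128, 64)
B = CUDA.rand(Float16, 64, 32)
C = A * B
@assert size(C) == (128, 32)
```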

enhancement

Initial sparsemax implementation #354

TODO
- [x] add support for vectors (similar to softmax)
- [ ] add comprehensive testing
- [x] compare gradients with PyTorch implementations
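For reference, a minimal vector sparsemax can be written directly from the defining paper (Martins & Astudillo, 2016): sort the input, find the support size via the threshold condition, then shift and clip. This is a sketch from the paper, not the PR's implementation:

```julia
# Reference sparsemax for vectors: projects z onto the probability simplex,
# producing sparse outputs (exact zeros) unlike softmax.
function sparsemax(z::AbstractVector{<:Real})
    zs = sort(z; rev = true)          # sorted in decreasing order
    cs = cumsum(zs)
    # largest k with 1 + k*zs[k] > sum of the k largest entries
    k = findlast(i -> 1 + i * zs[i] > cs[i], 1:length(zs))
    τ = (cs[k] - 1) / k               # threshold so the output sums to 1
    return max.(z .- τ, 0)
end

sparsemax([2.0, 1.0])  # large gaps saturate: all mass on the first entry
```

Like softmax, the result is a probability vector; unlike softmax, entries far below the maximum are clipped to exactly zero.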

enhancement

```julia
using CUDA, NNlib
using Flux: gpu  # this imports NNlibCUDA
using BenchmarkTools

CUDA.allowscalar(false)
a = CUDA.rand(200, 3000, 64)
idx = rand(1:64, 500)
idx_gpu = idx |> gpu
@benchmark CUDA.@sync NNlib.gather(a, idx)
@benchmark CUDA.@sync NNlib.gather(a, ...
```