DynamicExpressions.jl
Native GPU support
This PR adds native GPU support. This is a single CUDA kernel which evaluates an expression directly on the GPU!
This also allows one to evaluate multiple trees at once, which can save time in the CUDA kernel.
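To make the batching idea concrete, here is a minimal CPU-only sketch in plain Julia. It is not the kernel from this PR: `FlatNode`, `eval_point`, `eval_batch`, and the postfix layout are hypothetical names and choices made up for illustration. The point is only that once each tree is flattened into an array, every (tree, sample) pair becomes an independent unit of work, which is the kind of work a single GPU kernel launch could spread across threads.

```julia
# Hypothetical flat encoding of an expression in postfix order
# (illustration only; not the data layout used by this PR).
# kind: 0 = constant, 1 = feature, 2 = unary operator, 3 = binary operator
struct FlatNode
    kind::Int
    val::Float32   # constant value (used when kind == 0)
    idx::Int       # feature index (kind == 1) or operator index (kind == 2 or 3)
end

const UNARY = (cos, exp)
const BINARY = (+, *)

# Evaluate one flattened expression on column `j` of `X` using a small stack.
function eval_point(nodes::Vector{FlatNode}, X::Matrix{Float32}, j::Int)
    stack = Float32[]
    for n in nodes
        if n.kind == 0
            push!(stack, n.val)
        elseif n.kind == 1
            push!(stack, X[n.idx, j])
        elseif n.kind == 2
            a = pop!(stack)
            push!(stack, UNARY[n.idx](a))
        else
            r = pop!(stack)
            l = pop!(stack)
            push!(stack, BINARY[n.idx](l, r))
        end
    end
    return pop!(stack)
end

# Evaluate many expressions over all columns of `X`.
# Each (tree, column) pair is independent, so on a GPU each pair
# could be assigned to one thread of a single kernel launch.
function eval_batch(trees::Vector{Vector{FlatNode}}, X::Matrix{Float32})
    out = Matrix{Float32}(undef, length(trees), size(X, 2))
    for t in eachindex(trees), j in axes(X, 2)
        out[t, j] = eval_point(trees[t], X, j)
    end
    return out
end

X = Float32[1 2 3; 4 5 6]  # 2 features (rows), 3 samples (columns)
t1 = [FlatNode(1, 0f0, 1), FlatNode(1, 0f0, 2), FlatNode(3, 0f0, 1)]  # x1 + x2
t2 = [FlatNode(1, 0f0, 1), FlatNode(0, 2f0, 0), FlatNode(3, 0f0, 2)]  # x1 * 2
out = eval_batch([t1, t2], X)  # one row of results per tree
```

Evaluating several trees in one call amortizes per-launch overhead over more work, which is presumably where the time saving mentioned above comes from.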
TODO:
- [x] See whether `CUDA.@captured` helps at all. Nope...
- [ ] Explore whether manually manipulating CUDA streams will help at all
- [ ] See whether I need to use `@sync` anywhere
- [ ] Consider adding Optim support now or later
Benchmark Results
| | master | 3da1d38b1da79b... | master/3da1d38b1da79b... |
|---|---|---|---|
| eval/ComplexF32/evaluation | 7.25 ± 0.53 ms | 7.22 ± 0.54 ms | 1 |
| eval/ComplexF64/evaluation | 10.7 ± 0.72 ms | 10.5 ± 0.75 ms | 1.01 |
| eval/Float32/derivative | 11.6 ± 0.84 ms | 11.2 ± 0.75 ms | 1.03 |
| eval/Float32/derivative_turbo | 11.5 ± 0.85 ms | 11.2 ± 0.84 ms | 1.03 |
| eval/Float32/evaluation | 2.7 ± 0.26 ms | 2.7 ± 0.24 ms | 1 |
| eval/Float32/evaluation_bumper | 0.573 ± 0.016 ms | 0.567 ± 0.015 ms | 1.01 |
| eval/Float32/evaluation_turbo | 0.531 ± 0.03 ms | 0.522 ± 0.029 ms | 1.02 |
| eval/Float32/evaluation_turbo_bumper | 0.569 ± 0.016 ms | 0.567 ± 0.015 ms | 1 |
| eval/Float64/derivative | 15.9 ± 0.63 ms | 14.5 ± 0.57 ms | 1.1 |
| eval/Float64/derivative_turbo | 15.8 ± 0.71 ms | 14.2 ± 0.63 ms | 1.11 |
| eval/Float64/evaluation | 3.13 ± 0.32 ms | 3.11 ± 0.28 ms | 1.01 |
| eval/Float64/evaluation_bumper | 1.18 ± 0.044 ms | 1.17 ± 0.043 ms | 1.01 |
| eval/Float64/evaluation_turbo | 1.02 ± 0.069 ms | 0.994 ± 0.067 ms | 1.02 |
| eval/Float64/evaluation_turbo_bumper | 1.18 ± 0.044 ms | 1.18 ± 0.045 ms | 1.01 |
| utils/combine_operators/break_sharing | 0.0389 ± 0.001 ms | 0.0391 ± 0.00045 ms | 0.995 |
| utils/convert/break_sharing | 27.9 ± 2.8 μs | 26.5 ± 2.2 μs | 1.05 |
| utils/convert/preserve_sharing | 0.0987 ± 0.0037 ms | 0.0971 ± 0.0035 ms | 1.02 |
| utils/copy/break_sharing | 28.3 ± 2.2 μs | 27.2 ± 2.1 μs | 1.04 |
| utils/copy/preserve_sharing | 0.0988 ± 0.0034 ms | 0.0967 ± 0.0036 ms | 1.02 |
| utils/count_constant_nodes/break_sharing | 8.64 ± 0.18 μs | 9.07 ± 0.16 μs | 0.952 |
| utils/count_constant_nodes/preserve_sharing | 0.0893 ± 0.0031 ms | 0.0853 ± 0.0035 ms | 1.05 |
| utils/count_depth/break_sharing | 14.3 ± 0.37 μs | 9.52 ± 0.2 μs | 1.5 |
| utils/count_nodes/break_sharing | 9.17 ± 0.42 μs | 8.22 ± 0.21 μs | 1.12 |
| utils/count_nodes/preserve_sharing | 0.0856 ± 0.0029 ms | 0.0849 ± 0.0032 ms | 1.01 |
| utils/get_set_constants!/break_sharing | 0.0343 ± 0.0021 ms | 0.0332 ± 0.0021 ms | 1.04 |
| utils/get_set_constants!/preserve_sharing | 0.175 ± 0.005 ms | 0.175 ± 0.0053 ms | 1 |
| utils/get_set_constants_parametric | 0.0433 ± 0.002 ms | 0.0439 ± 0.0018 ms | 0.988 |
| utils/has_constants/break_sharing | 4.28 ± 0.12 μs | 4.1 ± 0.13 μs | 1.04 |
| utils/has_operators/break_sharing | 2.25 ± 0.052 μs | 2.03 ± 0.044 μs | 1.11 |
| utils/hash/break_sharing | 24 ± 0.75 μs | 22.8 ± 0.6 μs | 1.05 |
| utils/hash/preserve_sharing | 0.0981 ± 0.0031 ms | 0.0964 ± 0.0032 ms | 1.02 |
| utils/index_constant_nodes/break_sharing | 25.7 ± 1.1 μs | 25 ± 0.87 μs | 1.03 |
| utils/index_constant_nodes/preserve_sharing | 0.0992 ± 0.0032 ms | 0.0978 ± 0.0036 ms | 1.01 |
| utils/is_constant/break_sharing | 3.89 ± 0.13 μs | 4.42 ± 0.11 μs | 0.881 |
| utils/simplify_tree/break_sharing | 0.169 ± 0.003 ms | 0.166 ± 0.003 ms | 1.02 |
| utils/simplify_tree/preserve_sharing | 0.225 ± 0.0043 ms | 0.215 ± 0.0046 ms | 1.04 |
| utils/string_tree/break_sharing | 0.451 ± 0.013 ms | 0.451 ± 0.014 ms | 1 |
| utils/string_tree/preserve_sharing | 0.547 ± 0.016 ms | 0.559 ± 0.018 ms | 0.979 |
| time_to_load | 0.229 ± 0.0053 s | 0.226 ± 0.004 s | 1.01 |
Pull Request Test Coverage Report for Build 8042273246
Details
- 135 of 137 (98.54%) changed or added relevant lines in 3 files are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage increased (+0.3%) to 94.965%
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| ext/DynamicExpressionsCUDAExt.jl | 78 | 80 | 97.5% |
| Total: | 135 | 137 | 98.54% |
| Totals | |
|---|---|
| Change from base Build 7996123220: | 0.3% |
| Covered Lines: | 1754 |
| Relevant Lines: | 1847 |
💛 - Coveralls
Pull Request Test Coverage Report for Build 12348903122
Details
- 129 of 132 (97.73%) changed or added relevant lines in 3 files are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage increased (+0.1%) to 95.637%
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| ext/DynamicExpressionsCUDAExt.jl | 72 | 75 | 96.0% |
| Total: | 129 | 132 | 97.73% |
| Totals | |
|---|---|
| Change from base Build 12322890369: | 0.1% |
| Covered Lines: | 2674 |
| Relevant Lines: | 2796 |