DynamicExpressions.jl

Native GPU support

Open MilesCranmer opened this issue 2 years ago • 3 comments

This PR adds native GPU support. This is a single CUDA kernel which evaluates an expression directly on the GPU!

This also allows one to evaluate multiple trees at once, which can save time in the CUDA kernel.
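The idea of evaluating several trees in one pass can be illustrated with a minimal CPU-side sketch in Python. It is not this PR's Julia/CUDA code: the postfix encoding, the `flatten`, `pack`, and `eval_batch` helpers, the nested-tuple tree format, and the operator tables are all hypothetical names chosen for illustration. The assumption is that trees are flattened into one packed instruction buffer, so a single routine can walk any tree without recursion.

```python
import math

# Hypothetical operator tables, indexed by an integer opcode.
UNARY = [math.cos, math.exp]
BINARY = [lambda a, b: a + b, lambda a, b: a * b]

# Node kinds used in the flattened encoding (an assumption of this sketch).
CONST, FEATURE, UNARY_OP, BINARY_OP = 0, 1, 2, 3


def flatten(tree, out=None):
    """Flatten a nested-tuple tree into postfix (kind, payload) instructions."""
    if out is None:
        out = []
    kind = tree[0]
    if kind == "const":          # ("const", value)
        out.append((CONST, tree[1]))
    elif kind == "x":            # ("x", feature_index)
        out.append((FEATURE, tree[1]))
    elif kind == "unary":        # ("unary", op_index, child)
        flatten(tree[2], out)
        out.append((UNARY_OP, tree[1]))
    else:                        # ("binary", op_index, left, right)
        flatten(tree[2], out)
        flatten(tree[3], out)
        out.append((BINARY_OP, tree[1]))
    return out


def pack(trees):
    """Concatenate the programs of several trees into one buffer plus offsets."""
    program, offsets = [], [0]
    for tree in trees:
        program.extend(flatten(tree))
        offsets.append(len(program))
    return program, offsets


def eval_batch(trees, X):
    """Evaluate every tree on every row of X from the single packed buffer."""
    program, offsets = pack(trees)
    results = []
    for t in range(len(trees)):
        tree_out = []
        # On a GPU, each (tree, row) pair could map to its own thread;
        # here the two outer loops simply run sequentially.
        for row in X:
            stack = []
            for kind, payload in program[offsets[t]:offsets[t + 1]]:
                if kind == CONST:
                    stack.append(payload)
                elif kind == FEATURE:
                    stack.append(row[payload])
                elif kind == UNARY_OP:
                    stack.append(UNARY[payload](stack.pop()))
                else:
                    b = stack.pop()
                    a = stack.pop()
                    stack.append(BINARY[payload](a, b))
            tree_out.append(stack[0])
        results.append(tree_out)
    return results
```

For example, `eval_batch([("unary", 0, ("x", 0))], [[0.0]])` evaluates `cos(x0)` on a single row. Packing all trees into one buffer is what lets one call cover the whole batch, which is the saving the description above refers to.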

graphviz

TODO:

  • [x] See whether CUDA.@captured helps at all
    • Nope...
  • [ ] Explore whether manually manipulating CUDA streams will help at all
  • [ ] See whether I need to use @sync anywhere
  • [ ] Consider adding Optim support now or later

MilesCranmer, Feb 03 '24 23:02

Benchmark Results

| | master | 3da1d38b1da79b... | master/3da1d38b1da79b... |
|---|---|---|---|
| eval/ComplexF32/evaluation | 7.25 ± 0.53 ms | 7.22 ± 0.54 ms | 1 |
| eval/ComplexF64/evaluation | 10.7 ± 0.72 ms | 10.5 ± 0.75 ms | 1.01 |
| eval/Float32/derivative | 11.6 ± 0.84 ms | 11.2 ± 0.75 ms | 1.03 |
| eval/Float32/derivative_turbo | 11.5 ± 0.85 ms | 11.2 ± 0.84 ms | 1.03 |
| eval/Float32/evaluation | 2.7 ± 0.26 ms | 2.7 ± 0.24 ms | 1 |
| eval/Float32/evaluation_bumper | 0.573 ± 0.016 ms | 0.567 ± 0.015 ms | 1.01 |
| eval/Float32/evaluation_turbo | 0.531 ± 0.03 ms | 0.522 ± 0.029 ms | 1.02 |
| eval/Float32/evaluation_turbo_bumper | 0.569 ± 0.016 ms | 0.567 ± 0.015 ms | 1 |
| eval/Float64/derivative | 15.9 ± 0.63 ms | 14.5 ± 0.57 ms | 1.1 |
| eval/Float64/derivative_turbo | 15.8 ± 0.71 ms | 14.2 ± 0.63 ms | 1.11 |
| eval/Float64/evaluation | 3.13 ± 0.32 ms | 3.11 ± 0.28 ms | 1.01 |
| eval/Float64/evaluation_bumper | 1.18 ± 0.044 ms | 1.17 ± 0.043 ms | 1.01 |
| eval/Float64/evaluation_turbo | 1.02 ± 0.069 ms | 0.994 ± 0.067 ms | 1.02 |
| eval/Float64/evaluation_turbo_bumper | 1.18 ± 0.044 ms | 1.18 ± 0.045 ms | 1.01 |
| utils/combine_operators/break_sharing | 0.0389 ± 0.001 ms | 0.0391 ± 0.00045 ms | 0.995 |
| utils/convert/break_sharing | 27.9 ± 2.8 μs | 26.5 ± 2.2 μs | 1.05 |
| utils/convert/preserve_sharing | 0.0987 ± 0.0037 ms | 0.0971 ± 0.0035 ms | 1.02 |
| utils/copy/break_sharing | 28.3 ± 2.2 μs | 27.2 ± 2.1 μs | 1.04 |
| utils/copy/preserve_sharing | 0.0988 ± 0.0034 ms | 0.0967 ± 0.0036 ms | 1.02 |
| utils/count_constant_nodes/break_sharing | 8.64 ± 0.18 μs | 9.07 ± 0.16 μs | 0.952 |
| utils/count_constant_nodes/preserve_sharing | 0.0893 ± 0.0031 ms | 0.0853 ± 0.0035 ms | 1.05 |
| utils/count_depth/break_sharing | 14.3 ± 0.37 μs | 9.52 ± 0.2 μs | 1.5 |
| utils/count_nodes/break_sharing | 9.17 ± 0.42 μs | 8.22 ± 0.21 μs | 1.12 |
| utils/count_nodes/preserve_sharing | 0.0856 ± 0.0029 ms | 0.0849 ± 0.0032 ms | 1.01 |
| utils/get_set_constants!/break_sharing | 0.0343 ± 0.0021 ms | 0.0332 ± 0.0021 ms | 1.04 |
| utils/get_set_constants!/preserve_sharing | 0.175 ± 0.005 ms | 0.175 ± 0.0053 ms | 1 |
| utils/get_set_constants_parametric | 0.0433 ± 0.002 ms | 0.0439 ± 0.0018 ms | 0.988 |
| utils/has_constants/break_sharing | 4.28 ± 0.12 μs | 4.1 ± 0.13 μs | 1.04 |
| utils/has_operators/break_sharing | 2.25 ± 0.052 μs | 2.03 ± 0.044 μs | 1.11 |
| utils/hash/break_sharing | 24 ± 0.75 μs | 22.8 ± 0.6 μs | 1.05 |
| utils/hash/preserve_sharing | 0.0981 ± 0.0031 ms | 0.0964 ± 0.0032 ms | 1.02 |
| utils/index_constant_nodes/break_sharing | 25.7 ± 1.1 μs | 25 ± 0.87 μs | 1.03 |
| utils/index_constant_nodes/preserve_sharing | 0.0992 ± 0.0032 ms | 0.0978 ± 0.0036 ms | 1.01 |
| utils/is_constant/break_sharing | 3.89 ± 0.13 μs | 4.42 ± 0.11 μs | 0.881 |
| utils/simplify_tree/break_sharing | 0.169 ± 0.003 ms | 0.166 ± 0.003 ms | 1.02 |
| utils/simplify_tree/preserve_sharing | 0.225 ± 0.0043 ms | 0.215 ± 0.0046 ms | 1.04 |
| utils/string_tree/break_sharing | 0.451 ± 0.013 ms | 0.451 ± 0.014 ms | 1 |
| utils/string_tree/preserve_sharing | 0.547 ± 0.016 ms | 0.559 ± 0.018 ms | 0.979 |
| time_to_load | 0.229 ± 0.0053 s | 0.226 ± 0.004 s | 1.01 |

github-actions[bot], Feb 03 '24 23:02

Pull Request Test Coverage Report for Build 8042273246

Details

  • 135 of 137 (98.54%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 94.965%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| ext/DynamicExpressionsCUDAExt.jl | 78 | 80 | 97.5% |
| Total: | 135 | 137 | |

| Totals | Coverage Status |
|---|---|
| Change from base Build 7996123220: | 0.3% |
| Covered Lines: | 1754 |
| Relevant Lines: | 1847 |

💛 - Coveralls

coveralls, Feb 25 '24 21:02

Pull Request Test Coverage Report for Build 12348903122

Details

  • 129 of 132 (97.73%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 95.637%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| ext/DynamicExpressionsCUDAExt.jl | 72 | 75 | 96.0% |
| Total: | 129 | 132 | |

| Totals | Coverage Status |
|---|---|
| Change from base Build 12322890369: | 0.1% |
| Covered Lines: | 2674 |
| Relevant Lines: | 2796 |

💛 - Coveralls

coveralls, Dec 16 '24 02:12