PathlineFile.get_destination_pathline_data() performance

Opened by wpbonelli (issue status: open).

The flopy3_modpath7_create_simulation.ipynb notebook is slow, taking 10-15 minutes on CI. The bottleneck appears to be PathlineFile.get_destination_pathline_data(). Computing pathline data for the backward case with river cell locations is the slowest, while computing endpoint data is fast in every case.

Benchmark

Results from my machine (runtimes in milliseconds):

Name (time in ms)                                   Min                     Max                     Mean                    StdDev                Median                  IQR                   Outliers  OPS              Rounds  Iterations
test_get_destination_endpoint_data[well-forward]    1.1320 (1.0)            1.6217 (1.02)           1.1668 (1.0)            0.0368 (1.20)         1.1588 (1.0)            0.0335 (1.38)         92;34     857.0405 (1.0)   790     1
test_get_destination_endpoint_data[river-forward]   1.2307 (1.09)           1.5880 (1.0)            1.2638 (1.08)           0.0308 (1.0)          1.2575 (1.09)           0.0242 (1.0)          82;62     791.2854 (0.92)  739     1
test_get_destination_endpoint_data[well-backward]   2.1071 (1.86)           37.6406 (23.70)         2.2844 (1.96)           1.8545 (60.22)        2.1728 (1.88)           0.0754 (3.11)         1;9       437.7543 (0.51)  366     1
test_get_destination_endpoint_data[river-backward]  2.1563 (1.90)           3.0739 (1.94)           2.2155 (1.90)           0.0620 (2.01)         2.2074 (1.90)           0.0526 (2.17)         33;16     451.3653 (0.53)  433     1
test_get_destination_pathline_data[well-backward]   26,589.8808 (>1000.0)   26,652.4284 (>1000.0)   26,623.4600 (>1000.0)   25.6051 (831.50)      26,616.2695 (>1000.0)   40.1569 (>1000.0)     2;0       0.0376 (0.00)    5       1
test_get_destination_pathline_data[well-forward]    40,028.3296 (>1000.0)   40,195.3408 (>1000.0)   40,104.5977 (>1000.0)   70.1838 (>1000.0)     40,077.0743 (>1000.0)   115.3085 (>1000.0)    2;0       0.0249 (0.00)    5       1
test_get_destination_pathline_data[river-forward]   61,112.0500 (>1000.0)   61,440.9841 (>1000.0)   61,247.5391 (>1000.0)   132.9419 (>1000.0)    61,200.5654 (>1000.0)   201.5238 (>1000.0)    2;0       0.0163 (0.00)    5       1
test_get_destination_pathline_data[river-backward]  568,730.0336 (>1000.0)  575,857.0278 (>1000.0)  571,281.7784 (>1000.0)  2,803.3254 (>1000.0)  571,121.0215 (>1000.0)  3,365.4625 (>1000.0)  1;0       0.0018 (0.00)    5       1

Values in parentheses are relative to the best value in each column. OPS is operations per second (1 / Mean). Outliers is given as A;B, where A counts rounds more than 1 standard deviation from the mean and B counts rounds more than 1.5 IQR beyond the first or third quartile.
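As a rough cross-check against the CI timing, the mean runtimes above can be converted to minutes. This is back-of-the-envelope arithmetic on numbers measured on one machine, and it assumes (unverified) that the notebook runs each of the four pathline extractions exactly once.

```python
# Mean runtimes in milliseconds, copied from the benchmark table above.
pathline_mean_ms = {
    "well-backward": 26_623.4600,
    "well-forward": 40_104.5977,
    "river-forward": 61_247.5391,
    "river-backward": 571_281.7784,
}
endpoint_mean_ms_river_backward = 2.2155

MS_PER_MINUTE = 60_000

# Slowest single case, in minutes.
river_backward_min = pathline_mean_ms["river-backward"] / MS_PER_MINUTE

# Assumption: the notebook runs each pathline case once, so the costs add up.
total_min = sum(pathline_mean_ms.values()) / MS_PER_MINUTE

# How much slower the river-backward pathline case is than its endpoint case.
slowdown = pathline_mean_ms["river-backward"] / endpoint_mean_ms_river_backward

print(f"river-backward pathlines: {river_backward_min:.2f} min")
print(f"all four pathline cases:  {total_min:.2f} min")
print(f"pathline vs endpoint (river-backward): {slowdown:,.0f}x")
```

The river-backward case alone comes to roughly 9.5 minutes and the four pathline cases together to roughly 11.7 minutes, in the same range as the 10-15 minutes seen on CI. The endpoint cases are negligible by comparison: the river-backward pathline case is about 2.6 × 10^5 times slower than its endpoint counterpart.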

Performance profile

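Below is a profile of the backward/river case. Almost all of the time spent in get_destination_data(), which PathlineFile.get_destination_pathline_data() calls, goes to sorting numpy arrays: get_data() is called 2625 times, and the numpy sort method accounts for 457.8 s of the 461.0 s total.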


Calls ordered by total time (tottime); all times are in seconds:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2626  457.829    0.174  457.851    0.174 {method 'sort' of 'numpy.ndarray' objects}
     2625    2.875    0.001  460.892    0.176 /Users/wes/dev/flopy/flopy/utils/modpathfile.py:129(get_data)
     2627    0.158    0.000    0.182    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
        2    0.025    0.013    0.025    0.013 {built-in method numpy.array}
        1    0.021    0.021    0.022    0.022 /Users/wes/dev/flopy/venv/lib/python3.10/site-packages/numpy/lib/arraysetops.py:523(in1d)
        1    0.019    0.019  460.996  460.996 /Users/wes/dev/flopy/flopy/utils/modpathfile.py:195(get_destination_data)
        1    0.018    0.018    0.018    0.018 {method 'copy' of 'numpy.ndarray' objects}
     2625    0.017    0.000    0.022    0.000 /Users/wes/dev/flopy/venv/lib/python3.10/site-packages/numpy/core/_internal.py:395(_newnames)
     2625    0.011    0.000  460.902    0.176 /Users/wes/dev/flopy/flopy/utils/modpathfile.py:633(get_data)
     2625    0.008    0.000    0.167    0.000 <array_function internals>:177(where)
        1    0.006    0.006  460.909  460.909 /Users/wes/dev/flopy/flopy/utils/modpathfile.py:266()
     5273    0.003    0.000    0.003    0.000 {built-in method builtins.isinstance}
     5250    0.002    0.000    0.002    0.000 {method 'remove' of 'list' objects}
        1    0.002    0.002  460.997  460.997 /Users/wes/dev/flopy/flopy/utils/modpathfile.py:704(get_destination_pathline_data)
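In this profile the sort method's tottime (457.829 s) is about 99.3% of the 460.997 s cumulative time of get_destination_pathline_data(), with roughly one sort per get_data() call (2626 sorts vs. 2625 calls). A profile of the same shape can be reproduced in miniature with the standard-library cProfile and pstats modules. The sketch below uses a hypothetical stand-in, hypothetical_get_data(), that copies and re-sorts a whole structured array on every call; it is an assumed model of the call pattern, not flopy's actual code.

```python
import cProfile
import io
import pstats

import numpy as np

# Toy particle-track records (assumed layout): one row per (particle, time).
DTYPE = np.dtype([("particleid", np.int32), ("time", np.float64)])

rng = np.random.default_rng(0)
n_records = 10_000
n_particles = 100
data = np.empty(n_records, dtype=DTYPE)
data["particleid"] = rng.integers(0, n_particles, n_records)
data["time"] = rng.random(n_records)


def hypothetical_get_data(records, partid):
    """Hypothetical stand-in: copies and re-sorts the full array on every call."""
    ra = records.copy()
    ra.sort(order=["particleid", "time"])  # the expensive step
    return ra[ra["particleid"] == partid]


profiler = cProfile.Profile()
profiler.enable()
# One call per particle, mimicking a per-particle loop.
tracks = [hypothetical_get_data(data, pid) for pid in range(n_particles)]
profiler.disable()

# Collect the report in a string, ordered by total (exclusive) time.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf).sort_stats("tottime")
stats.print_stats(10)  # top 10 entries
report = buf.getvalue()
print(report)
```

In the printed report, "{method 'sort' of 'numpy.ndarray' objects}" should appear near the top, called once per particle, which mirrors the pattern in the real profile.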

It looks like the sorting occurs in _ModpathSeries.get_data():

https://github.com/modflowpy/flopy/blob/5d0f5857d5ca5d5a92ad9207e0d8340acc214afa/flopy/utils/modpathfile.py#L150
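If each get_data() call does sort an array much larger than one particle's records (the roughly 0.17 s per sort in the profile hints at this, but that is an inference), one common alternative is to sort once and slice each particle's block out of the sorted array. The sketch below illustrates that idea with plain NumPy. naive_select() and group_by_particle() are hypothetical helpers, the two-field record layout is an assumption, and this is neither flopy's API nor a proposed patch.

```python
import numpy as np

# Assumed minimal record layout; real MODPATH pathline records have more fields.
DTYPE = np.dtype([("particleid", np.int32), ("time", np.float64)])


def naive_select(records, partid):
    """Hypothetical per-call pattern: copy and fully sort on every call."""
    ra = records.copy()
    ra.sort(order=["particleid", "time"])
    return ra[ra["particleid"] == partid]


def group_by_particle(records):
    """Sort once by (particleid, time), then split into per-particle views."""
    # np.lexsort uses the LAST key as the primary key, so particleid goes last.
    order = np.lexsort((records["time"], records["particleid"]))
    ra = records[order]
    # On sorted ids, return_index gives the start offset of each particle's block.
    ids, starts = np.unique(ra["particleid"], return_index=True)
    groups = np.split(ra, starts[1:])
    return dict(zip(ids.tolist(), groups))


rng = np.random.default_rng(42)
n_records = 5_000
data = np.empty(n_records, dtype=DTYPE)
data["particleid"] = rng.integers(1, 101, n_records)
data["time"] = rng.random(n_records)

groups = group_by_particle(data)  # one sort for all 100 particles
track_50 = groups[50]             # same records as naive_select(data, 50)
```

The design trade-off: one O(N log N) sort replaces one sort per particle, at the cost of keeping all groups (views into a single sorted copy) in memory at once.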

wpbonelli, Aug 03 '22 20:08