Random forest parallelization
@daraghk, thank you very much for your contribution to Smartcore! The code looks good; the only things left are the clippy warnings and better test coverage. Once all pending checks have passed, we can merge this PR into development.
@VolodymyrOrlov happy to contribute! I've fixed the code coverage issue, but the linting errors are unrelated to any code I touched - you can see the same issues in the other random forest PR that is currently open.
Codecov Report

Merging #137 (e9da303) into development (b4a807e) will increase coverage by 0.00%. The diff coverage is 85.96%.

```
@@            Coverage Diff             @@
##           development     #137   +/- ##
==========================================
  Coverage      83.40%    83.41%
==========================================
  Files             78        79      +1
  Lines           8377      8453     +76
==========================================
+ Hits            6987      7051     +64
- Misses          1390      1402     +12
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/linalg/ndarray_bindings.rs | 83.18% <ø> (ø) | |
| src/math/num.rs | 84.09% <ø> (ø) | |
| src/ensemble/random_forest_regressor.rs | 73.10% <84.09%> (+2.51%) | :arrow_up: |
| src/ensemble/random_forest_classifier.rs | 75.12% <86.56%> (+4.09%) | :arrow_up: |
| src/dataset/digits.rs | 92.85% <100.00%> (+0.54%) | :arrow_up: |
| src/svm/svc.rs | 89.45% <0.00%> (-0.31%) | :arrow_down: |
| src/linear/lasso_optimizer.rs | 94.11% <0.00%> (ø) | |
| src/lib.rs | 0.00% <0.00%> (ø) | |

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update b4a807e...e9da303.
Sorry to say this, but parallelization is not a characteristic we want to push at the moment, as Wasm support is prioritized. We are indeed looking at partial (kernel-, column-, row-dependent) computation, but at the matrix-operations level to begin with. Rayon is a great library, but right now we cannot guarantee wider adoption at the codebase level; if there is news on making Rayon work with all the targets we want to support, I would be glad to reopen this.
@daraghk please take a look at the major changes currently in development (heading to release 0.4); it would be great to have more contributions from you.
I think that we could do parallelization behind a feature flag that is disabled for WASM targets.
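As a rough illustration of what that could look like (a minimal sketch only — names like `parallel`, `fit_trees`, and `fit_one` are hypothetical, not Smartcore APIs): an optional Cargo feature pulls in Rayon for native builds, while the default (and wasm32) build keeps a sequential fallback of the same function, so callers never see the difference.

```rust
// Sketch of a cfg-gated parallel path. In Cargo.toml one would declare
// (hypothetical):
//
//   [features]
//   parallel = ["rayon"]
//
//   [dependencies]
//   rayon = { version = "1", optional = true }
//
// and wasm32 targets would simply build without the "parallel" feature.

/// Stand-in for training a single decision tree from a seed.
fn fit_one(seed: usize) -> u64 {
    (seed as u64 + 1) * 2
}

/// Parallel variant: compiled only when the "parallel" feature is on.
#[cfg(feature = "parallel")]
fn fit_trees(n_trees: usize) -> Vec<u64> {
    use rayon::prelude::*;
    (0..n_trees).into_par_iter().map(fit_one).collect()
}

/// Sequential fallback: compiled for wasm32 or any build
/// without the "parallel" feature. Same signature, same results.
#[cfg(not(feature = "parallel"))]
fn fit_trees(n_trees: usize) -> Vec<u64> {
    (0..n_trees).map(fit_one).collect()
}

fn main() {
    // Either variant produces identical output; only the execution
    // strategy differs.
    let fitted = fit_trees(4);
    println!("{:?}", fitted); // [2, 4, 6, 8]
}
```

The point of the sketch is that the feature flag is invisible at the call site: the API surface stays identical, and only the build configuration decides whether Rayon is linked in.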
I know, but we have no bandwidth to follow up on all these features; creating a feature flag for one case would only increase complexity for very little benefit. We need to prioritise, allocate more time, or find somebody to carry on the task of extending parallelization to more algorithms.