[ADD] fit ensemble

Open ravinkohli opened this issue 4 years ago • 1 comment

This PR enables fitting an ensemble after search has finished. It also fixes issue #299.

Types of changes

[x] New feature (non-breaking change which adds functionality)

Note that a Pull Request should only contain one of refactoring, new features or documentation changes. Please separate these changes and send us individual PRs for each. For more information on how to create a good pull request, please refer to The anatomy of a perfect pull request.

Checklist:

[x] My code follows the code style of this project.
[x] My change requires a change to the documentation.
[x] I have updated the documentation accordingly.

[x] Have you checked to ensure there aren't other open Pull Requests for the same update/change?
[x] Have you added an explanation of what your changes do and why you'd like us to include them?
[x] Have you written new tests for your core changes, as applicable?
[x] Have you successfully run tests with your changes locally?

Description

This PR adds a function called fit_ensemble, which can create an ensemble after search has finished. It uses the same backend working directory as the search and builds the ensemble from the predictions of the models saved during search. Because ensemble creation is now a separate step, it no longer makes sense to instantiate a task tied to a particular ensemble_size and ensemble_nbest. These parameters are therefore no longer class parameters; instead, they are passed to the search function or the fit_ensemble function. Additionally, as raised in #299, a warning that the ensemble could not be built was raised even if the user did not want an ensemble in the first place; this PR fixes that.
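To make the intended workflow concrete, here is a minimal, self-contained sketch of the pattern described above: search first with ensemble_size=0, then build one or more ensembles afterwards from the saved predictions. ToyTask, its method signatures, the in-memory saved_predictions dict (standing in for the backend working directory) and the greedy selection with replacement are all assumptions made for illustration; this is not the autoPyTorch API or its ensemble implementation.

```python
import warnings
from typing import Dict, List, Optional, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


class ToyTask:
    """Hypothetical stand-in for a task object (not the autoPyTorch API)."""

    def __init__(self) -> None:
        # Stands in for the backend working directory: validation
        # predictions of every model saved during search, keyed by name.
        self.saved_predictions: Dict[str, List[float]] = {}
        self.ensemble: Optional[List[str]] = None

    def search(self, candidates: Dict[str, Sequence[float]],
               y_val: Sequence[float], ensemble_size: int = 0,
               ensemble_nbest: int = 50) -> "ToyTask":
        # A real search would train models; here we only "save" predictions.
        self.saved_predictions = {k: list(v) for k, v in candidates.items()}
        if ensemble_size > 0:
            self.fit_ensemble(y_val, ensemble_size, ensemble_nbest)
        # With ensemble_size == 0 no ensemble was requested, so no warning.
        return self

    def fit_ensemble(self, y_val: Sequence[float], ensemble_size: int,
                     ensemble_nbest: int = 50) -> "ToyTask":
        if not self.saved_predictions:
            warnings.warn("No saved predictions; ensemble could not be built.")
            return self
        preds = self.saved_predictions
        # Keep only the ensemble_nbest models with the lowest individual loss.
        pool = sorted(preds, key=lambda n: mse(preds[n], y_val))[:ensemble_nbest]
        members: List[str] = []
        running_sum = [0.0] * len(y_val)
        for _ in range(ensemble_size):
            def loss_if_added(name: str) -> float:
                # Loss of the averaged ensemble if `name` were added next.
                avg = [(s + p) / (len(members) + 1)
                       for s, p in zip(running_sum, preds[name])]
                return mse(avg, y_val)

            best = min(pool, key=loss_if_added)
            members.append(best)  # the same model may be picked repeatedly
            running_sum = [s + p for s, p in zip(running_sum, preds[best])]
        self.ensemble = members
        return self


# Usage: search without an ensemble, then fit two different ensembles.
y_val = [1.0, 0.0, 1.0, 0.0]
candidates = {
    "a": [0.9, 0.2, 0.6, 0.4],
    "b": [0.6, 0.4, 0.9, 0.1],
    "c": [0.1, 0.9, 0.2, 0.8],
}

task = ToyTask()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    task.search(candidates, y_val, ensemble_size=0)
search_warnings = len(caught)

first = list(task.fit_ensemble(y_val, ensemble_size=2, ensemble_nbest=2).ensemble)
second = list(task.fit_ensemble(y_val, ensemble_size=1, ensemble_nbest=3).ensemble)
```

Because the predictions outlive the search, fit_ensemble can be called repeatedly with different sizes, and a search run with ensemble_size=0 stays silent instead of warning about a missing ensemble.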

Motivation and Context

This PR makes it possible to fit an ensemble post hoc. This allows multiple ensembles to be created from the same algorithms found in search. It can also save time during search by removing the overhead of fitting the ensemble, which is passed as a callback to SMAC. Moreover, in the future, we can also enable creating ensembles stored on disk with a new task object.

How has this been tested?

As ensemble fitting was already being tested in test_api, I have added tests for the new function _init_ensemble_builder. Moreover, the posthoc_ensemble_fit example ensures that search runs smoothly with ensemble_size=0.

Dec 25 '21 17:12 ravinkohli

Codecov Report

Merging #366 (200aa7f) into development (0e574af) will decrease coverage by 57.65%. The diff coverage is 4.95%.

@@               Coverage Diff                @@
##           development     #366       +/-   ##
================================================
- Coverage        85.50%   27.84%   -57.66%     
================================================
  Files              231      230        -1     
  Lines            16303    16331       +28     
  Branches          3009     3022       +13     
================================================
- Hits             13940     4548     -9392     
- Misses            1524    11781    +10257     
+ Partials           839        2      -837
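The headline figures can be re-derived from the raw counts in the diff. The sketch below assumes that coverage is hits divided by hits + misses + partials (these sums do equal the Lines row: 16303 and 16331). That the percentages appear truncated rather than rounded to two decimals is an observation from these numbers, not a statement about how Codecov computes them.

```python
import math


def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Coverage as a percentage, assuming hits / (hits + misses + partials)."""
    return 100.0 * hits / (hits + misses + partials)


def truncate2(x: float) -> float:
    """Truncate (not round) a positive number to two decimal places."""
    return math.floor(x * 100) / 100


before = coverage_percent(13940, 1524, 839)   # development
after = coverage_percent(4548, 11781, 2)      # this PR
drop = before - after

print(truncate2(before), truncate2(after))    # matches the Coverage row
print(truncate2(drop))                        # matches "decrease coverage by"
print(round(truncate2(before) - truncate2(after), 2))  # matches the diff column
```

Under these assumptions, the 57.65% in the summary sentence and the -57.66% in the table are both consistent with the same counts: the first is the truncated exact difference, the second the difference of the two truncated percentages.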

| Impacted Files | Coverage Δ | |
|---|---|---|
| autoPyTorch/api/tabular_classification.py | `46.66% <ø> (-44.45%)` | :arrow_down: |
| autoPyTorch/api/base_task.py | `15.61% <4.95%> (-68.20%)` | :arrow_down: |
| ...cessing/time_series_preprocessing/scaling/utils.py | `8.00% <0.00%> (-84.00%)` | :arrow_down: |
| ...tup/network_backbone/forecasting_backbone/cells.py | `8.56% <0.00%> (-83.80%)` | :arrow_down: |
| ...mponents/setup/forecasting_target_scaling/utils.py | `7.44% <0.00%> (-82.98%)` | :arrow_down: |
| ...mponents/setup/network/forecasting_architecture.py | `10.10% <0.00%> (-80.50%)` | :arrow_down: |
| ...omponents/setup/network_backbone/ResNetBackbone.py | `19.62% <0.00%> (-80.38%)` | :arrow_down: |
| ...twork_head/forecasting_network_head/NBEATS_head.py | `18.84% <0.00%> (-78.27%)` | :arrow_down: |
| autoPyTorch/ensemble/ensemble_selection.py | `18.75% <0.00%> (-78.13%)` | :arrow_down: |
| ...oPyTorch/data/time_series_forecasting_validator.py | `9.52% <0.00%> (-76.79%)` | :arrow_down: |
| ... and 206 more | | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 0e574af...200aa7f.

Dec 25 '21 18:12 codecov[bot]