4. Hyper-parameter tuning (extension with XGBoost)

Open jvmead opened this issue 5 years ago • 33 comments

An extension to the ML lessons, picking up where k-fold left off. It currently runs into problems with the CI timing out, presumably due to the long running time?

https://colab.research.google.com/
Notebook real time --- 1051.062778711319 seconds ---
Notebook CPU time --- 728.3288249999999 seconds ---

The successfully executed notebook with plots generated using full stats may be found here: https://colab.research.google.com/drive/1I34R8eCck3wo1YX54WllUKJw9KdoID8Y
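
For reference, a minimal sketch of how two timings like the ones quoted above can be collected. It assumes the standard-library `time` module was used; the helper `timed` and the stand-in workload `busy_work` are hypothetical names, not taken from the notebook.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock ("real") time
    cpu_start = time.process_time()    # CPU time used by this process only
    result = func(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu


def busy_work(n):
    # Hypothetical stand-in for executing the notebook's cells
    return sum(i * i for i in range(n))


result, wall, cpu = timed(busy_work, 100_000)
print(f"Notebook real time --- {wall} seconds ---")
print(f"Notebook CPU time --- {cpu} seconds ---")
```

Real time also counts waiting (downloads, I/O), which is typically why it comes out larger than the CPU time; a CI time limit applies to the real time.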

jvmead avatar Feb 26 '20 11:02 jvmead

It looks like there are some more imports to fix

chrisburr avatar Feb 26 '20 13:02 chrisburr

@chrisburr down to <20mins on colab, is there any leeway we can get on the CI?

jvmead avatar Feb 28 '20 21:02 jvmead

@chrisburr unsure of what the issue is, and how to proceed with the last step: test $TRAVIS_PULL_REQUEST == "false" && test $TRAVIS_BRANCH == "master" && starterkit_ci deploy

jvmead avatar Feb 29 '20 11:02 jvmead

You can ignore that, it’s the bit that uploads it to the website but we can fix that later

chrisburr avatar Feb 29 '20 11:02 chrisburr

@chrisburr I had a look in the starterkit-ci but couldn't see where this problem arises

jvmead avatar Mar 02 '20 17:03 jvmead

You need to add it to one of the tables of contents. The error is complaining that the page is unreachable from any of the menus.
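
A rough sketch of how one could check this locally before pushing, assuming the menus are ordinary Sphinx `toctree` directives. The index text, the page name `45HyperparameterTuning` and the helper `toctree_entries` are hypothetical; only `20DataAndPlotting` is a name that appears in this thread.

```python
def toctree_entries(text):
    """Return the document names listed in Sphinx ``toctree`` directives."""
    entries = []
    in_toctree = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".. toctree::"):
            in_toctree = True
            continue
        if not in_toctree or not stripped:
            continue  # outside a directive, or a blank line inside one
        if not line.startswith((" ", "\t")):
            in_toctree = False  # first unindented line ends the directive
            continue
        if stripped.startswith(":"):
            continue  # directive option such as :maxdepth:
        # Entries may also be written as "Title <path>"
        if stripped.endswith(">") and "<" in stripped:
            stripped = stripped[stripped.rindex("<") + 1:-1]
        entries.append(stripped)
    return entries


# Hypothetical index file content
index_text = """\
Advanced Python
===============

.. toctree::
   :maxdepth: 2

   10Basics
   20DataAndPlotting
   30Classification

Some trailing paragraph.
"""

new_page = "45HyperparameterTuning"  # hypothetical notebook name
listed = toctree_entries(index_text)
if new_page not in listed:
    print(f"{new_page} is not reachable from this toctree; add it to the list")
```

Adding the page name as one more indented line under the directive is what makes it reachable from the menu.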

chrisburr avatar Mar 02 '20 17:03 chrisburr

nice, thanks

jvmead avatar Mar 02 '20 17:03 jvmead

created an issue: https://github.com/hsf-training/analysis-essentials/issues/25 to view the lesson being added: https://colab.research.google.com/drive/13dwXKKHxqQfk8Zo2Gzh46_pCc9xBgi1H?usp=sharing

jvmead avatar Dec 09 '20 09:12 jvmead

Hi, what's the status of this? The build seems to fail

jonas-eschle avatar Nov 19 '21 20:11 jonas-eschle

As far as I remember, the issue was that the CI would time out. I think this was because the notebook takes 20-30 mins to run.

Chris also suspected issues with its inclusion in the table of contents tree. I'm unsure of how to proceed in either case.

The notebooks may also now be out of sync with the current SK build. I've not kept up with any changes over the last few years.

jvmead avatar Nov 20 '21 11:11 jvmead

Okay, so since the build already takes quite a long time, it would of course be great to bring the build time down somehow. What about using a smaller sample, say 10% or 5% of the full sample (with a note that this is done for demonstration purposes)?
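
As an illustration of the idea, a minimal sketch of drawing a reproducible random subsample. It only uses the standard library; the event count, the seed and the helper name `subsample_indices` are assumptions made for the example.

```python
import random


def subsample_indices(n_events, fraction, seed=0):
    """Pick a reproducible random subset of row indices."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, round(n_events * fraction))
    rng = random.Random(seed)  # fixed seed so every CI run sees the same events
    return sorted(rng.sample(range(n_events), n_keep))


n_events = 100_000  # hypothetical size of the full sample
idx = subsample_indices(n_events, 0.05)
print(f"keeping {len(idx)} of {n_events} events")
```

If the data is already in a pandas DataFrame, `df.sample(frac=0.05, random_state=0)` does the same in one call.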

I would suggest either giving this a little push now or closing it as nomerge (the longer it stays open, the lower the odds that it gets merged at some point). What do you think?

jonas-eschle avatar Nov 20 '21 12:11 jonas-eschle

I've got entrystop set to 1000 events and am just waiting for the dependencies to finish installing (currently at ~1h30)

jvmead avatar Dec 05 '21 14:12 jvmead

Install dependencies:

6h 0m 8s
Run source ${CONDA}/etc/profile.d/conda.sh
Collecting package metadata (repodata.json): ...working... done
Error: The operation was canceled.

https://github.com/jvmead/analysis-essentials/runs/4422343032?check_suite_focus=true

jvmead avatar Dec 06 '21 09:12 jvmead

The branch uses an (old?) version of the GitHub workflow that differs from the one on master (it uses things such as sourcing conda). The workflow file seems to have been renamed to build instead of CI. Can you maybe rebase or merge master into your branch and push again?

jonas-eschle avatar Dec 07 '21 22:12 jonas-eschle

@jonas-eschle I'm a little lost at this point (see https://github.com/jvmead/analysis-essentials/runs/5050083345?check_suite_focus=true#step:4:116 ). I'm not sure if this is a change in uproot or in the dataset.

jvmead avatar Feb 03 '22 11:02 jvmead

This is indeed a change in uproot (in version 4). An example of how to get the same result now can be found here: https://github.com/hsf-training/analysis-essentials/blob/master/advanced-python/20DataAndPlotting.ipynb (with .array you can convert it to np for NumPy or pd for pandas)
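
A minimal sketch of the uproot 4 style of reading, not taken from the linked notebook. The wrapper `load_events` is hypothetical, and so are the file, tree and branch names in the usage comment.

```python
def load_events(path, tree_name, branches, entry_stop=None, library="pd"):
    """Read selected branches from a ROOT file with the uproot 4 API."""
    import uproot  # imported lazily; this sketch assumes uproot >= 4

    with uproot.open(path) as root_file:
        tree = root_file[tree_name]
        # uproot 3 spelled this roughly as tree.pandas.df(branches, entrystop=...);
        # uproot 4 uses arrays(...) with library= and entry_stop= instead.
        return tree.arrays(branches, library=library, entry_stop=entry_stop)


# Hypothetical usage: first 1000 events as a pandas DataFrame
# df = load_events("data.root", "DecayTree", ["Jpsi_M", "Jpsi_PT"], entry_stop=1000)
```

Passing `library="np"` instead returns NumPy arrays, and `entry_stop` limits how many events are read, which also helps with the run time discussed above.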

jonas-eschle avatar Feb 03 '22 11:02 jonas-eschle

Fantastic, thanks very much!

jvmead avatar Feb 03 '22 11:02 jvmead

@jonas-eschle now I'm seeing problems with advanced-python/20DataAndPlotting.ipynb (see https://github.com/jvmead/analysis-essentials/runs/5052417479?check_suite_focus=true#step:4:57 ). I've not made any changes to that file since pulling from master.

jvmead avatar Feb 03 '22 14:02 jvmead

Hm, that should be only temporary, a problem with retrieving the files over the network

jonas-eschle avatar Feb 08 '22 19:02 jonas-eschle

Can you maybe restart the CI? I think this was a temporary problem that was solved now

jonas-eschle avatar May 02 '22 14:05 jonas-eschle

I suspect that it's not possible to rerun the workflows (seems like that's only possible for 1 month?). I don't find the usual rerun workflow button there and this also seems the case in other repos with old runs.

So maybe just push something trivial?

klieret avatar May 06 '22 13:05 klieret

I've pushed a trivial change, just updating the link to the version on Google Colab, which contains plots for comparison. The Colab version also has a couple of cells added at the end that have not been pushed here, but which I think might be worth noting in a lesson that uses this example: https://colab.research.google.com/drive/1I34R8eCck3wo1YX54WllUKJw9KdoID8Y#scrollTo=8Xlc0Wik0xfY https://colab.research.google.com/drive/1I34R8eCck3wo1YX54WllUKJw9KdoID8Y#scrollTo=q6hRFVQHTXQ-&line=2&uniqifier=1

jvmead avatar May 07 '22 12:05 jvmead

Now it looks fine! However, the document is not yet included anywhere in the toctree. Maybe it could be added as an additional tutorial, like an advanced one? Or with the normal ones? What do you suggest?

jonas-eschle avatar May 09 '22 12:05 jonas-eschle

When I demonstrated in the advanced python session, the notebook for lesson 4 (now labelled 3.1) had a title for k-folding but no content beyond that, though I see there is now a k-folding explanation added. This 4b notebook was meant to pick up where that one left off. Perhaps it should be renamed 3.1.5 or maybe 3.2, and UBoost can be 3.3?

I imagine there probably isn't time to let students run everything in class, but it might be worth having the notebook available on the site. The lecturer can then scroll through the pre-run, more complete version (https://bit.ly/LHCb_XGB_Tuning) to show the plots and discuss the process, and people can un-comment the extra dimensions of the hyperparameter scan and test it in their own time.
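
To illustrate the "un-comment the extra dimensions" idea, a small sketch of a scan grid and the number of fits it implies. The parameter names are real XGBoost options, but the values, the fold count and the helper `iter_settings` are assumptions and not the notebook's actual settings.

```python
from itertools import product

param_grid = {
    "max_depth": [3, 6],
    "learning_rate": [0.05, 0.3],
    # Extra dimensions, left commented out to keep the run short:
    # "n_estimators": [100, 300],
    # "subsample": [0.7, 1.0],
}
n_folds = 5  # hypothetical number of k-fold splits


def iter_settings(grid):
    """Yield one dict per point of the hyper-parameter grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


settings = list(iter_settings(param_grid))
n_fits = len(settings) * n_folds
print(f"{len(settings)} settings x {n_folds} folds = {n_fits} fits")
```

Each two-valued dimension that is un-commented doubles the number of fits, which is why the full scan suits a pre-run notebook better than a live class or the CI.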

jvmead avatar May 09 '22 13:05 jvmead

This issue or pull request has been automatically marked as stale because it has not had recent activity. Please manually close it, if it is no longer relevant, or ask for help or support to help getting it unstuck. Let me bring this to the attention of @klieret @wdconinc @michmx for now.

stale[bot] avatar Jul 19 '22 17:07 stale[bot]

Hi @jonas-eschle @jvmead It seemed to me that you were almost ready to merge this, right? What's still blocking this?

klieret avatar Jul 19 '22 18:07 klieret

I ran into this issue and have been away from my desk for the better part of a month. It seems the already generated image files are included in the notebook file (at least for basics01) for some reason and are stopping the SK CI. If I remember correctly, the notebooks have to be uploaded without containing any of the metadata from running such as plots.
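
For what it's worth, a minimal sketch of stripping stored outputs from a notebook, assuming the nbformat 4 JSON layout. The helper `strip_outputs` and the tiny in-memory notebook are made up for the example, and the image payload is a placeholder string.

```python
import json


def strip_outputs(notebook):
    """Remove outputs and execution counts from a notebook dict (nbformat 4)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drops embedded plots and text output
            cell["execution_count"] = None
    return notebook


# Tiny in-memory notebook standing in for a real .ipynb file
raw = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 3,
         "source": ["plt.plot(x)"],
         "outputs": [{"output_type": "display_data", "metadata": {},
                      "data": {"image/png": "placeholder-base64-data"}}]},
    ],
})

clean = strip_outputs(json.loads(raw))
print(json.dumps(clean, indent=1))
```

For real files the same loop would sit between `json.load` and `json.dump`; `jupyter nbconvert --clear-output --inplace` or the `nbstripout` tool are existing alternatives.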

jvmead avatar Jul 25 '22 09:07 jvmead

Yes, they should be uploaded without an image, but executing them should produce the image. So I am not sure I understand the problem. I'm not really available at the moment, but ping me again later to look into this if it still persists.

jonas-eschle avatar Jul 25 '22 14:07 jonas-eschle

I haven't touched any notebook but my own, so I'm not sure where this issue is arising. Thanks, I'd appreciate it if you could take a look upon your return.

jvmead avatar Jul 26 '22 12:07 jvmead

This issue or pull request has been automatically marked as stale because it has not had recent activity. Please manually close it, if it is no longer relevant, or ask for help or support to help getting it unstuck. Let me bring this to the attention of @klieret @wdconinc @michmx for now.

stale[bot] avatar Sep 24 '22 14:09 stale[bot]