4. Hyper-parameter tuning (extension with XGBoost)
An extension to the ML lessons, picking up where k-fold left off. It currently runs into problems with the CI timing out, presumably due to the long running time?
https://colab.research.google.com/
Notebook real time --- 1051.062778711319 seconds ---
Notebook CPU time --- 728.3288249999999 seconds ---
The successfully executed notebook with plots generated using full stats may be found here: https://colab.research.google.com/drive/1I34R8eCck3wo1YX54WllUKJw9KdoID8Y
It looks like there are some more imports to fix
@chrisburr down to <20mins on colab, is there any leeway we can get on the CI?
@chrisburr unsure of what the issue is, and how to proceed with the last step:
test $TRAVIS_PULL_REQUEST == "false" && test $TRAVIS_BRANCH == "master" && starterkit_ci deploy
You can ignore that, it’s the bit that uploads it to the website but we can fix that later
@chrisburr I had a look in the starterkit-ci but couldn't see where this problem arises
You need to add it to one of the tables of contents. The error is complaining that the page is unreachable from any of the menus.
nice, thanks
Created an issue: https://github.com/hsf-training/analysis-essentials/issues/25 The lesson being added can be viewed here: https://colab.research.google.com/drive/13dwXKKHxqQfk8Zo2Gzh46_pCc9xBgi1H?usp=sharing
Hi, what's the status of this? The build seems to fail
As far as I remember, the issue was that the CI would time out. I think this was because the notebook takes 20-30 mins to run.
Chris also suspected issues with its inclusion in the table of contents tree. I'm unsure of how to proceed in either case.
The notebooks may also now be out of sync with the current SK build. I've not kept up with any changes over the last few years.
Okay, so since the build already takes quite a long time, it would of course be great to bring it down somehow. What about using a smaller sample, say 10% or 5% of the sample (with a note that this is done for demonstration purposes)?
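A minimal sketch of the smaller-sample idea, assuming the events can be addressed by integer index. The helper name `subsample_indices` and the default seed are hypothetical and not taken from the notebook. With a pandas DataFrame, the built-in `df.sample(frac=0.1, random_state=...)` would probably be the simpler route.

```python
import random


def subsample_indices(n_events, fraction, seed=42):
    """Return a sorted, reproducible random subset of event indices.

    Hypothetical helper for illustration; not part of the lesson notebook.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one event so downstream code never sees an empty sample.
    n_keep = max(1, int(n_events * fraction))
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    return sorted(rng.sample(range(n_events), n_keep))


# Example: keep 10% of 1000 events for a quick demonstration run.
indices = subsample_indices(1000, 0.10)
print(len(indices))
```

Fixing the seed keeps the demonstration plots stable from one CI run to the next.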
I would just suggest to either give this a little push now or close as nomerge (the longer it's here, the lower the odds that it gets merged at some point). What do you think?
I've got entrystop set to 1000 events and am just waiting for the dependencies to finish installing (currently at ~1h30)
Install dependancies:
6h 0m 8s
Run source ${CONDA}/etc/profile.d/conda.sh
Collecting package metadata (repodata.json): ...working... done
Error: The operation was canceled.
https://github.com/jvmead/analysis-essentials/runs/4422343032?check_suite_focus=true
The branch uses an old(?) version of the GitHub workflow, which is different from the one on master (and uses things such as sourcing conda).
The file seems to have gotten renamed to build instead of CI. Can you maybe rebase or merge master into your branch and push again?
@jonas-eschle I'm a little lost at this point: https://github.com/jvmead/analysis-essentials/runs/5050083345?check_suite_focus=true#step:4:116 I'm not sure if this is a change in uproot or in the dataset.
This is indeed a change in uproot (in version 4). An example of how to get the same result now can be found here: https://github.com/hsf-training/analysis-essentials/blob/master/advanced-python/20DataAndPlotting.ipynb (with the .array you can convert it to np, or to pd for pandas)
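For anyone hitting the same error, a rough sketch of the uproot 4 style of reading a tree into pandas. The function name and its arguments are placeholders, not the ones used in the lesson. As far as I know, `entry_stop` replaces the old `entrystop` keyword, and `library="pd"` replaces the old `tree.pandas.df(...)` call.

```python
def load_events(path, tree_name, branches, entry_stop=None):
    """Read selected branches of a ROOT tree into a pandas DataFrame (uproot >= 4).

    All argument values are supplied by the caller; nothing here is specific
    to the lesson's dataset.
    """
    # Imported inside the function so this sketch can be pasted into a module
    # even where uproot is not installed yet.
    import uproot

    with uproot.open(path) as root_file:
        tree = root_file[tree_name]
        # library="pd" returns a pandas DataFrame; library="np" would return
        # a dict of NumPy arrays instead.
        return tree.arrays(branches, library="pd", entry_stop=entry_stop)
```

A call would look something like `load_events("data.root", "DecayTree", ["B_M"], entry_stop=1000)`, where the file, tree and branch names are made up for illustration.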
Fantastic, thanks very much!
@jonas-eschle now I'm seeing problems with advanced-python/20DataAndPlotting.ipynb: https://github.com/jvmead/analysis-essentials/runs/5052417479?check_suite_focus=true#step:4:57 I've not made any changes to that file since pulling from master.
Hm, that should be only temporary, a problem with retrieving the files over the network
Can you maybe restart the CI? I think this was a temporary problem that was solved now
I suspect that it's not possible to rerun the workflows (seems like that's only possible for 1 month?). I don't find the usual rerun workflow button there and this also seems the case in other repos with old runs.
So maybe just push something trivial?
I've pushed a trivial change, just updating the link to the version on Google Colab, which contains plots for comparison. The version on Colab also has a couple of cells added at the end that have not been pushed here. I think they might be worth making note of in a lesson that uses this example: https://colab.research.google.com/drive/1I34R8eCck3wo1YX54WllUKJw9KdoID8Y#scrollTo=8Xlc0Wik0xfY https://colab.research.google.com/drive/1I34R8eCck3wo1YX54WllUKJw9KdoID8Y#scrollTo=q6hRFVQHTXQ-&line=2&uniqifier=1
Now it looks fine! However, the document is not yet included anywhere in the toctree. Maybe it could go in as an additional tutorial, like an advanced one? Or with the normal ones? What do you suggest?
When I demonstrated in the advanced python session, the notebook for lesson 4 (now labelled 3.1) had a title for k-folding but no content beyond that, though I see now that a k-folding explanation has been added. This 4b notebook was meant to pick up where that one left off. Perhaps it should be renamed 3.1.5, or maybe 3.2, and UBoost can be 3.3? I imagine there probably isn't time to allow students to run everything in the class, but it might be worth having the notebook available on the site. The lecturer can then scroll through the pre-run, more complete version (https://bit.ly/LHCb_XGB_Tuning) to show the plots and discuss the process, and people can un-comment the extra dimensions of the hyperparameter scan and test it in their own time.
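To give a feel for why the extra scan dimensions are commented out, here is a small sketch that counts how many fits a grid scan needs. The parameter names follow common XGBoost hyperparameters, but the values and the 5-fold setting are illustrative assumptions, not the ones in the notebook.

```python
from itertools import product
from math import prod

# Illustrative grid: values are assumptions, not those used in the notebook.
param_grid = {
    "learning_rate": [0.05, 0.1, 0.3],
    "max_depth": [3, 5, 7],
    # Extra dimensions, left commented out to keep the run time manageable:
    # "n_estimators": [100, 200, 400],
    # "subsample": [0.6, 0.8, 1.0],
}


def count_fits(grid, n_folds):
    """Number of model fits for an exhaustive grid scan with k-fold CV."""
    return prod(len(values) for values in grid.values()) * n_folds


def iter_grid(grid):
    """Yield every hyperparameter combination as a dict."""
    names = list(grid)
    for combo in product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))


n_folds = 5  # assumed number of cross-validation folds
print(count_fits(param_grid, n_folds))

# The same grid with the two extra dimensions switched on.
full_grid = dict(param_grid, n_estimators=[100, 200, 400], subsample=[0.6, 0.8, 1.0])
print(count_fits(full_grid, n_folds))
```

Each extra three-valued dimension triples the number of fits, so the full scan is better suited to running in one's own time than in class or in CI.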
This issue or pull request has been automatically marked as stale because it has not had recent activity. Please manually close it, if it is no longer relevant, or ask for help or support to help getting it unstuck. Let me bring this to the attention of @klieret @wdconinc @michmx for now.
Hi @jonas-eschle @jvmead It seemed to me that you were almost ready to merge this, right? What's still blocking this?
I ran into this issue and have been away from my desk for the better part of a month. It seems the already generated image files are included in the notebook file (at least for basics01) for some reason and are stopping the SK CI. If I remember correctly, the notebooks have to be uploaded without any of the output from running them, such as plots.
Yes, they should be uploaded without an image, but executing them should produce the image. So I am not sure I understand the problem. I'm currently not really available, but ping me again later and I can look into this if it still persists.
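In case it helps when checking the notebooks, a minimal sketch of stripping stored outputs from a notebook before committing, assuming the nbformat 4 JSON layout. The function name is hypothetical and this is not what the starterkit CI runs. Existing tools such as nbstripout do the same job more thoroughly.

```python
import json


def strip_outputs(notebook):
    """Remove outputs and execution counts from a notebook dict in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drops embedded plots and printed text
            cell["execution_count"] = None  # serialised as null in the .ipynb JSON
    return notebook


# Tiny in-memory example instead of a real file.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["print('hi')"],
            "execution_count": 3,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
        },
    ],
}
cleaned = strip_outputs(example)
print(json.dumps(cleaned["cells"][1]["outputs"]))
```

For a real file, the dict would come from `json.load` on the `.ipynb` and be written back with `json.dump`.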
I haven't touched any notebook but my own, so I'm not sure where this issue is arising. Thanks, I'd appreciate it if you could take a look upon your return.