twde-datalab
Onboarding to data science by ThoughtWorks
Bumps [bleach](https://github.com/mozilla/bleach) from 2.1.4 to 3.1.1. Changelog *Sourced from [bleach's changelog](https://github.com/mozilla/bleach/blob/master/CHANGES).* > Version 3.1.1 (February 13th, 2020) > ----------------------------------- > > **Security fixes** > > * ``bleach.clean`` behavior parsing ``noscript``...
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.23 to 1.24.2. Changelog *Sourced from [urllib3's changelog](https://github.com/urllib3/urllib3/blob/master/CHANGES.rst).* > 1.24.2 (2019-04-17) > ------------------- > > * Don't load system certificates by default when any other ``ca_certs``, ``ca_certs_dir``...
I suspect using a [random forest](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) would be more computationally expensive but improve our predictive power. As I understand it, computational complexity would be the only major cost to swapping...
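A rough way to check this suspicion is to fit both model types on the same split and compare validation scores. This is only a sketch: the synthetic data from `make_regression` stands in for the project's real features, and `n_estimators=50` is an arbitrary choice, not a tuned value.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data is an assumption; it stands in for the real feature table.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Both estimators share the same fit/predict/score interface,
# so swapping one for the other only changes the constructor call.
models = {
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_val, y_val)  # R^2 on the held-out rows

for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
```

Because the forest trains many trees, fit time grows roughly with `n_estimators`; timing the two `fit` calls on real data would show whether that cost is acceptable.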
Maybe it'd be useful to keep track of blog posts and websites that we find useful. I'm making it an issue now, in case this idea flops, but we could...
When handling files larger than 5 GB, a couple of issues appear: creating such a file in the local file system (e.g. when using `boto.download_file()` / or...
Good candidates are https://github.com/ThoughtWorksInc/twde-datalab/blob/eed2740a1f0753f2066e0fead419fc4165d7b2e7/src/decision_tree.py#L30 or https://github.com/ThoughtWorksInc/twde-datalab/blob/eed2740a1f0753f2066e0fead419fc4165d7b2e7/src/prophet_time_series.py#L22
Currently, we use a hand-rolled function to split the data into train and validation sets. This was initially necessary to match the specific day-of-the-week pattern of the test period used...
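For reference, a time-ordered split can be as small as the sketch below. It is not the project's current function: the row layout (dicts with a `date` key) and the `unit_sales` column are hypothetical, and holding out whole weeks is just one way to keep every day of the week represented in the validation set.

```python
from datetime import date, timedelta


def split_by_date(rows, validation_days):
    """Hold out the most recent `validation_days` days for validation.

    `rows` is a list of dicts with a `date` key (a hypothetical layout).
    """
    last_day = max(row["date"] for row in rows)
    cutoff = last_day - timedelta(days=validation_days)
    train = [row for row in rows if row["date"] <= cutoff]
    validation = [row for row in rows if row["date"] > cutoff]
    return train, validation


# Fifteen consecutive days of made-up data, starting 2017-08-01.
rows = [
    {"date": date(2017, 8, 1) + timedelta(days=i), "unit_sales": float(i)}
    for i in range(15)
]

# A multiple of 7 keeps each weekday in the validation set exactly once.
train, validation = split_by_date(rows, validation_days=7)
print(len(train), len(validation))
```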
Save a predictions.csv file with all the columns, so that we can debug weird predictions and even do exploratory analysis on our predictions. Also save the actual value, and some...
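A minimal sketch of what this could look like, using only the standard library. The function name `save_predictions`, the `unit_sales` target column, and the extra `error` column are assumptions made for illustration; the rows are made up.

```python
import csv
import os
import tempfile


def save_predictions(rows, predictions, target, path="predictions.csv"):
    """Write every input column plus the prediction and its error.

    The actual value is already in each row under `target`;
    the `error` column (prediction minus actual) is an assumed extra.
    """
    fieldnames = list(rows[0].keys()) + ["prediction", "error"]
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for row, predicted in zip(rows, predictions):
            writer.writerow(
                {**row, "prediction": predicted, "error": predicted - row[target]}
            )
    return path


# Made-up rows; a temporary directory avoids cluttering the working tree.
example_rows = [
    {"date": "2017-08-01", "store_nbr": 1, "unit_sales": 10.0},
    {"date": "2017-08-02", "store_nbr": 1, "unit_sales": 7.5},
]
example_path = os.path.join(tempfile.mkdtemp(), "predictions.csv")
save_predictions(example_rows, [12.0, 7.0], target="unit_sales", path=example_path)
```

Keeping all feature columns next to the prediction makes it possible to filter the file for the largest errors and inspect what those rows have in common.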
Generating the 'days_off' feature from holiday and weekend information seems to take longer than expected. The implementation is probably inefficient. https://github.com/ThoughtWorksInc/twde-datalab/blob/4665597c4a67ac72ec8a40ddcee9890f6be5ade3/src/merger.py#L44
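One possible direction, sketched under the assumption that `days_off` simply flags dates that fall on a weekend or a holiday (the linked code may define it differently): build the holiday dates into a set once, so each row needs only a constant-time lookup instead of a scan over the holiday table. The row layout and the holiday date below are hypothetical.

```python
from datetime import date


def add_days_off(rows, holidays):
    """Flag each row whose date is a Saturday, Sunday, or listed holiday."""
    holiday_dates = set(holidays)  # built once; membership tests are O(1)
    for row in rows:
        day = row["date"]
        is_weekend = day.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
        row["days_off"] = is_weekend or day in holiday_dates
    return rows


# Made-up rows: a Saturday, a Monday, and a Thursday marked as a holiday.
rows = add_days_off(
    [
        {"date": date(2017, 8, 5)},
        {"date": date(2017, 8, 7)},
        {"date": date(2017, 8, 10)},
    ],
    holidays=[date(2017, 8, 10)],
)
print([row["days_off"] for row in rows])
```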
Currently holidays are treated like weekends, so the feature only captures that (some) people do not have to work. However, there are obviously effects of specific holidays, such as Christmas, that could...
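A small sketch of one way to expose holiday identity to a model: add a categorical column naming the holiday, rather than a single on/off flag. The function name `add_holiday_name`, the row layout, and the example calendar are all hypothetical.

```python
from datetime import date


def add_holiday_name(rows, holiday_names):
    """Attach the holiday's name to each row, or "none" on ordinary days.

    `holiday_names` maps a date to a label (a hypothetical calendar).
    """
    for row in rows:
        row["holiday_name"] = holiday_names.get(row["date"], "none")
    return rows


# Made-up calendar and rows for illustration.
calendar = {
    date(2017, 12, 25): "christmas",
    date(2018, 1, 1): "new_year",
}
rows = add_holiday_name(
    [
        {"date": date(2017, 12, 24)},
        {"date": date(2017, 12, 25)},
        {"date": date(2018, 1, 1)},
    ],
    calendar,
)
print([row["holiday_name"] for row in rows])
```

The resulting column would still need encoding (for example one-hot) before most models can use it.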