twde-datalab

Onboarding to data science by ThoughtWorks

Results 18 twde-datalab issues

Bumps [bleach](https://github.com/mozilla/bleach) from 2.1.4 to 3.1.1.

Changelog

*Sourced from [bleach's changelog](https://github.com/mozilla/bleach/blob/master/CHANGES).*

> Version 3.1.1 (February 13th, 2020)
> -----------------------------------
>
> **Security fixes**
>
> * ``bleach.clean`` behavior parsing ``noscript``...

dependencies

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.23 to 1.24.2.

Changelog

*Sourced from [urllib3's changelog](https://github.com/urllib3/urllib3/blob/master/CHANGES.rst).*

> 1.24.2 (2019-04-17)
> -------------------
>
> * Don't load system certificates by default when any other ``ca_certs``, ``ca_certs_dir``...

dependencies

I suspect that using a [random forest](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) would be more computationally expensive but would improve our predictive power. As I understand it, computational complexity would be the only major cost to swapping...

machine learning algorithm

Maybe it'd be useful to keep track of blog posts and websites that we find useful. I'm making it an issue now, in case this idea flops, but we could...

When handling files larger than 5 GB, a couple of issues appear: Creating such a file in the local file system (e.g. when using `boto.download_file()` / or...

bug
infrastructure

Good candidates are https://github.com/ThoughtWorksInc/twde-datalab/blob/eed2740a1f0753f2066e0fead419fc4165d7b2e7/src/decision_tree.py#L30 or https://github.com/ThoughtWorksInc/twde-datalab/blob/eed2740a1f0753f2066e0fead419fc4165d7b2e7/src/prophet_time_series.py#L22

enhancement

Currently, we use a hand-rolled function to split the data into train and validation sets. This was initially necessary to match the specific day-of-the-week pattern of the test period used...

enhancement
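A simple alternative to a hand-rolled split is to hold out the most recent days as the validation set. The sketch below is only an illustration, not the project's code: `split_by_date`, the `date` and `unit_sales` keys, and the sample rows are all hypothetical. It does not reproduce the day-of-the-week matching described in the issue, although holding out a multiple of 7 days keeps every weekday equally represented.

```python
from datetime import date, timedelta


def split_by_date(rows, validation_days):
    """Hold out the most recent `validation_days` calendar days for validation.

    `rows` is a list of dicts with a `date` key (an assumed layout).
    """
    last_day = max(row["date"] for row in rows)
    cutoff = last_day - timedelta(days=validation_days)
    train = [row for row in rows if row["date"] <= cutoff]
    validation = [row for row in rows if row["date"] > cutoff]
    return train, validation


# Made-up data: 28 consecutive days, one row per day.
rows = [
    {"date": date(2017, 7, 1) + timedelta(days=i), "unit_sales": i}
    for i in range(28)
]
train, validation = split_by_date(rows, validation_days=14)
```

Because the split is purely chronological, no validation row is older than any training row, which avoids leaking future information into training.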

Save a predictions.csv file with all the columns, so that we can debug weird predictions and even do exploratory analysis on our predictions. Also save the actual value, and some...

enhancement
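One possible shape for this, as a minimal sketch using only the standard library: write every input column next to the prediction and, when it is known, the actual value. The function name `save_predictions`, the `prediction` and `actual` column names, and the sample columns are assumptions made for illustration.

```python
import csv
import os
import tempfile


def save_predictions(path, rows, predictions, actuals=None):
    """Write every input column plus the prediction (and the actual, if known)."""
    fieldnames = list(rows[0].keys()) + ["prediction"]
    if actuals is not None:
        fieldnames.append("actual")
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for i, row in enumerate(rows):
            record = dict(row, prediction=predictions[i])
            if actuals is not None:
                record["actual"] = actuals[i]
            writer.writerow(record)


# Made-up feature rows; the column names are placeholders.
rows = [
    {"store_nbr": 1, "item_nbr": 101, "days_off": 0},
    {"store_nbr": 2, "item_nbr": 102, "days_off": 1},
]
out_path = os.path.join(tempfile.mkdtemp(), "predictions.csv")
save_predictions(out_path, rows, predictions=[2.5, 4.0], actuals=[3.0, 4.0])
```

Keeping the features in the same file makes it possible to filter for rows with large errors and inspect which inputs they share.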

The generation of the 'days_off' feature from holiday and weekend information seems to take longer than expected. The implementation is probably inefficient. https://github.com/ThoughtWorksInc/twde-datalab/blob/4665597c4a67ac72ec8a40ddcee9890f6be5ade3/src/merger.py#L44

enhancement
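If the slowdown comes from recomputing the flag for every row, one option is to compute it once per distinct date and look it up afterwards. The sketch below assumes that 'days_off' is a simple 0/1 flag for "weekend or holiday"; the linked implementation may define it differently. `days_off_flags` and the sample holiday are hypothetical.

```python
from datetime import date, timedelta


def days_off_flags(dates, holidays):
    """Return 1 for each date that is a weekend or a holiday, else 0."""
    holiday_set = set(holidays)  # constant-time membership tests
    # Compute the flag once per distinct date, then reuse it for repeated dates.
    flag_by_date = {
        d: int(d.weekday() >= 5 or d in holiday_set) for d in set(dates)
    }
    return [flag_by_date[d] for d in dates]


# Made-up data: Monday 2017-08-07 to Sunday 2017-08-13, each date on two rows.
week = [date(2017, 8, 7) + timedelta(days=i) for i in range(7)]
dates = week + week
flags = days_off_flags(dates, holidays=[date(2017, 8, 10)])
```

With many rows per date, the work scales with the number of distinct dates plus one dictionary lookup per row.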

Currently, holidays are treated like weekends, so it is only about (some) people not having to work. However, there are obviously effects of specific holidays, such as Christmas, that could...

feature engineering
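One way to let a model distinguish specific holidays is to add a separate 0/1 indicator per holiday name instead of a single day-off flag. This is a minimal sketch under assumptions: the mapping from date to holiday name, the `is_<name>` column naming, and the function `holiday_indicators` are hypothetical.

```python
from datetime import date


def holiday_indicators(dates, named_holidays):
    """Build one 0/1 feature per holiday name for each date.

    `named_holidays` maps a date to a holiday name (an assumed layout).
    """
    names = sorted(set(named_holidays.values()))
    features = []
    for d in dates:
        todays = named_holidays.get(d)  # None on ordinary days
        features.append({f"is_{name}": int(name == todays) for name in names})
    return features


# Made-up holiday calendar and dates.
named_holidays = {
    date(2016, 12, 25): "christmas",
    date(2017, 1, 1): "new_year",
}
dates = [date(2016, 12, 24), date(2016, 12, 25), date(2017, 1, 1)]
features = holiday_indicators(dates, named_holidays)
```

A further step, not shown here, could add indicators for the days leading up to a holiday, since effects such as pre-Christmas shopping occur before the day itself.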