Nikolaos Perrakis comments

9 comments by Nikolaos Perrakis

Add bootstrapping options to chunk methods

I'm hesitant because I'm not sure about the use case. Supporting it further into the pipeline, meaning creating results and then creating a plot out of them...
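To make the trade-off concrete, here is a minimal sketch of what bootstrapping inside a single chunk could look like: a percentile bootstrap confidence interval for a statistic of that chunk's values. It uses plain NumPy; `bootstrap_ci` and its parameters (`n_resamples`, `alpha`, `seed`) are hypothetical and not part of NannyML's API.

```python
import numpy as np


def bootstrap_ci(values, statistic=np.mean, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of one chunk (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Draw n_resamples bootstrap samples of the chunk, each with replacement.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    stats = np.array([statistic(values[row]) for row in idx])
    # The alpha/2 and 1 - alpha/2 quantiles of the bootstrap distribution form the interval.
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return statistic(values), lower, upper


chunk = np.random.default_rng(42).normal(loc=0.8, scale=0.1, size=500)
point, lower, upper = bootstrap_ci(chunk)
print(f"{point:.3f} [{lower:.3f}, {upper:.3f}]")
```

Each chunk would then carry a lower and upper bound next to its point estimate, which is the extra state the results and plotting code would have to handle.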

Error with the code when ran with chunk number = 9

Issue no longer applies as of version 0.5.3. Updated code used to check:

```
import pandas as pd
import nannyml as nml

reference, analysis, _ = nml.datasets.load_synthetic_binary_classification_dataset()
# initialize, specify...
```
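When the row count is not a multiple of the requested number of chunks, the chunks cannot all be the same size. A minimal pandas sketch of one way to split a frame into 9 contiguous chunks follows; `split_into_chunks` is a hypothetical helper, and NannyML's own chunker may distribute (or drop) the leftover rows differently.

```python
import pandas as pd


def split_into_chunks(df: pd.DataFrame, chunk_number: int) -> list[pd.DataFrame]:
    """Split df into chunk_number contiguous chunks of near-equal size.

    Hypothetical helper: the first len(df) % chunk_number chunks get one extra row.
    """
    base, remainder = divmod(len(df), chunk_number)
    chunks, start = [], 0
    for i in range(chunk_number):
        size = base + (1 if i < remainder else 0)
        chunks.append(df.iloc[start:start + size])
        start += size
    return chunks


df = pd.DataFrame({"x": range(50_000)})
sizes = [len(c) for c in split_into_chunks(df, 9)]
print(sizes)  # → [5556, 5556, 5556, 5556, 5556, 5555, 5555, 5555, 5555]
```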

Pandas data type 'string' not understood

Nice dig down! On the latest version, 0.5.3, with this code sample:

```
import pandas as pd
import nannyml as nml
from IPython.display import display

# Load synthetic data
reference,...
```
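The message looks like NumPy being asked for a dtype called `'string'`, which is the name of pandas' nullable string extension dtype rather than a NumPy dtype. A minimal sketch of a user-side workaround, assuming the failing code accepts `object` columns, is to cast those columns before passing the data on; `strings_to_object` is a hypothetical helper, not a NannyML function.

```python
import pandas as pd


def strings_to_object(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where pandas' nullable string columns are cast to object (hypothetical helper)."""
    out = df.copy()
    for col in out.columns:
        # pd.StringDtype is the extension dtype behind astype("string").
        if isinstance(out[col].dtype, pd.StringDtype):
            out[col] = out[col].astype(object)
    return out


df = pd.DataFrame({"segment": ["a", "b", None], "value": [1.0, 2.0, 3.0]})
df["segment"] = df["segment"].astype("string")
print(df["segment"].dtype)
print(strings_to_object(df)["segment"].dtype)
```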

Automatic Binning estimation for ECE and Brier Score Metric

Hello @lorenzofamiglini, thank you very much for sharing your thoughts on this. I did a quick skim of your posts and they look very promising. We will study them in...

Automatic Binning estimation for ECE and Brier Score Metric

An update regarding your suggestions, @lorenzofamiglini: I can now see that using [np.histogram_bin_edges](https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html) would be an improvement over [`_get_bin_index_edges`](https://github.com/NannyML/nannyml/blob/72b1aad79c08ad5942280b35f7ba6398587e566d/nannyml/calibration.py#L152-L187), and I will bring it up with the team for consideration and prioritization...
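A minimal sketch of the suggestion: the expected calibration error (ECE) of a binary classifier, with bin edges chosen by `np.histogram_bin_edges` instead of a fixed bin count. `expected_calibration_error` is a hypothetical helper, not the code in `calibration.py`; it assumes `y_prob` holds predicted probabilities of the positive class in [0, 1] and weights each bin by its share of rows.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, bins="fd"):
    """ECE for binary classification with data-driven bin edges (hypothetical helper).

    bins is passed straight to np.histogram_bin_edges: an int, "fd", "doane", "auto", ...
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.histogram_bin_edges(y_prob, bins=bins, range=(0.0, 1.0))
    # Bin index per prediction; clipping puts y_prob == 1.0 into the last bin.
    idx = np.clip(np.searchsorted(edges, y_prob, side="right") - 1, 0, len(edges) - 2)
    ece = 0.0
    for b in np.unique(idx):
        mask = idx == b
        # |observed positive rate - mean predicted probability|, weighted by the bin's share of rows.
        ece += mask.mean() * abs(y_true[mask].mean() - y_prob[mask].mean())
    return ece


rng = np.random.default_rng(0)
y_prob = rng.uniform(0, 1, size=2_000)
y_true = rng.binomial(1, y_prob)
print(expected_calibration_error(y_true, y_prob, bins="fd"))
print(expected_calibration_error(y_true, y_prob, bins="doane"))
```

Passing an integer for `bins` gives fixed-width binning, so the data-driven estimators can be compared against a fixed bin count on the same predictions.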

nannyml can confuse months with days on some rows of a dataset

Issue still reproducible under the latest version:

```
import wget
from pathlib import Path
import pandas as pd
import nannyml as nml

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00618/Steel_industry_data.csv'
download_foler = Path.home().joinpath("Downloads")
filename =...
```
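Strings such as `01/02/2018` are ambiguous, while `13/02/2018` can only be read day-first, so date inference that decides row by row can end up mixing the two orders within one column; whether that happens depends on the pandas version. A minimal workaround sketch, assuming the timestamps are written day-first (the sample strings below are illustrative, not copied from the dataset), is to parse the column yourself with an explicit format before handing the data to NannyML.

```python
import pandas as pd

raw = pd.Series(["01/02/2018 00:15", "13/02/2018 00:30"])

# An explicit format removes the day/month ambiguity: day first, then month.
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M")
print(parsed.dt.month.tolist())  # [2, 2]
print(parsed.dt.day.tolist())    # [1, 13]
```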

Confidence Intervals for Missing Values shouldn't go below 0

Also, fix the API reference code at https://nannyml.readthedocs.io/en/stable/nannyml/nannyml.data_quality.missing.calculator.html: it should match the tutorial (and it currently has errors).
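A missing-value rate is a proportion, so any band around it is only meaningful within [0, 1]. A minimal sketch, assuming a normal-approximation (Wald) interval for a single chunk's rate; NannyML may derive its bands differently (for example from sampling error estimated on reference data), and `missing_rate_ci` is a hypothetical helper that only illustrates the clipping.

```python
import math


def missing_rate_ci(n_missing: int, n_rows: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a chunk's missing-value rate.

    The rate is a proportion, so both bounds are clipped to [0, 1].
    """
    p = n_missing / n_rows
    half_width = z * math.sqrt(p * (1 - p) / n_rows)
    return max(0.0, p - half_width), min(1.0, p + half_width)


print(missing_rate_ci(1, 100))   # the lower bound would be negative without clipping
print(missing_rate_ci(50, 100))
```

The Wilson score interval is an alternative that stays inside [0, 1] without clipping and does not collapse to zero width when a chunk has no missing values.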

Adding PSI for continious data

Small comment: you may want to consider [numpy.histogram_bin_edges](https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html) instead of manually implementing the Freedman-Diaconis rule. It has an option to use FD (`'fd'`) specifically as well, but maybe use `doane` like in...
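A minimal sketch of PSI for a continuous feature, with bin edges estimated on the reference data by `np.histogram_bin_edges`, using PSI = Σ (a_i − r_i) · ln(a_i / r_i) over bins. `psi_continuous` is a hypothetical helper; clipping analysis values into the reference range and the `eps` floor for empty bins are assumptions made for illustration, not choices taken from the PR.

```python
import numpy as np


def psi_continuous(reference, analysis, bins="doane", eps=1e-6):
    """Population Stability Index for a continuous feature (hypothetical helper).

    Bin edges are estimated on the reference data with np.histogram_bin_edges.
    """
    reference = np.asarray(reference, dtype=float)
    analysis = np.asarray(analysis, dtype=float)
    edges = np.histogram_bin_edges(reference, bins=bins)
    # Out-of-range analysis values are clipped so they land in the outermost bins.
    analysis = np.clip(analysis, edges[0], edges[-1])
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    ana_pct = np.histogram(analysis, bins=edges)[0] / len(analysis)
    # Floor empty bins at eps to avoid log(0) and division by zero.
    ref_pct = np.clip(ref_pct, eps, None)
    ana_pct = np.clip(ana_pct, eps, None)
    return float(np.sum((ana_pct - ref_pct) * np.log(ana_pct / ref_pct)))


rng = np.random.default_rng(0)
ref = rng.normal(size=5_000)
print(psi_continuous(ref, ref + 0.5, bins="doane"))
print(psi_continuous(ref, ref + 0.5, bins="fd"))
```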

change assumed `treat_as_categorical`

Hello Duncan, thank you for taking the time to report this issue. We apply this treatment because, in our testing, some numerical features with low numbers of unique values...
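The reply is cut off before it states the rule, so the sketch below only illustrates the general kind of heuristic being discussed: non-numeric columns, plus numeric columns with few distinct values, are treated as categorical. `infer_categorical` and its `max_unique` threshold are hypothetical, not NannyML's actual logic or default.

```python
import pandas as pd


def infer_categorical(df: pd.DataFrame, max_unique: int = 5) -> list[str]:
    """Hypothetical heuristic: non-numeric columns, plus numeric columns
    with at most max_unique distinct values, are treated as categorical."""
    categorical = []
    for col in df.columns:
        series = df[col]
        if not pd.api.types.is_numeric_dtype(series) or series.nunique() <= max_unique:
            categorical.append(col)
    return categorical


df = pd.DataFrame({
    "rating": [1, 2, 3] * 100,                 # numeric, but only 3 distinct values
    "price": [i / 300 for i in range(300)],    # numeric and continuous
    "city": ["a", "b", "c"] * 100,             # non-numeric
})
print(infer_categorical(df))
```

Making the threshold configurable, or letting users list categorical columns explicitly, is one way to handle features where such a guess is wrong.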