Nikolaos Perrakis

Results 9 comments of Nikolaos Perrakis

I'm hesitant because I'm not sure about the use case. Because supporting that further into the pipeline, meaning creating results and then creating a plot out of them...

Issue no longer applies as of version 0.5.3 Updated code used to check: ``` import pandas as pd import nannyml as nml reference, analysis, _ = nml.datasets.load_synthetic_binary_classification_dataset() # initialize, specify...

Nice dig down! On the latest version, 0.5.3, with this code sample: ``` import pandas as pd import nannyml as nml from IPython.display import display # Load synthetic data reference,...

Hello @lorenzofamiglini, thank you very much for sharing your thoughts on this. I did a quick skim of your posts and they look very promising. We will study them in...

An update regarding your suggestions, @lorenzofamiglini: I can see now that using [np.histogram_bin_edges](https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html) would be an improvement over [`_get_bin_index_edges`](https://github.com/NannyML/nannyml/blob/72b1aad79c08ad5942280b35f7ba6398587e566d/nannyml/calibration.py#L152-L187), and I will bring it up to the team for consideration and prioritization....

Issue still reproducible under the latest version ``` import wget from pathlib import Path import pandas as pd import nannyml as nml url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00618/Steel_industry_data.csv' download_foler = Path.home().joinpath("Downloads") filename =...

Also fix the API reference code at https://nannyml.readthedocs.io/en/stable/nannyml/nannyml.data_quality.missing.calculator.html: it should be the same as the tutorial code (and it currently has errors).

Small comment: You may want to consider [numpy.histogram_bin_edges](https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html) instead of manually implementing the Freedman-Diaconis rule. It has an option to use FD specifically as well, but maybe use `doane` like in...
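
For illustration, the suggestion above could look roughly like the sketch below. It is not NannyML's code: the synthetic `scores` array and all variable names are assumptions made up for the example. Only `np.histogram_bin_edges` and its `"fd"` and `"doane"` estimators come from the comment.

```python
import numpy as np

# Hypothetical sample data standing in for a numerical feature or model scores.
rng = np.random.default_rng(0)
scores = rng.beta(2.0, 5.0, size=1_000)

# Bin edges from the Freedman-Diaconis estimator, computed by NumPy
# rather than by a hand-written implementation of the rule.
fd_edges = np.histogram_bin_edges(scores, bins="fd")

# The same call with the Doane estimator, the alternative mentioned above.
doane_edges = np.histogram_bin_edges(scores, bins="doane")

# Assign each value to a bin. Passing only the interior edges makes the
# maximum value fall into the last bin instead of overflowing past it.
bin_index = np.digitize(scores, fd_edges[1:-1])

print(len(fd_edges) - 1, "bins with 'fd';", len(doane_edges) - 1, "bins with 'doane'")
```

Both estimators return edges that span the minimum to the maximum of the data, so the two results differ only in how many bins they produce.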

Hello Duncan, Thank you for taking the time to report this issue. We have made this treatment because, in our testing, some numerical features with low numbers of unique values...