Jovan Stojanovic

Results 18 issues of Jovan Stojanovic

This PR adds the new `FuzzyJoin` class that allows joining tables with dirty columns. Co-authored-by: @LeoGrin
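
To illustrate what joining on a dirty column can look like, here is a minimal sketch using only the standard library. It is not the `FuzzyJoin` implementation from the PR: the helper `fuzzy_join_rows`, its parameters, and the `_matched` suffix are hypothetical, and the matching relies on `difflib.get_close_matches` rather than whatever similarity the PR uses.

```python
from difflib import get_close_matches


def fuzzy_join_rows(left, right, key, cutoff=0.6):
    """Left-join two lists of dicts on `key`, tolerating typos in the key.

    Hypothetical helper for illustration only; not dirty_cat's API.
    """
    right_by_key = {row[key]: row for row in right}
    candidates = list(right_by_key)
    joined = []
    for row in left:
        merged = dict(row)
        # Closest right-hand key whose similarity ratio is at least `cutoff`.
        match = get_close_matches(row[key], candidates, n=1, cutoff=cutoff)
        if match:
            for name, value in right_by_key[match[0]].items():
                if name != key:
                    merged[name] = value
            merged[key + "_matched"] = match[0]
        else:
            # No candidate was similar enough: keep the row, mark it unmatched.
            merged[key + "_matched"] = None
        joined.append(merged)
    return joined


left = [{"country": "Frnace", "gdp": 1}, {"country": "Xyz", "gdp": 2}]
right = [
    {"country": "France", "capital": "Paris"},
    {"country": "Germany", "capital": "Berlin"},
]
rows = fuzzy_join_rows(left, right, key="country")
```

The `cutoff` threshold trades recall for precision: a lower value matches more rows but accepts weaker matches.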

In light of what we discussed about improving the user experience: we can add example 3 as the last part of example 2, rather than as a separate example. Looking...

Documentation

Solves #53, for release 0.3. Adding an optional K-fold encoding to the TargetEncoder that will avoid overfitting and bring better results when working on larger datasets. Adding an example of...
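
A minimal sketch of the K-fold idea, assuming interleaved folds and a fallback to the global mean for categories unseen in the training folds. The function `kfold_target_encode` is hypothetical and is not the `TargetEncoder` code from this PR; it only shows why each row's encoding never uses that row's own target.

```python
from collections import defaultdict


def kfold_target_encode(categories, targets, n_splits=5):
    """Encode each category by the mean target computed on the other folds.

    Hypothetical illustration; fold assignment and fallback are assumptions.
    """
    n = len(categories)
    global_mean = sum(targets) / n
    encoded = [0.0] * n
    for fold in range(n_splits):
        # Interleaved fold: rows fold, fold + n_splits, fold + 2 * n_splits, ...
        held_out = set(range(fold, n, n_splits))
        sums = defaultdict(float)
        counts = defaultdict(int)
        # Accumulate target statistics on the rows outside the held-out fold.
        for i in range(n):
            if i not in held_out:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
        # Encode held-out rows with statistics they did not contribute to.
        for i in held_out:
            cat = categories[i]
            if counts.get(cat, 0) > 0:
                encoded[i] = sums[cat] / counts[cat]
            else:
                encoded[i] = global_mean
    return encoded


encoded = kfold_target_encode(
    ["a", "a", "b", "b"], [1.0, 3.0, 10.0, 20.0], n_splits=2
)
```

In this toy input, the first `"a"` row is encoded with the target of the second `"a"` row and vice versa, so no row leaks its own label into its feature.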

Hi, I noticed there were no benchmarks that compare the different dirty_cat encoders. Inspired by [scikit-learn](https://github.com/scikit-learn/scikit-learn/blob/main/benchmarks/bench_text_vectorizers.py), this is a simple benchmark comparing time and memory usage of the Similarity, MinHash...
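
A time-and-memory measurement of this kind can be sketched with the standard library alone. This is not the benchmark script from the issue: `benchmark` is a hypothetical helper, and `toy_ngram_encoder` is a stand-in for a real encoder so that the example stays self-contained.

```python
import time
import tracemalloc
from collections import Counter


def benchmark(func, *args, **kwargs):
    """Return (result, elapsed seconds, peak traced bytes) for one call."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


def toy_ngram_encoder(strings, n=3):
    """Hypothetical stand-in for an encoder: character n-gram counts."""
    return [
        Counter(s[i:i + n] for i in range(len(s) - n + 1)) for s in strings
    ]


result, elapsed, peak = benchmark(toy_ngram_encoder, ["london", "londres"])
print(f"time: {elapsed:.6f} s, peak memory: {peak} bytes")
```

Tracing allocations slows execution, so timings taken this way are best used for relative comparisons between encoders rather than as absolute figures.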

Our main encoders and other features are missing simple reproducible examples that will appear on the website and give a first overview of what the feature can do.

### Examples...

Documentation
meta-issue

Here are some of the code coverage results:

```
------------------------------------------------------------------------
dirty_cat/_datetime_encoder.py      88%
dirty_cat/_gap_encoder.py           88%
dirty_cat/_minhash_encoder.py       94%
dirty_cat/_similarity_encoder.py    90%
dirty_cat/_string_distances.py      91%
dirty_cat/_super_vectorizer.py      85%
dirty_cat/_target_encoder.py        78%
dirty_cat/_utils.py                 93%
dirty_cat/datasets/_fetching.py     97%
...
------------------------------------------------------------------------...
```

enhancement
meta-issue

As noted in the title and here: https://github.com/dirty-cat/dirty_cat/pull/368#issuecomment-1254731292

Documentation

In [example 1](https://dirty-cat.github.io/stable/auto_examples/01_dirty_categories.html#sphx-glr-auto-examples-01-dirty-categories-py), we use the `year first hired` feature to predict wages. In the second part, when we use dirty_cat to encode variables, we may use the `DatetimeEncoder` instead...

Documentation

Some possible improvements to the examples:

- check that functions and classes are marked with the correct syntax (:class:`class_name`, :func:`function_name`)
- check that all links are working and all...

Documentation