valentine

A tool facilitating matching for any dataset discovery method. Also, an extensible experiment suite for state-of-the-art schema matching methods.

Results: 11 valentine issues

Description: This PR introduces a new function, `add_noise_to_df_column`, that adds noise to a specified column in a DataFrame. The function addresses issue #63, where there was a request...

enhancement

Resolves #52. As stated in issue #52, it would be useful to be able to get the top n similar columns when analyzing the data. Since the issue is...

Add methods that utilize column vector representations and cosine similarity among them to determine matches.
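As a rough illustration of the idea in this request, the sketch below scores every pair of columns by the cosine similarity of their vectors and keeps the pairs above a threshold. It is not part of Valentine: the function names, the `threshold` parameter, and the assumption that each column already has a fixed-length numeric vector (for example, an embedding) are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def match_columns_by_cosine(source_vectors, target_vectors, threshold=0.5):
    """Return {(source_col, target_col): score} for pairs scoring >= threshold.

    Both arguments are hypothetical inputs: dicts mapping a column name to
    its vector representation. The result is ordered by descending score.
    """
    matches = {}
    for source_col, source_vec in source_vectors.items():
        for target_col, target_vec in target_vectors.items():
            score = cosine_similarity(source_vec, target_vec)
            if score >= threshold:
                matches[(source_col, target_col)] = score
    return dict(sorted(matches.items(), key=lambda kv: kv[1], reverse=True))
```

How the column vectors are produced is left open here; the comparison step stays the same whichever representation is chosen.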

enhancement
nice to have

I would like to be able to load a dataframe, and then add noise to specific columns of that dataset.
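One way this request could look in practice is sketched below with pandas and NumPy. The helper `add_gaussian_noise` is a hypothetical name, not Valentine's function, and the choice of zero-mean Gaussian noise, the `scale` parameter, and the `seed` parameter are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_gaussian_noise(df, column, scale=0.1, seed=None):
    """Return a copy of df with zero-mean Gaussian noise added to one column.

    Hypothetical helper: the input DataFrame is left unchanged, and only
    numeric columns are accepted.
    """
    if column not in df.columns:
        raise KeyError(f"column {column!r} not found")
    if not pd.api.types.is_numeric_dtype(df[column]):
        raise TypeError(f"column {column!r} is not numeric")

    rng = np.random.default_rng(seed)  # seed makes the noise reproducible
    noise = rng.normal(loc=0.0, scale=scale, size=len(df))

    noisy = df.copy()
    noisy[column] = df[column].astype(float) + noise
    return noisy
```

Calling `add_gaussian_noise(df, "price", scale=0.5, seed=0)` on a loaded DataFrame would perturb only the `price` column and leave the other columns untouched.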

Hi Valentine authors! I am having trouble with a bug that seems to be coming from Valentine, but I am unsure: - in `similarity_flooding.py`, is it expected that `long_name` may...

It would be incredibly useful to get, for each column in df1, the top n column matches in df2.
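A minimal sketch of what this could mean, independent of Valentine's own API: given a dictionary of pairwise similarity scores, group the candidates by the df1 column and keep the n best per group. The function name and the shape of the `scores` input are assumptions made for illustration.

```python
from collections import defaultdict


def top_n_matches_per_column(scores, n=3):
    """Group match scores by source column and keep the n best per column.

    `scores` is a hypothetical input of the form
    {(df1_column, df2_column): similarity}.
    Returns {df1_column: [(df2_column, similarity), ...]} sorted by
    descending similarity.
    """
    if n < 1:
        raise ValueError("n must be at least 1")

    by_source = defaultdict(list)
    for (source_col, target_col), score in scores.items():
        by_source[source_col].append((target_col, score))

    return {
        source_col: sorted(candidates, key=lambda pair: pair[1], reverse=True)[:n]
        for source_col, candidates in by_source.items()
    }
```

A column with fewer than n candidates simply returns all of them, so callers do not need to special-case sparse results.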

nice to have

Upgrading nltk to version 3.9.1 is a BREAKING change: it downloads `punkt_tab` instead of `punkt`, which has a critical security vulnerability (CVE-2024-39705). See e.g.: - https://github.com/advisories/GHSA-cgvx-9447-vcch -...

Bumps [nltk](https://github.com/nltk/nltk) from 3.8.1 to 3.9. Changelog (sourced from nltk's changelog): Version 3.9.1 (2024-08-19): Fixed bug that prevented wordnet from loading. Version 3.9 (2024-08-18): Avoid need for pickled models, resolves...

dependencies

This PR introduces a standardized .githooks/ directory and sets Git’s core.hooksPath so that all contributors automatically run the test suite before committing or pushing code. This ensures higher code quality,...