valentine
A tool facilitating matching for any dataset discovery method. Also, an extensible experiment suite for state-of-the-art schema matching methods.
Description: This PR introduces a new function, add_noise_to_df_column, designed to add noise to a specified column in a DataFrame. The function addresses issue #63, where there was a request...
Resolves #52. As stated in issue #52, it would be useful to be able to get the top n similar columns when analyzing the data. Since the issue is...
Add methods that utilize column vector representations and cosine similarity among them to determine matches.
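To make the idea concrete, here is a minimal sketch of matching columns by cosine similarity between their vector representations. It is not Valentine's implementation: the function names `cosine_similarity` and `match_columns` are hypothetical, and the vectors are hand-written stand-ins for whatever embedding a real matcher would compute.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two 1-D vectors; 0.0 if either is all zeros."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


def match_columns(source_vectors, target_vectors):
    """Score every (source, target) column pair, highest similarity first."""
    scores = {
        (s, t): cosine_similarity(sv, tv)
        for s, sv in source_vectors.items()
        for t, tv in target_vectors.items()
    }
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical column vectors (assumed, not produced by any real embedding).
source = {"name": [1.0, 0.0, 1.0], "age": [0.0, 1.0, 0.0]}
target = {"full_name": [2.0, 0.0, 2.0], "years": [0.0, 3.0, 0.0]}
ranked = match_columns(source, target)
```

Cosine similarity ignores vector magnitude, so `name` and `full_name` score as a near-perfect match here even though one vector is twice the length of the other.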
I would like to be able to load a dataframe, and then add noise to specific columns of that dataset.
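One way such a request could be met is sketched below, assuming the column is numeric and the noise is zero-mean Gaussian. The helper `add_gaussian_noise` and its `scale` and `seed` parameters are hypothetical; the signature of the `add_noise_to_df_column` function mentioned in the PR is not shown here and may differ.

```python
import numpy as np
import pandas as pd


def add_gaussian_noise(df, column, scale=1.0, seed=None):
    """Return a copy of df with zero-mean Gaussian noise added to one numeric column."""
    if not pd.api.types.is_numeric_dtype(df[column]):
        raise TypeError(f"column {column!r} is not numeric")
    rng = np.random.default_rng(seed)  # seeded generator for reproducible noise
    noisy = df.copy()  # leave the caller's DataFrame untouched
    noisy[column] = df[column] + rng.normal(0.0, scale, size=len(df))
    return noisy


df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "label": ["a", "b", "c"]})
noisy = add_gaussian_noise(df, "price", scale=0.5, seed=0)
```

Returning a copy keeps the original data available for comparison, which is useful when the noisy version is meant to test how robust a matcher is.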
Hi Valentine authors! I am having trouble with a bug that seems to be coming from Valentine, but I am unsure: - in `similarity_flooding.py`, is it expected that `long_name` may...
It would be incredibly useful to get, for each column in df1, the top n column matches in df2.
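As an illustration of what this request amounts to, the sketch below groups pairwise similarity scores by source column and keeps the n best candidates for each. The function `top_n_per_column` and the score dictionary are hypothetical and do not reflect Valentine's own result format.

```python
from collections import defaultdict


def top_n_per_column(scores, n):
    """For each source column, keep the n highest-scoring target columns."""
    grouped = defaultdict(list)
    for (source_col, target_col), score in scores.items():
        grouped[source_col].append((target_col, score))
    return {
        col: sorted(candidates, key=lambda c: c[1], reverse=True)[:n]
        for col, candidates in grouped.items()
    }


# Hypothetical similarity scores keyed by (df1 column, df2 column).
scores = {
    ("name", "full_name"): 0.9,
    ("name", "surname"): 0.7,
    ("name", "years"): 0.1,
    ("age", "years"): 0.8,
    ("age", "full_name"): 0.2,
}
top2 = top_n_per_column(scores, 2)
```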
The upgrade of nltk to version 3.9.1 is a BREAKING change. It downloads `punkt_tab` instead of `punkt`, which has a critical security vulnerability (CVE-2024-39705). See e.g.: - https://github.com/advisories/GHSA-cgvx-9447-vcch -...
Bumps [nltk](https://github.com/nltk/nltk) from 3.8.1 to 3.9. Changelog Sourced from nltk's changelog. Version 3.9.1 2024-08-19 Fixed bug that prevented wordnet from loading Version 3.9 2024-08-18 Avoid need for pickled models, resolves...
This PR introduces a standardized .githooks/ directory and sets Git’s core.hooksPath so that all contributors automatically run the test suite before committing or pushing code. This ensures higher code quality,...