Bernd Noll
What about using max as a default instead of min?
I am talking about the metadata set. Why not make "max" the default metadata set instead of "min"? Or is there any other way to use the "max" set without...
Hi @ParticularMiner, thank you very much for your helpful answer. I managed to implement the merge operation and got this to work according to your advice. Performance is fine with...
@iibarant, thanks for the advice. That's exactly what I implemented today. However, running multiple matches for small subsets actually takes way more time than running one match over a big...
Hi gents, unfortunately not able to share the original file, but the one attached will do for our purposes. My code below. Grouping/blocking can be switched on/off by setting attribute...
@ParticularMiner: Sorry for that, trying one more time...

```python
import sys, csv, json
import pandas as pd
import numpy
import string_grouper as sg
import time

print('Dedupe script')
t0 = time.time()
inputfilename...
```
@ParticularMiner Thank you so much for looking into this and for collaborating on this. Very much appreciated! I basically copied and pasted your code to test it before applying to...
Hi @ParticularMiner, my machine specs are similar to yours: Intel(R) Core(TM) i7-8650U CPU @ 2.11 GHz, 16 GB RAM, SSD, Win 10 Prof. Looking at the runtimes I had with my...