iibarant
Here's the code:

```python
from string_grouper import match_strings

matches = match_strings(check2['full address'])
```

I ran the code on a MacBook Pro (2.4 GHz 8-Core Intel Core i9, 32 GB 2667 MHz DDR4). The...
There you go ...

```
>>> matches = match_strings(check2['full address'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    matches = match_strings(check2['full address'])
  File "/opt/anaconda3/lib/python3.8/site-packages/string_grouper/string_grouper.py", line 131, in match_strings
    string_grouper =...
```
Yes, that works. Thank you. Should I check whether the code works with a greater max_n_matches? I'm planning to keep only matches with a similarity score > 0.9. Would it be possible to apply...
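For anyone following along, passing both of those settings to match_strings might look like the sketch below. This is a minimal example, assuming a data frame like the check2 shown above (the sample addresses and the max_n_matches value are illustrative, not from the thread):

```python
import pandas as pd
from string_grouper import match_strings

# Hypothetical stand-in for check2 from earlier in the thread
check2 = pd.DataFrame({'full address': [
    '123 Main St, Springfield IL',
    '123 Main Street, Springfield IL',
    '456 Oak Ave, Columbus OH',
]})

# min_similarity drops matches below the cosine-similarity threshold;
# max_n_matches caps how many matches are kept per string.
matches = match_strings(
    check2['full address'],
    min_similarity=0.9,
    max_n_matches=5000,
)
print(matches)
```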
Hi @berndnoll, I also use this approach and found it very good and quick compared to the recordlinkage package you mentioned above. Since you want to use exact matching for some...
Hi guys. Unfortunately, I am not allowed to share the data, but here's the experiment I ran with 7 states and the code I used. I merged the address fields...
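For anyone reproducing this, merging the address fields into one column could look like the sketch below. The file name and column names are assumptions for illustration; the thread doesn't show the actual schema:

```python
import pandas as pd

df = pd.read_csv('addresses.csv')  # hypothetical input file

# Assumed address columns; adjust to your schema.
address_parts = ['street', 'city', 'state', 'zip']
df['full address'] = (
    df[address_parts]
    .fillna('')          # avoid 'nan' strings from missing values
    .astype(str)
    .agg(' '.join, axis=1)
    .str.strip()
)
```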
Thank you @ParticularMiner, this is a good point. That's why I decided to combine all the states with much lower record counts (in my case < 10,000) into a single group. Hope this would...
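A minimal sketch of that bucketing step, assuming a 'state' column and the 10,000-record cutoff mentioned above (the 'OTHER' label is invented for illustration):

```python
# Count records per state and pool the small states into one bucket,
# so the per-group matching isn't run on many tiny partitions.
state_counts = df['state'].value_counts()
small_states = state_counts[state_counts < 10000].index

df['state_group'] = df['state'].where(
    ~df['state'].isin(small_states),  # keep the state name where it is large enough
    'OTHER',                          # otherwise pool into a single group
)
```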
@ParticularMiner, that's a great experiment. I ran your exact code on data from @berndnoll and received the following run-time figures:

T1 = 3 min 8.07 sec
T2 = 2...
Thank you! Labor Day was great. I tried the following code on my 3.5-million-row data frame:

```python
record_matches = record_linkage(
    df,
    fields_2b_matched_fuzzily=[('full address', 0.8, 3)],
    fields_2b_matched_exactly=['foad'],
    hierarchical=True,
    max_n_matches=10000,
    similarity_dtype=np.float32,
)
```
...
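For completeness: record_linkage above is the helper function shared earlier in this thread, not part of the string_grouper API itself, and the snippet assumes roughly this setup (the file name is a placeholder):

```python
import numpy as np
import pandas as pd

df = pd.read_csv('all_states.csv')  # hypothetical; ~3.5 million rows in my case
```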
@ParticularMiner Reducing max_n_matches to 5000 allowed the whole data set to run in 1 hour 53 minutes, which is not bad for the size of the dataframe (over 3.5 million records).
@okkyadhi, I had the same error. Try updating the numpy library; it worked in my case.
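In case it helps, the usual in-place upgrade is:

```
python -m pip install --upgrade numpy
```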