affiliation_parser

Function match_affil fails when distance > 0.1

Open oersted opened this issue 8 years ago • 6 comments

When distance > 0.1 the following line is executed: dist = affiliation_check(grid_data[i], affil)

However, grid_data is not defined. In fact, it is not mentioned anywhere else in the file. As a quick check I replaced it with grid_dict, in case it was a typo, but that didn't work either.

NameError                                 Traceback (most recent call last)
<ipython-input-6-0716e06a004c> in <module>()
      2 for i in range(20):
      3     aff_string = next(aff_strings)
----> 4     print(match_affil(aff_string))

~/.miniconda/envs/scitodate-3.6/lib/python3.6/site-packages/affiliation_parser-0.1-py3.6.egg/affiliation_parser/matcher.py in match_affil(affil_text)
    116         matched = True
    117     else:
--> 118         dist = affiliation_check(grid_data[i], affil)
    119         matched = True if dist == 0 else False
    120 

NameError: name 'grid_data' is not defined

oersted avatar Jan 22 '18 16:01 oersted

@oersted thanks for pointing this out. I'll check this week and fix it!

titipata avatar Jan 22 '18 17:01 titipata

Nice, thanks for the quick response.

It doesn't look like too hard a fix. Could you tell me what that variable is supposed to be? Maybe I can get it done. I am interested in having a functional version as soon as possible.

oersted avatar Jan 22 '18 17:01 oersted

@oersted, can you check if it's working now? It seems I had the wrong variable name in matcher.py. This is a very crude nearest neighbor, though. I will hopefully update the model later on!

titipata avatar Jan 24 '18 22:01 titipata

Hi @titipata, sorry for the late response. No, this doesn't seem to have fixed the issue. I had tried the same fix myself. This is the new error:

AttributeError                            Traceback (most recent call last)
<ipython-input-4-c0d52311dc4b> in <module>()
      1 for row in query.results():
      2     aff_str = row[0]
----> 3     print(match_affil(aff_str))

~/.miniconda/envs/scitodate-3.6/lib/python3.6/site-packages/affiliation_parser-0.1-py3.6.egg/affiliation_parser/matcher.py in match_affil(affil_text)
    116         matched = True
    117     else:
--> 118         dist = affiliation_check(grid_dict[i], affil)
    119         matched = True if dist == 0 else False
    120 

~/.miniconda/envs/scitodate-3.6/lib/python3.6/site-packages/affiliation_parser-0.1-py3.6.egg/affiliation_parser/matcher.py in affiliation_check(affil_1, affil_2)
     86     if return 0 means both string are the same
     87     """
---> 88     if country_distance(affil_1, affil_2) == 0 and len(affil_2.name.split(" ")) >= 2:
     89         if unigram_distance(affil_1, affil_2) <= 1:
     90             dist = 0

~/.miniconda/envs/scitodate-3.6/lib/python3.6/site-packages/affiliation_parser-0.1-py3.6.egg/affiliation_parser/matcher.py in country_distance(affil_1, affil_2)
     62 def country_distance(affil_1, affil_2):
     63     """"""
---> 64     if affil_1.country == affil_2.country or affil_1.country == '' or affil_2.country == '':
     65         return 0
     66     else:

AttributeError: 'collections.OrderedDict' object has no attribute 'country'

oersted avatar Feb 07 '18 14:02 oersted

Ah, yes, I see the problem. It may be a versioning issue: I am using Python 3.6.3. It is also possible that the GRID headers have changed, because the column names now start with an uppercase letter.

It seems the way you index the records doesn't work any more. You should use affil['Country'] rather than affil.country, and likewise for all the equivalent cases.
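For illustration, a minimal sketch of that change applied to country_distance. It assumes both records are OrderedDicts with capitalized keys; the 'Name' key, the sample records, and the return value of 1 for a mismatch are my assumptions, since the traceback cuts off before the else branch.

```python
from collections import OrderedDict


def country_distance(affil_1, affil_2):
    """Return 0 if the countries match or either one is empty."""
    # Key-based access; .get() also tolerates a record with no 'Country' key.
    country_1 = affil_1.get("Country", "")
    country_2 = affil_2.get("Country", "")
    if country_1 == country_2 or country_1 == "" or country_2 == "":
        return 0
    # Assumed mismatch value; the original else branch is not shown.
    return 1


# Hypothetical records, only for demonstration.
grid_record = OrderedDict([("Name", "University of Oxford"),
                           ("Country", "United Kingdom")])
parsed_same = OrderedDict([("Name", "Oxford University"), ("Country", "")])
parsed_other = OrderedDict([("Name", "Harvard University"),
                            ("Country", "United States")])

print(country_distance(grid_record, parsed_same))
print(country_distance(grid_record, parsed_other))
```

Attribute access such as grid_record.country still raises the AttributeError shown above, because OrderedDict exposes its entries only through keys.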

oersted avatar Feb 07 '18 14:02 oersted

I fixed all the obvious issues I could find in PR #9

oersted avatar Feb 07 '18 15:02 oersted