Function match_affil fails when distance > 0.1
When distance > 0.1 the following line is executed: dist = affiliation_check(grid_data[i], affil)
However, grid_data is not defined. In fact, it is not mentioned anywhere else in the file. I did a quick check by replacing it with grid_dict, in case it was a typo, but that didn't work either.
NameError Traceback (most recent call last)
<ipython-input-6-0716e06a004c> in <module>()
2 for i in range(20):
3 aff_string = next(aff_strings)
----> 4 print(match_affil(aff_string))
~/.miniconda/envs/scitodate-3.6/lib/python3.6/site-packages/affiliation_parser-0.1-py3.6.egg/affiliation_parser/matcher.py in match_affil(affil_text)
116 matched = True
117 else:
--> 118 dist = affiliation_check(grid_data[i], affil)
119 matched = True if dist == 0 else False
120
NameError: name 'grid_data' is not defined
@oersted thanks for pointing this out. I'll check this week and fix it!
Nice, thanks for the quick response.
It doesn't look like too hard a fix. Could you tell me what that variable is supposed to be? Maybe I can get it done. I am interested in having a functional version as soon as possible.
@oersted, can you check if it's working now? It seems I had the wrong variable name in matcher.py. This is a very crude nearest-neighbor match, though. I will hopefully update the model later on!
Hi @titipata, sorry for the late response. No, this doesn't seem to have fixed the issue. I had tried the same fix myself. This is the new error:
AttributeError Traceback (most recent call last)
<ipython-input-4-c0d52311dc4b> in <module>()
1 for row in query.results():
2 aff_str = row[0]
----> 3 print(match_affil(aff_str))
~/.miniconda/envs/scitodate-3.6/lib/python3.6/site-packages/affiliation_parser-0.1-py3.6.egg/affiliation_parser/matcher.py in match_affil(affil_text)
116 matched = True
117 else:
--> 118 dist = affiliation_check(grid_dict[i], affil)
119 matched = True if dist == 0 else False
120
~/.miniconda/envs/scitodate-3.6/lib/python3.6/site-packages/affiliation_parser-0.1-py3.6.egg/affiliation_parser/matcher.py in affiliation_check(affil_1, affil_2)
86 if return 0 means both string are the same
87 """
---> 88 if country_distance(affil_1, affil_2) == 0 and len(affil_2.name.split(" ")) >= 2:
89 if unigram_distance(affil_1, affil_2) <= 1:
90 dist = 0
~/.miniconda/envs/scitodate-3.6/lib/python3.6/site-packages/affiliation_parser-0.1-py3.6.egg/affiliation_parser/matcher.py in country_distance(affil_1, affil_2)
62 def country_distance(affil_1, affil_2):
63 """"""
---> 64 if affil_1.country == affil_2.country or affil_1.country == '' or affil_2.country == '':
65 return 0
66 else:
AttributeError: 'collections.OrderedDict' object has no attribute 'country'
Ah, yes, I see the problem. It may be a versioning issue: I am using Python 3.6.3. It is also possible that the GRID headers have changed, because the column names now start with an uppercase letter.
It seems that the way you index the records doesn't work any more. You should use affil['Country'] rather than affil.country, and likewise for all the other equivalent cases.
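To illustrate the kind of change I mean, here is a minimal sketch, not the actual code in matcher.py. The helper get_field and the function country_distance_by_key are hypothetical names I made up for this example. The return value of 1 for a mismatch is also an assumption, since the else branch is cut off in the traceback. The sketch reads a field by key when the record is a dict (trying the capitalized header first) and falls back to attribute access otherwise.

```python
from collections import OrderedDict


def get_field(record, name, default=""):
    """Hypothetical helper: read `name` from a dict-like or attribute-style record."""
    if isinstance(record, dict):  # OrderedDict is a dict subclass
        # Try the capitalized header first (e.g. 'Country'), then the lowercase one.
        for key in (name.capitalize(), name.lower()):
            if key in record:
                return record[key]
        return default
    # Fall back to attribute access for namedtuple-like records.
    return getattr(record, name.lower(), default)


def country_distance_by_key(affil_1, affil_2):
    """Return 0 if the countries match or either one is empty, else 1."""
    c1 = get_field(affil_1, "country")
    c2 = get_field(affil_2, "country")
    if c1 == c2 or c1 == "" or c2 == "":
        return 0
    return 1  # assumption: the original else branch is not shown in the traceback


# Made-up records, only to exercise the sketch.
grid_row = OrderedDict([("Name", "University of Oxford"), ("Country", "United Kingdom")])
parsed = OrderedDict([("Name", "Oxford University"), ("Country", "United Kingdom")])
print(country_distance_by_key(grid_row, parsed))
```

With something like this, the comparison no longer depends on whether the records expose .country or ['Country'].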
I fixed all the obvious issues I could find in PR #9