templatePreprocess.py subject does not exist issue
I have found an issue on line 438. If the title does not contain any word that starts with an uppercase letter, the code crashes. Can you please suggest a fix for this?
We get an index out of bounds error (IndexError) on the line 'guessedSubject = uppercaseWords[0]' because the uppercaseWords list is empty.

Hi,
You can try updating the code as follows, to avoid the index out of bounds errors:
# guess subject if NER doesn't find one
if len(entities['Subject']) == 0:
    uppercaseWords = [word for word in titleTokens if word[0].isupper()]
    if len(uppercaseWords) > 1:
        guessedSubject = ' '.join(uppercaseWords[1:])
        entities['Subject'].append(guessedSubject)
    elif len(uppercaseWords) == 1:
        guessedSubject = uppercaseWords[0]
        entities['Subject'].append(guessedSubject)
The script should work even if the list at entities['Subject'] is empty. Let me know if it works!
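If you want to check the three cases on their own before patching the file, here is a minimal sketch of the same logic. The function name guess_subject and the sample titles are hypothetical and only there for illustration; in templatePreprocess.py the logic lives inline, and titleTokens and entities come from the surrounding code.

```python
def guess_subject(title_tokens, entities):
    """Hypothetical wrapper around the fallback logic, for testing only."""
    # Only guess a subject if NER did not find one.
    if len(entities['Subject']) == 0:
        uppercase_words = [w for w in title_tokens if w[0].isupper()]
        if len(uppercase_words) > 1:
            # Several capitalized words: skip the first one and join the rest.
            entities['Subject'].append(' '.join(uppercase_words[1:]))
        elif len(uppercase_words) == 1:
            # Exactly one capitalized word: use it as the subject.
            entities['Subject'].append(uppercase_words[0])
        # No capitalized words: leave the list empty instead of indexing into it.
    return entities


no_caps = guess_subject(['the', 'quick', 'fox'], {'Subject': []})
one_cap = guess_subject(['Paris', 'in', 'spring'], {'Subject': []})
many_caps = guess_subject(['The', 'Eiffel', 'Tower'], {'Subject': []})

print(no_caps)    # {'Subject': []}
print(one_cap)    # {'Subject': ['Paris']}
print(many_caps)  # {'Subject': ['Eiffel Tower']}
```

The first call is the case that used to crash: with no capitalized word, nothing is appended and the subject list simply stays empty.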
Hi, thank you for your prompt response. This worked. I have one more suggestion, for the file etc/extract_vocab.py:
Please add 'encoding="utf8"' when reading and writing files, as the code crashes otherwise.
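Roughly what I mean, as a sketch only: the helper names write_lines and read_lines and the file name vocab.txt are made up for this example and are not the actual code in etc/extract_vocab.py. The only relevant part is passing encoding to open(), so the result no longer depends on the platform's default encoding.

```python
import os
import tempfile


def write_lines(path, lines):
    # Hypothetical helper: write with an explicit encoding.
    with open(path, 'w', encoding='utf8') as f:
        f.write('\n'.join(lines))


def read_lines(path):
    # Hypothetical helper: read back with the same explicit encoding.
    with open(path, 'r', encoding='utf8') as f:
        return f.read().split('\n')


words = ['café', 'naïve', '東京']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'vocab.txt')  # made-up file name
    write_lines(path, words)
    roundtrip = read_lines(path)

print(roundtrip == words)
```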
Hi @salmanedhi, feel free to submit a new pull request with your updated version of this file, and I can merge it into the main branch. Thanks :)