Chart2Text

templatePreprocess.py subject does not exist issue

Open · salmanedhi opened this issue 4 years ago · 3 comments

I have found an issue on line 438 of templatePreprocess.py: if the title does not contain any uppercase word, the code crashes. Could you please suggest a fix?

We get an index-out-of-bounds error (IndexError: list index out of range) on the line 'guessedSubject = uppercaseWords[0]', because the uppercaseWords list is empty.
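A minimal reproduction of the crash, assuming uppercaseWords is built with the same list comprehension shown in the reply below. The title is a hypothetical all-lowercase example, so the comprehension returns an empty list and indexing it with [0] raises IndexError:

```python
# Hypothetical title in which no token starts with an uppercase letter
titleTokens = "sales of electric cars by year".split()

# Same filter as in the suggested fix: keep tokens whose first character is uppercase
uppercaseWords = [word for word in titleTokens if word[0].isupper()]
print(uppercaseWords)  # []

error_message = None
try:
    guessedSubject = uppercaseWords[0]  # same indexing as the failing line
except IndexError as err:
    error_message = str(err)
    print(f"IndexError: {error_message}")  # IndexError: list index out of range
```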


salmanedhi · Aug 25 '21 20:08

Hi,

You can try updating the code as follows to avoid the IndexError:

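# guess subject if NER doesn't find one
if len(entities['Subject']) == 0:
    # keep tokens whose first character is uppercase (skip empty tokens, where word[0] would also fail)
    uppercaseWords = [word for word in titleTokens if word and word[0].isupper()]
    if len(uppercaseWords) > 1:
        # several candidates: drop the first uppercase word and join the rest
        guessedSubject = ' '.join(uppercaseWords[1:])
        entities['Subject'].append(guessedSubject)
    elif len(uppercaseWords) == 1:
        guessedSubject = uppercaseWords[0]
        entities['Subject'].append(guessedSubject)
    # no uppercase words: leave entities['Subject'] empty instead of indexing an empty list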

The script should still work if entities['Subject'] is an empty list. Let me know if it works!
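For testing the fallback in isolation, here is a minimal sketch that wraps the same logic in a standalone function. The name guess_subject and the example titles are hypothetical; in templatePreprocess.py the logic runs inline on the existing titleTokens and entities variables. Skipping the first uppercase word presumably targets the capitalised first word of a title; that reading is an assumption, not something stated in the thread.

```python
from typing import Dict, List


def guess_subject(titleTokens: List[str], entities: Dict[str, List[str]]) -> None:
    """Hypothetical wrapper around the suggested fallback.

    Only runs when NER found no Subject. With two or more uppercase words it
    drops the first one and joins the rest; with exactly one it uses that word;
    with none it leaves entities['Subject'] empty.
    """
    if len(entities['Subject']) == 0:
        # `word and` skips empty tokens, where word[0] would raise IndexError too
        uppercaseWords = [word for word in titleTokens if word and word[0].isupper()]
        if len(uppercaseWords) > 1:
            entities['Subject'].append(' '.join(uppercaseWords[1:]))
        elif len(uppercaseWords) == 1:
            entities['Subject'].append(uppercaseWords[0])
        # zero uppercase words: nothing is appended, so no IndexError


# Usage on three hypothetical titles
for title in ["Population of Canada and Mexico", "GDP growth rate", "annual rainfall totals"]:
    entities = {'Subject': []}
    guess_subject(title.split(), entities)
    print(repr(title), '->', entities['Subject'])
```

For these titles the loop prints ['Canada Mexico'], ['GDP'] and [] respectively; the last case is the one that previously crashed.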

JasonObeid · Aug 25 '21 22:08

Hi, thank you for your prompt response. This worked. I have one more suggestion, for the file etc/extract_vocab.py:

Please add 'encoding="utf8"' when reading and writing files; otherwise the code crashes.
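A minimal sketch of the suggested change, assuming extract_vocab.py uses Python's built-in open() in text mode. The vocabulary and the vocab.txt file name are hypothetical, since the thread does not show that script. Without an explicit encoding, open() uses a platform-dependent default (on Windows often a legacy code page such as cp1252), which can fail with UnicodeEncodeError or UnicodeDecodeError on non-ASCII tokens:

```python
import os
import tempfile

# Hypothetical vocabulary containing non-ASCII tokens
vocab = ["café", "naïve", "東京", "München"]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "vocab.txt")  # hypothetical output file

    # Writing with an explicit encoding instead of the platform default
    with open(path, "w", encoding="utf8") as f:
        for token in vocab:
            f.write(token + "\n")

    # Reading back with the same explicit encoding
    with open(path, "r", encoding="utf8") as f:
        loaded = [line.rstrip("\n") for line in f]

print(loaded == vocab)  # True
```

Using the same encoding on both sides matters: a file written as UTF-8 but read with a different default code page can silently produce mojibake or raise a decode error.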

salmanedhi · Aug 26 '21 08:08

Hi @salmanedhi, feel free to submit a new pull request with your updated version of this file, and I can merge it into the main branch. Thanks :)

JasonObeid · Sep 07 '21 11:09