python-stop-words

Modified get_stop_words() so that its result cannot be changed from outside.

Open yyanhan opened this issue 1 year ago • 0 comments

Dear Alir3z4,

I used this repo for work at my previous company, and I found one issue with the function get_stop_words():

If we store the returned list in a variable and then modify that variable, like:

en_stop_words = get_stop_words('en')
en_stop_words.append('harrypotter')

then the list returned by later calls to get_stop_words() is changed as well:

'harrypotter' in get_stop_words('en')   # True

This leads to wrong results when we call get_stop_words('en') repeatedly, like:

en_stop_words_again = get_stop_words('en')
'harrypotter' in en_stop_words_again    # True

To work around this, the user can of course call copy.deepcopy(get_stop_words('en')), but many users will not realize that this is necessary.
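
As a side note on that workaround, a shallow copy should already be enough, assuming the entries are plain strings (which are immutable). The sketch below uses a hypothetical stand-in function, get_cached_words(), with made-up data instead of the real library, so it only illustrates the idea:

```python
import copy

# Hypothetical stand-in for the library's shared list; the data is made up.
_CACHED = ["a", "an", "the"]


def get_cached_words():
    """Return the shared list itself, mimicking the behaviour described above."""
    return _CACHED


# Both calls create a new list object, so appending does not touch _CACHED.
shallow = list(get_cached_words())
deep = copy.deepcopy(get_cached_words())

shallow.append("harrypotter")
deep.append("hermione")

shallow_is_independent = "harrypotter" not in get_cached_words()
deep_is_independent = "hermione" not in get_cached_words()
```

Either form protects the shared list here; deepcopy only matters when the elements themselves are mutable.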

Thus I made get_stop_words() return a copy of the list, namely by replacing:

    return stop_words

with:

    return stop_words[:]
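
The effect of this one-line change can be sketched in isolation. The names _STOP_WORDS_CACHE, _load_words, get_stop_words_shared and get_stop_words_copied are hypothetical and the word list is made up; the real library's internals may differ:

```python
# Hypothetical module-level cache; the real library's internals may differ.
_STOP_WORDS_CACHE = {}


def _load_words(language):
    # Placeholder loader with made-up data, standing in for reading a word file.
    return {"en": ["a", "an", "the"]}[language]


def get_stop_words_shared(language):
    """Before the change: hands out the cached list object itself."""
    if language not in _STOP_WORDS_CACHE:
        _STOP_WORDS_CACHE[language] = _load_words(language)
    return _STOP_WORDS_CACHE[language]


def get_stop_words_copied(language):
    """After the change: hands out a shallow copy of the cached list."""
    if language not in _STOP_WORDS_CACHE:
        _STOP_WORDS_CACHE[language] = _load_words(language)
    return _STOP_WORDS_CACHE[language][:]


# With the copy, the caller's append stays local to the caller's list.
words = get_stop_words_copied("en")
words.append("harrypotter")
leaked_after_fix = "harrypotter" in get_stop_words_copied("en")

# Without the copy, the same append changes the cached list.
shared = get_stop_words_shared("en")
shared.append("harrypotter")
leaked_before_fix = "harrypotter" in get_stop_words_shared("en")
```

In this sketch only the variant without the copy lets the appended word show up in later calls.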

and as a result:

en_stop_words = get_stop_words('en')
en_stop_words.append('harrypotter')
en_stop_words_again = get_stop_words('en')

'harrypotter' in en_stop_words          # True
'harrypotter' in get_stop_words('en')   # False
'harrypotter' in en_stop_words_again    # False

I have also tested the performance before and after the change; see:

  • before: https://github.com/yyanhan/python-stop-words/blob/example/test_before.ipynb

  • after: https://github.com/yyanhan/python-stop-words/blob/example/test_after.ipynb
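
For readers who want a rough feel for the cost of the extra copy without opening the notebooks, a minimal timing sketch could look like the following. It is not taken from the notebooks; the list size of 200 entries is an assumption and the words are made up:

```python
import timeit

# Made-up list of 200 words; the size is an assumption, not the library's real count.
words = [f"word{i}" for i in range(200)]

# Time a fixed number of slice copies and report the average cost per copy.
runs = 100_000
seconds = timeit.timeit(lambda: words[:], number=runs)
per_call_us = seconds / runs * 1e6
print(f"{per_call_us:.3f} microseconds per copy")
```

The measured value depends on the machine and the Python version, so it should be read only as an order-of-magnitude indication.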

I hope this PR can make it better!

Best Han

yyanhan • Mar 12 '24 10:03