pythonflow icon indicating copy to clipboard operation
pythonflow copied to clipboard

Nondeterministic string hashing in Python(>3.3)

Open Cjen1 opened this issue 2 years ago • 1 comments

I was running into some weird issues with incorrect caching to file a function applied to a string.

This is because python(>3.3) salts its hashing function. (for strings at least) Specifically:

> python -c "print(hash('asdf'))"
-8690208562067163084
> python -c "print(hash('asdf'))"
-4220296486527231708

The fix for this is to pass in PYTHONHASHSEED=1. The 'proper' fix would be to substitute the internal hash function for something more suitable, however I couldn't immediately see the right place to inject that.

PYTHONHASHSEED=1 python -c "print(hash('asdf'))" -5132432945605986887 PYTHONHASHSEED=1 python -c "print(hash('asdf'))" -5132432945605986887

Cjen1 avatar Mar 06 '23 14:03 Cjen1

Thanks for reporting. This should only be a problem if results are cached across different processes. Do you have a reproducible code snippet to illustrate the issue?

tillahoffmann avatar Mar 06 '23 18:03 tillahoffmann