Python
Python copied to clipboard
Add xxHash32 algorithm implementation
Description
Implements the xxHash32 algorithm, an extremely fast non-cryptographic hash function designed to operate at RAM speed limits. xxHash is widely used in high-performance applications.
Changes
- Added hashes/xxhash.py with complete implementation
- Uses 4 parallel accumulators for maximum speed
- Processes data in optimized 16-byte stripes
- Carefully selected prime constants (0x9E3779B1, etc.)
- Optional seed parameter for hash variation
- 15 comprehensive doctests including:
- Empty and small inputs
- Large inputs
- Edge cases (null bytes)
- Seed functionality
Features
- Fast: One of the fastest hash algorithms
- Excellent distribution: Passes SMHasher test suite
- Production-ready: Used in Zstandard, LZ4, rsync
- Seed support: Optional seed for different hash families
Real-world Applications
- Zstandard compression (Facebook's compression algorithm)
- LZ4 compression
- rsync file synchronization
- Database indexing and deduplication
- Hash tables requiring high performance
References
- https://github.com/Cyan4973/xxHash
- https://xxhash.com/
- https://github.com/Cyan4973/xxHash/blob/dev/doc/xxhash_spec.md
Testing
- All 15 doctests pass
- Passes ruff linting
- Passes mypy type checking
- Tested with various inputs and seeds
Checklist
- [x] I have read and followed the contributing guidelines
- [x] This pull request is not a duplicate
- [x] The algorithm is implemented from scratch
- [x] The code follows the repository style
- [x] I have read CONTRIBUTING.md
- [x] This pull request is all my own work -- I have not plagiarized
- [x] I know that pull requests will not be merged if they fail the automated tests
- [x] This PR only changes one algorithm file
- [x] All new Python files are placed inside an existing directory (hashes/)
- [x] All filenames are in all lowercase characters with no spaces or dashes
- [x] All functions and variable names follow Python naming conventions
- [x] All function parameters and return values are annotated with Python type hints
- [x] All functions have doctests that pass the automated testing
- [x] All new algorithms include at least one URL that points to Wikipedia or another similar explanation