Add xxHash32 algorithm implementation

Open rodrigobnogueira opened this issue 3 weeks ago • 0 comments

Description

Implements xxHash32, the 32-bit variant of xxHash, an extremely fast non-cryptographic hash function designed to run at speeds close to RAM bandwidth limits. The xxHash family is widely used in high-performance applications such as compression tools and file synchronization.

Changes

  • Added hashes/xxhash.py with complete implementation
  • Uses four independent accumulators, as the algorithm specifies
  • Processes input in 16-byte stripes (four 32-bit little-endian lanes)
  • Uses the 32-bit prime constants defined by the specification (0x9E3779B1, etc.)
  • Optional seed parameter for hash variation
  • 15 doctests covering:
    • Empty and small inputs
    • Large inputs
    • Edge cases (null bytes)
    • Seed functionality
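
For reviewers, the steps listed above can be sketched roughly as follows. This is a minimal illustration written from the public xxHash32 specification, not the code in hashes/xxhash.py; the names `xxhash32_sketch`, `_rotl`, and `_round` are hypothetical and chosen only for this example.

```python
import struct

# 32-bit prime constants from the xxHash32 specification.
PRIME1 = 0x9E3779B1
PRIME2 = 0x85EBCA77
PRIME3 = 0xC2B2AE3D
PRIME4 = 0x27D4EB2F
PRIME5 = 0x165667B1
MASK = 0xFFFFFFFF  # keeps every intermediate value within 32 bits


def _rotl(value: int, count: int) -> int:
    """Rotate a 32-bit value left by `count` bits."""
    return ((value << count) | (value >> (32 - count))) & MASK


def _round(acc: int, lane: int) -> int:
    """Mix one 32-bit lane into one accumulator."""
    acc = (acc + lane * PRIME2) & MASK
    return (_rotl(acc, 13) * PRIME1) & MASK


def xxhash32_sketch(data: bytes, seed: int = 0) -> int:
    """Hypothetical sketch of xxHash32; returns an unsigned 32-bit integer."""
    length = len(data)
    pos = 0
    if length >= 16:
        # Four accumulators, each consuming one lane of every 16-byte stripe.
        v1 = (seed + PRIME1 + PRIME2) & MASK
        v2 = (seed + PRIME2) & MASK
        v3 = seed & MASK
        v4 = (seed - PRIME1) & MASK
        while pos + 16 <= length:
            l1, l2, l3, l4 = struct.unpack_from("<4I", data, pos)
            v1, v2 = _round(v1, l1), _round(v2, l2)
            v3, v4 = _round(v3, l3), _round(v4, l4)
            pos += 16
        # Merge the accumulators into a single 32-bit state.
        acc = (_rotl(v1, 1) + _rotl(v2, 7) + _rotl(v3, 12) + _rotl(v4, 18)) & MASK
    else:
        # Inputs shorter than one stripe skip the accumulator phase.
        acc = (seed + PRIME5) & MASK
    acc = (acc + length) & MASK
    # Consume the remaining input 4 bytes at a time, then byte by byte.
    while pos + 4 <= length:
        (lane,) = struct.unpack_from("<I", data, pos)
        acc = (_rotl((acc + lane * PRIME3) & MASK, 17) * PRIME4) & MASK
        pos += 4
    while pos < length:
        acc = (_rotl((acc + data[pos] * PRIME5) & MASK, 11) * PRIME1) & MASK
        pos += 1
    # Final avalanche so that every input bit affects every output bit.
    acc ^= acc >> 15
    acc = (acc * PRIME2) & MASK
    acc ^= acc >> 13
    acc = (acc * PRIME3) & MASK
    acc ^= acc >> 16
    return acc
```

With seed 0, `xxhash32_sketch(b"")` should return 0x02CC5D05, the commonly published xxHash32 value for empty input. Passing a different seed changes the result, which is what the seed doctests exercise.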

Features

  • Fast: One of the fastest non-cryptographic hash algorithms
  • Excellent distribution: Passes the SMHasher test suite
  • Production-ready: The xxHash family is used in Zstandard, LZ4, and rsync
  • Seed support: Optional seed for different hash families

Real-world Applications

  • Zstandard compression (developed at Facebook, now Meta; uses the 64-bit xxHash variant for its frame checksum)
  • LZ4 compression (the frame format uses xxHash32 checksums)
  • rsync file synchronization
  • Database indexing and deduplication
  • Hash tables requiring high performance

References

  • https://github.com/Cyan4973/xxHash
  • https://xxhash.com/
  • https://github.com/Cyan4973/xxHash/blob/dev/doc/xxhash_spec.md

Testing

  • All 15 doctests pass
  • Passes ruff linting
  • Passes mypy type checking
  • Tested with various inputs and seeds

Checklist

  • [x] I have read and followed the contributing guidelines
  • [x] This pull request is not a duplicate
  • [x] The algorithm is implemented from scratch
  • [x] The code follows the repository style
  • [x] I have read CONTRIBUTING.md
  • [x] This pull request is all my own work -- I have not plagiarized
  • [x] I know that pull requests will not be merged if they fail the automated tests
  • [x] This PR only changes one algorithm file
  • [x] All new Python files are placed inside an existing directory (hashes/)
  • [x] All filenames are in all lowercase characters with no spaces or dashes
  • [x] All functions and variable names follow Python naming conventions
  • [x] All function parameters and return values are annotated with Python type hints
  • [x] All functions have doctests that pass the automated testing
  • [x] All new algorithms include at least one URL that points to Wikipedia or another similar explanation

rodrigobnogueira, Dec 25 '25 18:12