
String Tokenization with Byte-Pair Encoding

Open hash-ir opened this issue 1 year ago • 3 comments

Describe your change:

  • [x] Add an algorithm?
  • [ ] Fix a bug or typo in an existing algorithm?
  • [ ] Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
  • [ ] Documentation change?

Checklist:

  • [x] I have read CONTRIBUTING.md.
  • [x] This pull request is all my own work -- I have not plagiarized.
  • [x] I know that pull requests will not be merged if they fail the automated tests.
  • [x] This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • [x] All new Python files are placed inside an existing directory.
  • [x] All filenames are in all lowercase characters with no spaces or dashes.
  • [x] All functions and variable names follow Python naming conventions.
  • [x] All function parameters and return values are annotated with Python type hints.
  • [x] All functions have doctests that pass the automated testing.
  • [x] All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
  • [x] If this pull request resolves one or more open issues then the description above includes the issue number(s) with a closing keyword: "Fixes #ISSUE-NUMBER".

hash-ir • Oct 05 '24 17:10

Hi, this has been 'open' for quite some time. Can someone help with the review and merge?

hash-ir • Oct 27 '24 21:10

@algorithms-keeper review

hash-ir • Oct 27 '24 21:10

Hi maintainers,

I submitted this PR a while ago and wanted to know if there's any update on the review process. I'm happy to make any changes that are needed. Thanks for your time!

@MaximSmolskiy @cclauss

hash-ir • Feb 18 '25 19:02