Python
Python copied to clipboard
String Tokenization with Byte-Pair Encoding
Describe your change:
- [x] Add an algorithm?
- [ ] Fix a bug or typo in an existing algorithm?
- [ ] Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
- [ ] Documentation change?
Checklist:
- [x] I have read CONTRIBUTING.md.
- [x] This pull request is all my own work -- I have not plagiarized.
- [x] I know that pull requests will not be merged if they fail the automated tests.
- [x] This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
- [x] All new Python files are placed inside an existing directory.
- [x] All filenames are in all lowercase characters with no spaces or dashes.
- [x] All functions and variable names follow Python naming conventions.
- [x] All function parameters and return values are annotated with Python type hints.
- [x] All functions have doctests that pass the automated testing.
- [x] All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
- [x] If this pull request resolves one or more open issues then the description above includes the issue number(s) with a closing keyword: "Fixes #ISSUE-NUMBER".
Hi, this has been 'open' for quite some time. Can someone help with the review and merge?
@algorithms-keeper review
Hi maintainers,
I submitted this PR a while ago and wanted to know if there's any update on the review process. I'm happy to make any necessary changes if needed. Thanks for your time!
@MaximSmolskiy @cclauss