Tokenizer training fails with no error message if passed a Path

Open · miguelvictor opened this issue 5 years ago · 3 comments

Passing the path to a training file as a pathlib `Path` object throws a `TypeError` without any helpful message.

Please see this gist to reproduce this bug.

miguelvictor, Sep 17 '20 07:09

More helpful error messages will be generated once PyO3 is upgraded to 0.12. In this particular case, there is no implementation for extracting a Rust `String` from a pathlib `Path`. I'm not sure whether there is a simple and sane way for tokenizers to support Paths, but the straightforward workaround is to pass a `str` to the failing function.

sebpuetz, Sep 17 '20 07:09
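A minimal sketch of the suggested workaround, assuming the training call only accepts plain strings: convert each `Path` to `str` before it reaches the binding. `as_str_paths` and `train_files` are hypothetical names; `train_files` is only a stand-in that mimics a binding unable to extract a string from a `Path`, not the actual tokenizers API.

```python
from pathlib import Path
from typing import Iterable, List, Union


def as_str_paths(files: Iterable[Union[str, Path]]) -> List[str]:
    """Convert every entry to a plain str so a str-only binding accepts it."""
    return [str(f) for f in files]


# Hypothetical stand-in for a binding that, like the PyO3 code discussed here,
# can only extract a Rust String from a Python str.
def train_files(files: List[str]) -> int:
    for f in files:
        if not isinstance(f, str):
            raise TypeError  # no message, mirroring the unhelpful error in the report
    return len(files)


files = [Path("data") / "train.txt", "data/valid.txt"]
print(train_files(as_str_paths(files)))  # prints 2
```

Passing `files` directly to `train_files` would raise the bare `TypeError`; converting at the call site keeps the workaround local to user code.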

FYI, with PyO3 0.14 you can write functions that take Rust Path arguments; these accept both strings and pathlib.Path objects from Python.

davidhewitt, Aug 08 '21 22:08
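For illustration only, here is a Python-level sketch of the behavior described above: accept both `str` and any `os.PathLike` (such as `pathlib.Path`) by going through the standard `os.fsdecode`, and raise a `TypeError` that names the offending type. This is not PyO3's implementation; `to_path_str` is a hypothetical helper name.

```python
import os
from pathlib import Path
from typing import Union


def to_path_str(value: Union[str, bytes, os.PathLike]) -> str:
    """Accept str, bytes, or any os.PathLike and return a str path."""
    try:
        # os.fsdecode calls os.fspath (so Path.__fspath__ is used) and
        # decodes bytes results with the filesystem encoding.
        return os.fsdecode(value)
    except TypeError:
        # Replace the generic error with one that names the rejected type.
        raise TypeError(
            f"expected str or os.PathLike, got {type(value).__name__}"
        ) from None


print(to_path_str(Path("corpus") / "train.txt"))
print(to_path_str("corpus/train.txt"))
```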

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot], May 14 '24 01:05