Use `cuts` instead of `split_on_char`
This PR is a possible improvement about what we expect when we use the basic function split_on_char. From my point of view, String.split_on_char is problematic because:
- the function can split only on a character (it's better to write
cuts ~sep:Filename.dir_sepinstead ofsplit_on_char ~sep:Filename.dir_sep.[0]- especially when this value depends on the system) - the function by default keeps empty parts. Sometimes, it's needed (especially for
Paths.S.parse_filename) but sometimes it's not needed (others usages) and such behavior can lead to undefined behavior
For these two reasons, Support.cuts (took from the great astring):
- take a
stringas a separator - explicit when we want to keep empty parts or not (and be sure about what we manipulate by this way)
The documentation explicits invariants too.
I am surprised that astring is using the O( match_len * text_length) algorithm. Do you know why?
I am surprised that astring is using the O( match_len * text_length) algorithm. Do you know why?
I don't know details about that (/cc @dbuenzli) and, from my point of view, the algorithm seems naive but good enough for the purpose.
I don't think there's any particular reason. On most uses of these kind of functions it seems good enough.
That being said since I have been brought here I will give my two unsolicited €cents :–)
I'm not sure I see the point of this PR, it seems String.split_on_char does the job in every case, use that it already belongs to the stdlib. Less code and no complexity objections. I suspect I won't see in my life-time a platform which has multibyte path separators. If you worry about these empties then most of the applications here go over the list right after so you can eliminate those in that pass and/or simply provide a support functions that filters the result of String.split_on_char.
I'm not sure I see the point of this PR, it seems String.split_on_char does the job in every case, use that it already belongs to the stdlib. Less code and no complexity objections. I suspect I won't see in my life-time a platform which has multibyte path separators. If you worry about these empties then most of the applications here go over the list right after so you can eliminate those in that pass and/or simply provide a support functions that filters the result of String.split_on_char.
Yes, to clarify the goal of this PR, it's just matter of readability which is particulary subjective depending on the reader. As you said, split_on_char does job but I let @Octachron to judge if it's better or not.
I have pushed a commit 36bbd99fc88b which switch to the more complex version if the path separator is a full string (with room for optimisation if it is ever needed). I am not fond of the empty parameter that seems better handled by a separate function. I am thus closing this PR as completed, thanks for the suggestion!