http Audit the use of unsafe in uri/path.rs

Reorganize the one function with a use of unsafe (from_shared()) to highlight that it does a scan of each byte that it eventually passes to ByteStr::from_utf8_unchecked(). This makes it apparent (as described in the "Safety" comment) that the input to from_utf8_unchecked() is valid UTF-8 and is, thus, sound.

This is a part of #412.

Apr 12 '20 00:04 sbosnick

If there are any concerns about a possible performance impact I would be happy to add benchmarks to answer that question.

Apr 17 '20 00:04 sbosnick

The latest commit removes the performance regression that the benchmarks in benches/uri.rs showed (to within the margin of error). This retains the one-loop version of the core parsing code but changed from internal iteration (using try_fold()) to external iteration.

It appears that the outstanding issue then is whether this code should use a two-loop version that emphasizes parsing for two separate things (the path and the query) of a one-loop version that emphasized checking each byte in turn that will eventually be passed to ByteStr::from_utf8_unchecked().

I favour the latter because it makes the soundness easier to see. I can convert this back to a two-loop version, though, if there is a reason to prefer that approach.

May 15 '20 12:05 sbosnick

Thanks! I tried out the benchmarks with the newest changes (commit d4718c9 ), here's what I see:

Master:

test uri_parse_relative_medium ... bench:         110 ns/iter (+/- 0) = 545 MB/s
test uri_parse_relative_query  ... bench:         134 ns/iter (+/- 1) = 626 MB/s
test uri_parse_slash           ... bench:          40 ns/iter (+/- 1) = 25 MB/s

d4718c9:

test uri_parse_relative_medium ... bench:         132 ns/iter (+/- 1) = 454 MB/s
test uri_parse_relative_query  ... bench:         169 ns/iter (+/- 1) = 497 MB/s
test uri_parse_slash           ... bench:          41 ns/iter (+/- 0) = 24 MB/s

May 15 '20 19:05 seanmonstar

The latest commit improves the performance of this branch to just shy of master. The results that I see on the uri benchmarks are as follows:

Master:

test uri_parse_relative_medium ... bench:         120 ns/iter (+/- 0) = 500 MB/s
test uri_parse_relative_query  ... bench:         170 ns/iter (+/- 0) = 494 MB/s
test uri_parse_slash           ... bench:          30 ns/iter (+/- 0) = 33 MB/s

https://github.com/hyperium/http/pull/413/commits/27ff0cb0bf1e2b41755c24e24d188cda63b4dc76:

test uri_parse_relative_medium ... bench:         126 ns/iter (+/- 1) = 476 MB/s
test uri_parse_relative_query  ... bench:         176 ns/iter (+/- 0) = 477 MB/s
test uri_parse_slash           ... bench:          30 ns/iter (+/- 0) = 33 MB/s

So the two benchmarks that are affected by these changes are 6 ns/iter slower as compared with master.

This change returns to a two-loop version (that optimizes better than the different variations I tried of the one-loop version) but it uses take_while() to end checking the bytes at the fragment specifier instead of having an early exit in each of the two loops. This makes it more obvious that between the two loops they check every byte up to the point where the later code truncates src (this is embedded in the code itself instead of having to be described in comments).

I don't know how crucial the uri parsing code is on the fast path of the crates that rely on http. I suggest, though, that if 6 ns/iter is an important difference then we need more (and better) benchmarks of the uri parsing (which I am prepared to write).

May 18 '20 17:05 sbosnick