head: rework handling of non-seekable files
Rework logic for handling all-but-last-lines and all-but-last-bytes for non-seekable files. Changes give large performance improvement.
GNU testsuite comparison:
GNU test failed: tests/head/head-write-error. tests/head/head-write-error is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?
GNU testsuite comparison:
GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Changes give large performance improvement.
Can you attach a benchmark to show the improvement ?
GNU testsuite comparison:
GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Changes give large performance improvement.
Can you attach a benchmark to show the improvement ?
The performance improvements are seen when the input comes in via a pipe. Comparing current head with head_improved which contains my changes...
hyperfine "cat ./shakespeare.txt | ./target/release/head -c -100000" "cat ./shakespeare.txt | ./target/release/head_improved -c -100000"
Summary
cat ./shakespeare.txt | ./target/release/head_improved -c -100000 ran
17.93 ± 2.56 times faster than cat ./shakespeare.txt | ./target/release/head -c -100000
hyperfine "cat ./shakespeare.txt | ./target/release/head -n -100000" "cat ./shakespeare.txt | ./target/release/head_improved -n -100000"
Summary
cat ./shakespeare.txt | ./target/release/head_improved -n -100000 ran
24.38 ± 2.60 times faster than cat ./shakespeare.txt | ./target/release/head -n -100000
Also should note though that I want to add a bit more test code before anyone reviews these changes. Give me another day or so please :)
GNU testsuite comparison:
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
GNU testsuite comparison:
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
In addition to the main changes here, I also made a few changes to add/remove buffering (BufReader/BufWriter) through the head.rs file - I've remove some buffering that was redundant and added it where I think it's helpful. Here are the performance numbers before/after my changes on my machine (running Linux) to confirm that nothing has regressed...
| Command | Time Before (ms) | Time After (ms) | Improvement Ratio |
|---|---|---|---|
head -n 100000 ./shakespeare.txt |
3.6 | 3.0 | 1.19 |
head -n -100000 ./shakespeare.txt |
3.9 | 3.9 | 1 |
head -c 100000 ./shakespeare.txt |
1.0 | 1.0 | 1 |
head -c -100000 ./shakespeare.txt |
1.2 | 1.2 | 1 |
cat ./shakespeare.txt | head -n 100000 |
4.0 | 3.5 | 1.15 |
cat ./shakespeare.txt | head -n -100000 |
133.5 | 5.7 | 23 |
cat ./shakespeare.txt | head -c 100000 |
1.4 | 1.4 | 1 |
cat ./shakespeare.txt | head -c -100000 |
75.5 | 4.0 | 19 |
So, no regressions, some reasonable improvements (~15%) for the ... -n 100000 cases, and some really big improvements for the cat ... -100000 cases.
If there are any other usecases people think I should run then please let me know.
GNU testsuite comparison:
Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
GNU testsuite comparison:
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
GNU testsuite comparison:
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/misc/tee is no longer failing!
Appart from the typo. this looks good ^^ It's nice to see such performance improvement 🚀
Thanks for the review!
GNU testsuite comparison:
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)