
book: performance chapter

Open killercup opened this issue 7 years ago • 6 comments

Inspired by this comment and rust-cli/team#29, I've been thinking about adding an in-depth chapter on performance considerations.

The structure would be something like this:

  • Common issues and solutions around fast I/O
  • how to profile CLI apps (not a full guide but good pointers)

killercup avatar Dec 30 '18 15:12 killercup

Common issues and solutions around fast I/O

A topic that comes to mind is non-blocking reads/writes to stdin/stdout. I believe there might be crates for this in the Tokio family, but I have no idea of their quality / performance profiles. Perhaps this is a topic worth exploring? (Not to focus only on this, but it's something I've been wondering about for a while now, so it might be worth a mention.)


how to profile CLI apps (not a full guide but good pointers)

Are you thinking tools such as perf(1)?

yoshuawuyts avatar Dec 31 '18 12:12 yoshuawuyts

A topic that comes to mind is non-blocking reads/writes to stdin/stdout.

Oh, very good point! We should look into this.

I don't know what the tokio ecosystem has in store for stdout, but one of the design goals of convey is to be super easy to use in multi-threaded code. This includes making all writes async by performing them on a separate thread. (I haven't benchmarked this, however, as the current implementation is in an "it works, refactoring coming soon" stage.)

I think it might be interesting to see if we can write a quick benchmark comparing code with println in the same thread against code that instead sends a message to a thread that prints.
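For the second variant in that comparison, here is a minimal sketch of what "send a message to a thread that prints" could look like using only the standard library. The helper name `spawn_printer` is hypothetical, and this is not how convey implements it; it only illustrates the shape of the code one might benchmark against plain println! calls.

```rust
use std::io::{self, BufWriter, Write};
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: spawns a thread that writes every received line to
/// `out`. Returns the sending half of the channel and a handle that yields
/// the writer back once the channel has been closed.
fn spawn_printer<W: Write + Send + 'static>(
    out: W,
) -> (mpsc::Sender<String>, thread::JoinHandle<io::Result<W>>) {
    let (tx, rx) = mpsc::channel::<String>();
    let handle = thread::spawn(move || -> io::Result<W> {
        // Buffer the writes so each line does not trigger its own syscall.
        let mut out = BufWriter::new(out);
        // The loop ends when every Sender has been dropped.
        for line in rx {
            writeln!(out, "{}", line)?;
        }
        out.flush()?;
        out.into_inner().map_err(io::Error::from)
    });
    (tx, handle)
}

fn main() {
    let (tx, handle) = spawn_printer(io::stdout());
    for i in 0..1000 {
        // The worker only formats and sends; the printer thread does the I/O.
        tx.send(format!("line {}", i)).expect("printer thread has exited");
    }
    // Closing the channel lets the printer thread finish and flush.
    drop(tx);
    handle
        .join()
        .expect("printer thread panicked")
        .expect("writing to stdout failed");
}
```

Whether this is actually faster than calling println! directly depends on the workload, which is exactly what the proposed benchmark would need to show.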


Are you thinking tools such as perf(1)?

Yep! I've been meaning to look into ways to profile code easily, and cross-platform. E.g., I have saved links to this, this, and some tutorials on using Instruments.app and dtrace, but I've not found a tutorial that explains in 5min how to find the slow parts of a program (which may not even be possible, but I'd like to try at least).

killercup avatar Dec 31 '18 12:12 killercup

but I've not found a tutorial that explains in 5min how to find the slow parts of a program (which may not even be possible, but I'd like to try at least).

I've got a flame(1) script just for this. It runs until the script it's running exits, then opens a flamegraph in your browser. Linux only tho.

usage

$ flame cargo bench # to profile `cargo bench`

flame.sh

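#!/bin/bash
# Print each command as it runs.
set -x
# Sample the given command at 99 Hz and record call graphs into ./perf.data.
perf record -F 99 -g "$@"
# Dump the recorded samples as text, then fold the stacks into one line each.
perf script > /tmp/out.perf
stackcollapse-perf /tmp/out.perf > /tmp/out.folded

# Render the folded stacks to a timestamped SVG and clean up the temp files.
outfile="/tmp/$(date +%F-%T)-flamegraph.svg"
flamegraph /tmp/out.folded > "$outfile"
rm perf.data /tmp/out.perf /tmp/out.folded

# Open the flamegraph in the default viewer (usually the browser).
xdg-open "$outfile"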

This requires perf and perf-tools to be installed.

yoshuawuyts avatar Dec 31 '18 12:12 yoshuawuyts

We might also want to mention https://github.com/sharkdp/hyperfine and https://github.com/ferrous-systems/flamegraph

killercup avatar Apr 09 '19 08:04 killercup

A good solution I've found: if your program is going to call print!/println! many times, replace those calls with write!/writeln! into a std::io::BufWriter wrapped around io::stdout. Output is then collected in memory and written only when the buffer fills up or is flushed, so printing takes far fewer syscalls (often just one for small outputs), which makes it a lot faster in most cases.
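A minimal sketch of that approach, assuming a hypothetical helper `write_lines` that stands in for whatever the program actually prints:

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical helper: writes `n` numbered lines to any writer.
fn write_lines<W: Write>(mut out: W, n: usize) -> io::Result<()> {
    for i in 0..n {
        // writeln! on a writer replaces println!.
        writeln!(out, "line {}", i)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Lock stdout once instead of once per println!, and buffer the writes.
    let mut out = BufWriter::new(stdout.lock());
    write_lines(&mut out, 100_000)?;
    // Flush explicitly so that write errors are reported instead of being
    // silently ignored when the BufWriter is dropped.
    out.flush()?;
    Ok(())
}
```

Taking a generic `W: Write` rather than writing to stdout directly also makes the output easy to capture in tests, for example by passing a `Vec<u8>`.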

XAMPPRocky avatar Apr 09 '19 09:04 XAMPPRocky

Very true! That's also the only performance hint the book currently contains ;)

killercup avatar Apr 09 '19 09:04 killercup